allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Segmentation Fault after 2 line integration #752

Open PaulZhangIsing opened 2 years ago

PaulZhangIsing commented 2 years ago

Thank you for helping us make ClearML better!

Describe the bug

After adding the two-line integration (`from clearml import Task` and `Task.init(...)`), I ran the code as normal, and it failed as shown in the screenshot (image not preserved).

I am using a local ClearML server. The problem did not occur when I first installed and ran it; it only started after I re-installed the server. The first time, I tried to add a huge dataset to my local server and ran out of disk space, so I had to remove everything there.
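Because the first installation ran out of disk space, it may be worth confirming how much room is left on the volume that holds the server's data before re-adding the dataset. A minimal sketch using the standard library's `shutil.disk_usage`; the `/opt/clearml` location and the `free_gib` helper are assumptions for illustration, not taken from the report:

```python
import shutil

# Assumption: a docker-compose ClearML server keeps its data under /opt/clearml;
# pass a different path if your installation stores it elsewhere.
CLEARML_DATA_DIR = "/opt/clearml"


def free_gib(path=CLEARML_DATA_DIR):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30
```

For example, `print(f"{free_gib():.1f} GiB free")` reports the remaining space on that volume.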

To reproduce

My ClearML server runs locally. After setting up the server, add the two-line integration and run the script.
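A segmentation fault kills the Python process without a normal traceback, which makes it hard to tell whether the crash comes from ClearML, TensorFlow, or the TFLite interpreter. A minimal sketch using the standard-library `faulthandler` module (not something used in this thread) that prints the Python stack of every thread when the process receives a fatal signal; it needs to run before the other imports:

```python
import faulthandler
import sys

# Install handlers for fatal signals such as SIGSEGV that dump the Python
# traceback of all threads to stderr before the process terminates.
faulthandler.enable(file=sys.stderr, all_threads=True)

# Import clearml / tensorflow and call Task.init() after this point.
```

The same behaviour is available without editing the script by running `python -X faulthandler train.py` or setting `PYTHONFAULTHANDLER=1` (`train.py` is a placeholder name).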

Expected behaviour

What is the expected behaviour? What should've happened but didn't?

Environment

jkhenning commented 2 years ago

Hi @PaulZhangIsing ,

I assume this issue involves some of the other libraries/code you're using (perhaps an issue with patching the libraries) - can you add an example of the actual code you're running?
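One way to test the patching hypothesis is to keep the two-line integration but stop ClearML from hooking into TensorFlow. `Task.init()` takes an `auto_connect_frameworks` argument that accepts either `False` (turn off all automatic framework logging) or a dict keyed by framework name. A minimal sketch with a hypothetical helper, `clearml_init_kwargs`, that builds the arguments so two runs differ only in that setting:

```python
def clearml_init_kwargs(project_name, task_name, patch_tensorflow=True):
    """Build keyword arguments for clearml.Task.init() (hypothetical helper).

    With patch_tensorflow=False, ClearML's automatic TensorFlow and
    TensorBoard hooks are switched off through auto_connect_frameworks.
    """
    kwargs = {"project_name": project_name, "task_name": task_name}
    if not patch_tensorflow:
        kwargs["auto_connect_frameworks"] = {"tensorflow": False, "tensorboard": False}
    return kwargs
```

Usage: `task = Task.init(**clearml_init_kwargs("PAS_ARADO", "trial1", patch_tensorflow=False))`. If the segmentation fault disappears with the hooks off, the TensorFlow patching is a likely suspect; if it persists, the crash probably lies elsewhere.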

PaulZhangIsing commented 2 years ago

Sure. Here is the code:

```python
import json
import numpy as np
import os
import tensorflow as tf
from tensorflow import keras as keras
import tensorflow.keras.initializers as tf_initializers
import shutil

from clearml import Task

task = Task.init(project_name="PAS_ARADO", task_name="trial1")

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def load_data():
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    # Scale pixel values to [0, 1]
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    return train_images, train_labels, test_images, test_labels


def make_model(args):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10)
    ])

    model.compile(tf.keras.optimizers.Adam(learning_rate=args['learning_rate']),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    return model


def train(args, model, train_x, train_y):
    model.fit(train_x, train_y, epochs=args['epochs'], batch_size=args['batch_size'])
    return model


def evaluate(args, model, test_x, test_y):
    score = model.evaluate(x=test_x, y=test_y, batch_size=args['test_batch_size'])
    return score


def run_and_log(json_filename='json_data.json'):
    with open(json_filename) as json_file:
        args = json.load(json_file)

    print(args)

    tf.compat.v1.random.set_random_seed(args['seed'])

    train_x, train_y, test_x, test_y = load_data()

    model = make_model(args)

    trained_model = train(args, model, train_x, train_y)
    tf.saved_model.save(trained_model, "./tmp/saved_model.h5")
    results = evaluate(args, trained_model, test_x, test_y)

    # results is [loss, accuracy]
    print("\nTest set accuracy: %.3f\n" % results[1])

    # Convert the trained Keras model to TFLite and write it to disk
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    TFLITE_FILE_PATH = './tmp/tflite_model.tflite'
    with open(TFLITE_FILE_PATH, 'wb') as f:
        f.write(tflite_model)

    interpreter = tf.lite.Interpreter(TFLITE_FILE_PATH)
    interpreter.allocate_tensors()

    print("== Input details ==")
    print("name:", interpreter.get_input_details()[0]['name'])
    print("shape:", interpreter.get_input_details()[0]['shape'])
    print("type:", interpreter.get_input_details()[0]['dtype'])

    print("\n== Output details ==")
    print("name:", interpreter.get_output_details()[0]['name'])
    print("shape:", interpreter.get_output_details()[0]['shape'])
    print("type:", interpreter.get_output_details()[0]['dtype'])

    print("\nDUMP INPUT")
    print(interpreter.get_input_details()[0])
    print("\nDUMP OUTPUT")
    print(interpreter.get_output_details()[0])

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    score = 0

    input_shape = input_details[0]['shape']
    # Run the TFLite model one image at a time and count correct predictions
    for index, input_x in enumerate(test_x):
        input_data = np.array([input_x], dtype=np.float32)
        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])[0]
        if np.argmax(output_data) == test_y[index]:
            score += 1

    # Divide by the number of test images once, after the loop
    score = score / len(test_y)
    print("\nTFLite set accuracy: %.3f\n" % score)
```
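The TFLite loop counts an image as correct when the arg-max of its output logits equals the label; accuracy is that count divided by `len(test_y)`, taken once over all images. When the per-image logits are collected into one array, the same number can be computed in a single vectorized step. A minimal sketch with a hypothetical helper name, `accuracy_from_logits`:

```python
import numpy as np


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest logit is at the index given by the integer label."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))
```

Collecting `output_data` for every image into a list and calling `accuracy_from_logits(np.stack(outputs), test_y)` gives the same value as the counting loop.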

erezalg commented 2 years ago

Hi @PaulZhangIsing,

I tried running your code, but it reads json_data.json and uses its values later on. I removed the dependency on the json_data file and could train the model without any problem.

Does the issue also happen when you remove ClearML from the code?
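erezalg's workaround of dropping the json_data.json dependency can be kept optional: read the file when it exists and fall back to inline values otherwise. A minimal sketch; `load_args` and every default value below are hypothetical, since the reporter's json_data.json is not part of the issue (only its keys are known from the code):

```python
import json
import os

# Hypothetical defaults for the keys run_and_log() reads from json_data.json.
DEFAULT_ARGS = {
    "seed": 42,
    "learning_rate": 0.001,
    "epochs": 5,
    "batch_size": 32,
    "test_batch_size": 32,
}


def load_args(json_filename="json_data.json"):
    """Return training arguments from json_filename, or the defaults if it is missing."""
    args = dict(DEFAULT_ARGS)
    if os.path.exists(json_filename):
        with open(json_filename) as json_file:
            # Values in the file override the defaults key by key
            args.update(json.load(json_file))
    return args
```

Replacing the `with open(json_filename) ...` lines in `run_and_log()` with `args = load_args(json_filename)` lets the script run both standalone and from the shell pipeline.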

PaulZhangIsing commented 2 years ago

The code was originally written for MLflow, and this problem did not happen there. Actually, the workflow runs a shell script that calls three Python files: the first generates the JSON files, and the second configures them and calls this script to run. Thanks @erezalg for your suggestion. I will try removing json_files.json and reading it later.
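Because the training script is launched from a shell script that chains three Python files, a segmentation fault in one step shows up only as that process's exit status. On POSIX systems, `subprocess` reports a child killed by signal N with return code -N, so a SIGSEGV appears as `-signal.SIGSEGV` (-11 on Linux). A minimal sketch of a Python driver, with a hypothetical `run_step` helper, that reports which step crashed:

```python
import signal
import subprocess
import sys


def run_step(argv):
    """Run one pipeline step with the current interpreter and describe how it ended.

    argv is passed after the interpreter path, e.g. ["step1.py"].
    """
    proc = subprocess.run([sys.executable, *argv])
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:
        # POSIX only: a negative return code -N means the child died from signal N.
        return "segfault"
    return f"exit {proc.returncode}"
```

For example, `for step in (["step1.py"], ["step2.py"], ["train.py"]): print(step, run_step(step))`, where the script names are placeholders for the three files in the pipeline.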

erezalg commented 1 year ago

Hi @PaulZhangIsing,

Was this issue solved?