Segmentation Fault after 2 line integration

PaulZhangIsing commented 2 years ago

Describe the bug

After adding clearml and clearml.task, I ran the code as usual, and it ended with a segmentation fault.

I am using a local ClearML server. The problem did not occur when I first installed and ran it; it only appeared after I reinstalled it. The first time, I tried to add a huge dataset to my local server and ran out of disk space, so I had to remove everything there.

To reproduce

My ClearML server runs locally. After setting up the server, add the 2-line integration.

Expected behaviour

What is the expected behaviour? What should've happened but didn't?

Environment

Server type (self hosted \ app.clear.ml): self hosted
ClearML SDK Version: 1.6.4
ClearML Server Version (only for self hosted): WebApp: 1.6.0-213 • Server: 1.6.0-213 • API: 2.20
Python Version: 3.8
OS (Windows \ Linux \ Macos): Linux

jkhenning commented 2 years ago

Hi @PaulZhangIsing,

I assume this issue involves some of the other libraries/code you're using (perhaps an issue with patching the libraries) - can you add an example of the actual code you're running?
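While collecting a reproducible example, one way to narrow down where the crash happens is Python's standard `faulthandler` module, which prints the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. This is a minimal sketch and not something suggested in the thread; the helper name `enable_crash_traceback` and the log file name `segfault_trace.log` are hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_trace.log"):
    """Dump the tracebacks of all threads to log_path on a fatal signal.

    The returned file object must stay open for as long as faulthandler
    is enabled, so keep a reference to it.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call this at the very top of the script, before importing
# tensorflow or clearml, so a crash during import is also captured.
crash_log = enable_crash_traceback()
```

The same handler can be switched on without editing the script by running it as `python -X faulthandler your_script.py`, in which case the traceback goes to stderr.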

PaulZhangIsing commented 2 years ago

Sure, here is the code:

```python
import json
import numpy as np
import os
import tensorflow as tf
from tensorflow import keras as keras
import tensorflow.keras.initializers as tf_initializers
import shutil

from clearml import Task

task = Task.init(project_name="PAS_ARADO", task_name="trial1")

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def load_data():
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    return train_images, train_labels, test_images, test_labels


def make_model(args):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10)
    ])

    model.compile(tf.keras.optimizers.Adam(learning_rate=args['learning_rate']),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    return model


def train(args, model, train_x, train_y):

    model.fit(train_x, train_y, epochs=args['epochs'], batch_size=args['batch_size'])
    return model


def evaluate(args, model, test_x, test_y):
    score = model.evaluate(x=test_x, y=test_y, batch_size=args['test_batch_size'])
    return score


def run_and_log(json_filename='json_data.json'):
    with open(json_filename) as json_file:
        args = json.load(json_file)

    print(args)

    tf.compat.v1.random.set_random_seed(args['seed'])

    train_x, train_y, test_x, test_y = load_data()

    model = make_model(args)

    trained_model = train(args, model, train_x, train_y)
    tf.saved_model.save(trained_model, "./tmp/saved_model.h5")
    results = evaluate(args, trained_model, test_x, test_y)

    print("\nTest set accuracy: %   .3f\n" % results[1])

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    TFLITE_FILE_PATH = './tmp/tflite_model.tflite'
    with open(TFLITE_FILE_PATH, 'wb') as f:
        f.write(tflite_model)

    interpreter = tf.lite.Interpreter(TFLITE_FILE_PATH)
    interpreter.allocate_tensors()

    print("== Input details ==")
    print("name:", interpreter.get_input_details()[0]['name'])
    print("shape:", interpreter.get_input_details()[0]['shape'])
    print("type:", interpreter.get_input_details()[0]['dtype'])

    print("\n== Output details ==")
    print("name:", interpreter.get_output_details()[0]['name'])
    print("shape:", interpreter.get_output_details()[0]['shape'])
    print("type:", interpreter.get_output_details()[0]['dtype'])

    print("\nDUMP INPUT")
    print(interpreter.get_input_details()[0])
    print("\nDUMP OUTPUT")
    print(interpreter.get_output_details()[0])

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    score = 0

    input_shape = input_details[0]['shape']
    for index, input_x in enumerate(test_x):
        input_data = np.array([input_x], dtype=np.float32)
        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])[0]
        if np.argmax(output_data) == test_y[index]:
            score += 1
        else:
            pass
        score = score / len(test_y)
        print("\nTFlite set accuracy: %   .3f\n" % score)
```

erezalg commented 2 years ago

Hi @PaulZhangIsing,

I tried running your code, but it reads json_data.json and uses it later. I removed the dependency on the json_data.json file and could train the model without any problem.

Does the issue also happen when you remove ClearML from the code?
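A minimal sketch of how the two runs could be compared without editing the script each time, assuming an environment variable decides whether ClearML is used. The helper `maybe_init_task` and the variable name `USE_CLEARML` are hypothetical; `Task.init(project_name=..., task_name=...)` is the call from the posted code.

```python
import os


def maybe_init_task(project_name, task_name, env_var="USE_CLEARML"):
    """Return a ClearML Task if env_var is set to "1", otherwise None."""
    if os.environ.get(env_var, "0") != "1":
        # ClearML is not imported at all in this run.
        return None
    from clearml import Task  # imported lazily, only when enabled
    return Task.init(project_name=project_name, task_name=task_name)


# Replaces the unconditional Task.init(...) call at the top of the script.
task = maybe_init_task("PAS_ARADO", "trial1")
```

Running the script once as is and once with `USE_CLEARML=1` set would show whether the crash appears only when ClearML is active.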

PaulZhangIsing commented 2 years ago

The code was originally implemented with MLflow, and this problem did not happen then. Actually, the logic is to run a shell script which calls 3 Python files: the first one generates the JSON files, and the second one configures things and calls this script to run. Thanks @erezalg for your suggestion. I will try removing json_files.json and reading it later.
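For anyone trying to reproduce this without the other pipeline scripts, here is a rough sketch of the JSON hand-off described above. The keys are the ones the posted script reads from `args`; the values, the `EXAMPLE_ARGS` name, and the `write_args` / `read_args` helpers are hypothetical, since the real generator script was not shared.

```python
import json
import os
import tempfile

# Keys match what the posted script reads from args; values are made up.
EXAMPLE_ARGS = {
    "learning_rate": 0.001,
    "epochs": 1,
    "batch_size": 32,
    "test_batch_size": 32,
    "seed": 0,
}


def write_args(json_filename, args=EXAMPLE_ARGS):
    """Write the hyperparameters to a JSON file, as the first pipeline step might."""
    with open(json_filename, "w") as json_file:
        json.dump(args, json_file, indent=2)


def read_args(json_filename):
    """Read the hyperparameters back, mirroring the start of run_and_log()."""
    with open(json_filename) as json_file:
        return json.load(json_file)


# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "json_data.json")
    write_args(path)
    loaded_args = read_args(path)
```

Passing a dict like `EXAMPLE_ARGS` straight to `make_model`, `train`, and `evaluate` would skip the file entirely, which is one way to remove the JSON dependency while testing.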

erezalg commented 1 year ago

Hi @PaulZhangIsing,

Was this issue solved?
