allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.67k stars, 654 forks

No automatic artifact upload with TensorFlow 2.13 #1112

Open Kaczmarekrr opened 1 year ago

Kaczmarekrr commented 1 year ago

Hi! First of all, thanks for a really good tool!

I am having some trouble with logging on newer TensorFlow versions. On older versions (tested on 2.8, but I think it might be anything below 2.11), everything works: tf.keras.callbacks.ModelCheckpoint() saves the model, and the model is automatically uploaded to the ClearML server (a local server in our case).

With the older version, I get "Output Models" under artifacts, with an entry for each saved checkpoint. With the newer version (2.13), only the "variables" file is uploaded, which is not the whole model; it gets overwritten again and again and is not usable at all. The same goes for MODEL CONFIGURATION: in the older version it is uploaded, but here nothing happens. I have already tried every possible saving format, and it still does not work.

I noticed a TensorFlow warning saying that the imports need to be fixed:

WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.util has been moved to tensorflow.python.checkpoint.checkpoint. The old module will be deleted in version 2.11.

The TrackableSaver class that ClearML uses was moved from tensorflow.python.training.tracking.util to tensorflow.python.checkpoint.checkpoint.

I hoped that fixing the import would do the job. I tried it, but it is not enough: the warning is gone, but logging still does not work correctly.
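A minimal sketch of the import fix described above, assuming the relocated symbol is `TrackableSaver` at the two module paths named in the warning. `import_first` and `TRACKABLE_SAVER_CANDIDATES` are hypothetical helpers, not part of ClearML or TensorFlow, and `tensorflow.python` is a private namespace whose layout may change again. As noted in the comment above, resolving the import only silences the warning; it does not by itself restore the model upload.

```python
import importlib


def import_first(*candidates):
    """Return the first attribute importable from a list of (module, attribute) pairs.

    Hypothetical helper: useful when a library relocates a symbol between versions.
    """
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("none of the candidate imports succeeded:\n" + "\n".join(errors))


# Assumed paths, taken from the TensorFlow warning: new location first,
# old location as a fallback for releases that predate the move.
TRACKABLE_SAVER_CANDIDATES = (
    ("tensorflow.python.checkpoint.checkpoint", "TrackableSaver"),
    ("tensorflow.python.training.tracking.util", "TrackableSaver"),
)
```

With TensorFlow installed, `TrackableSaver = import_first(*TRACKABLE_SAVER_CANDIDATES)` would try the new location first and fall back to the old one on older releases, raising a single `ImportError` that lists every failed attempt if neither works.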

To reproduce

Run any training loop on a newer TF version and compare the logging results with those from the older one.

To test this issue, I used the basic TensorFlow classification tutorial with the code required by ClearML added.

Environment

eugen-ajechiloae-clearml commented 1 year ago

Hi @Kaczmarekrr ! Thank you for letting us know. We will fix this ASAP.

niemiaszek commented 1 year ago

Thanks @eugen-ajechiloae-clearml, that would help a lot. Keras introduced the "Keras v3" format, with the .keras extension, as the recommended format from TF 2.13. I'm not sure whether this is related to this issue, but it would be nice if ClearML worked with both SavedModel and Keras v3.

eugen-ajechiloae-clearml commented 1 year ago

@niemiaszek Can you please post an example of "Keras v3"? We would like to look into it as well

niemiaszek commented 1 year ago

Sure @eugen-ajechiloae-clearml. It was introduced in the 2.13 release as the default format in place of SavedModel. It can be created by following an example in the Keras documentation. Here is an output model generated from that example: example.keras.zip (I had to zip it to upload it here). On further inspection, it contains 3 files: "config.json", "metadata.json" and "model.weights.h5".
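Because a .keras file is itself a zip archive, its contents can be inspected with Python's standard `zipfile` module. A minimal sketch, assuming the three member names listed above are what to look for; `list_keras_members`, `looks_like_keras_v3` and `read_keras_config` are hypothetical helpers for illustration, and the membership check is a heuristic based on this one example, not the format specification.

```python
import json
import zipfile

# Members observed in the example.keras from this thread (assumption: used as a heuristic).
EXPECTED_MEMBERS = {"config.json", "metadata.json", "model.weights.h5"}


def list_keras_members(source):
    """Return the sorted member names of a .keras archive.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return sorted(archive.namelist())


def looks_like_keras_v3(source):
    """Heuristic: True if `source` is a zip archive containing all EXPECTED_MEMBERS."""
    if not zipfile.is_zipfile(source):
        return False
    return EXPECTED_MEMBERS.issubset(list_keras_members(source))


def read_keras_config(source):
    """Parse and return the JSON model configuration stored in config.json."""
    with zipfile.ZipFile(source) as archive:
        return json.loads(archive.read("config.json"))
```

For example, after unzipping the attachment, `list_keras_members("example.keras")` should list the three files mentioned above, and `read_keras_config("example.keras")` would return the model configuration as a dictionary.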

pollfly commented 1 year ago

Hey @Kaczmarekrr! Just letting you know that this issue has been resolved in v1.13.0. Let us know if there are any issues :)

Kaczmarekrr commented 1 year ago

@pollfly Thanks for letting me know! Already tested. At this moment it works as intended. :))