Open Kaczmarekrr opened 1 year ago
Hi @Kaczmarekrr ! Thank you for letting us know. We will fix this ASAP.
Thanks @eugen-ajechiloae-clearml, that would help a lot. Keras introduced the "Keras v3" format, with the .keras extension, which is recommended from TF 2.13 onward.
I'm not sure whether this is related to this issue, but it would be nice if ClearML worked with both SavedModel and Keras v3.
@niemiaszek Can you please post an example of "Keras v3"? We would like to look into it as well
Sure @eugen-ajechiloae-clearml. It was introduced in the 2.13 release as the default format in place of SavedModel. It can be created by following the example in the Keras documentation. Here is an output model generated from that example: example.keras.zip. I had to zip it to upload it directly here. On further inspection, it contains 3 files: "config.json", "metadata.json" and "model.weights.h5".
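For anyone who wants to check the contents of a .keras file themselves, here is a minimal sketch. It assumes the file is an ordinary zip archive, as the inspection above suggests, and uses only the Python standard library. The helper name `list_keras_archive` is made up for this illustration and is not part of Keras or ClearML.

```python
import zipfile


def list_keras_archive(path):
    """Return the sorted member names of a .keras file, treated as a zip archive.

    Hypothetical helper for illustration only; not a Keras or ClearML API.
    """
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())
```

Calling `list_keras_archive("example.keras")` on the file attached above should list the three members mentioned in the comment, assuming nothing else was added when it was saved.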
Hey @Kaczmarekrr! Just letting you know that this issue has been resolved in v1.13.0. Let us know if there are any issues :)
@pollfly Thanks for letting me know! Already tested. At this moment it works as intended. :))
Hi! First of all, thanks for a really good tool!
I have some trouble with logging on newer TensorFlow versions. On an older version (tested on 2.8, but I think anything below 2.11 behaves the same) everything works: tf.keras.callbacks.ModelCheckpoint() saves the model, and the model is automatically uploaded to ClearML (a local server, in our case).
With the older version, the artifacts show "outputs models" with an entry for each saved checkpoint. With the newer version (2.13), only a "variables file" is uploaded, which is not the whole model; it is overwritten again and again and is not usable at all. The same goes for MODEL CONFIGURATION, which is uploaded with the older version but not with the newer one. I have already tested every saving format I could, and it still does not work.
I noticed a warning about TensorFlow imports saying that the imports need to be fixed.
WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.util has been moved to tensorflow.python.checkpoint.checkpoint. The old module will be deleted in version 2.11.
TrackableSaver, which ClearML uses, was moved from tensorflow.python.training.tracking.util to tensorflow.python.checkpoint.checkpoint.
I hoped that fixing the import would do the job. I tried this, but it is not enough: the warning is gone, but logging still does not work correctly.
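The import change described above can be sketched as a version-tolerant lookup. This is only an illustration of the idea, not ClearML's actual patch: the two module paths come from the warning message, while the helper names `import_first_available` and `find_trackable_saver` are invented here. As noted, fixing the import alone silences the warning but does not restore model logging.

```python
import importlib

# Module paths taken from the TensorFlow warning, newest location first.
CANDIDATE_MODULES = (
    "tensorflow.python.checkpoint.checkpoint",
    "tensorflow.python.training.tracking.util",
)


def import_first_available(module_names):
    """Return the first module in module_names that imports cleanly, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too: try the next candidate path.
            continue
    return None


def find_trackable_saver():
    """Return TrackableSaver from whichever module provides it, or None.

    Hypothetical helper; assumes the attribute keeps the same name in both modules.
    """
    module = import_first_available(CANDIDATE_MODULES)
    return getattr(module, "TrackableSaver", None) if module else None
```

Trying the new location first keeps the deprecation warning from being triggered on versions where both paths still exist.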
To reproduce
Run any training loop on a newer TensorFlow version and compare the logging results to those from an older one.
For testing this issue I used the basic TF classification tutorial, with the code needed by ClearML added.
Environment