deepsphere / deepsphere-cosmo-tf2

A spherical convolutional neural network for cosmology (TFv2).
https://arxiv.org/abs/1810.12186
MIT License

Error saving model #3

Closed: JavierOrjuela closed this issue 2 years ago

JavierOrjuela commented 2 years ago

Hello guys, after training a custom DeepSphere model (based on ResidualLayer + HealpyChebyshev), I tried to save it in Keras as usual, via a callback:

    callback_model = tf.keras.callbacks.ModelCheckpoint(
        filepath=filepath + "cp-{epoch:04d}.ckpt",
        monitor="val_loss",
        verbose=1,
        save_best_only=True,
        save_weights_only=False,
        mode="min")

However, I got this error:

    ~/anaconda3/lib/python3.8/site-packages/keras/saving/saved_model/save_impl.py in call_and_return_conditional_losses(*args, **kwargs)
        631   def call_and_return_conditional_losses(*args, **kwargs):
        632     """Returns layer (call_output, conditional losses) tuple."""
    --> 633     call_output = layer_call(*args, **kwargs)
        634     if version_utils.is_v1_layer_or_model(layer):
        635       conditional_losses = layer.get_losses_for(

TypeError: call() got multiple values for argument 'training'

Then I tried to save the model from the examples folder (quick_start.ipynb) and got the same error! Models cannot be saved with model.save_weights(checkpoint_path) either. I wonder whether this is a problem with my environment or is due to the custom layers defined in the repository. I would really appreciate your comments. Best, Javier
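The TypeError reported above is ordinary Python behaviour: it is raised whenever a single call supplies the same parameter both positionally and by keyword. A minimal pure-Python sketch, where `ToyLayer` and `save_style_wrapper` are hypothetical stand-ins (not Keras or DeepSphere code) for a layer's `call` and for a wrapper that forwards `training` twice:

```python
class ToyLayer:
    """Hypothetical stand-in for a Keras layer; not the DeepSphere implementation."""

    def call(self, input_tensor, training=False):
        # Return the input and the training flag so the caller can inspect them.
        return input_tensor, training


def save_style_wrapper(layer, *args, **kwargs):
    """Hypothetical wrapper that forwards its arguments and also sets training=False."""
    # If `args` already holds a value for `training` (the second positional slot),
    # passing training=False as a keyword as well makes Python raise a TypeError.
    return layer.call(*args, training=False, **kwargs)


layer = ToyLayer()

# Only the input is positional, so the keyword alone fills `training`.
print(save_style_wrapper(layer, [1.0, 2.0]))  # ([1.0, 2.0], False)

# `training` arrives positionally *and* as a keyword: TypeError.
try:
    save_style_wrapper(layer, [1.0, 2.0], True)
except TypeError as err:
    # Wording varies slightly by Python version (3.10+ prefixes the class name).
    print(err)
```

The fix on the caller's side is to pass `training` through exactly one channel, conventionally as a keyword.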

jafluri commented 2 years ago

Dear Javier

Thank you very much for your interest in DeepSphere for TF 2.x.

I was able to reproduce the error with model.save; I think there was an issue with the subclassing. However, I did not manage to reproduce the same error with model.save_weights.

Anyway, I pushed a potential fix. Can you check whether a8211be fixes your issue?

Best,

Janis
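Janis attributes the model.save failure to an issue with the subclassing; the actual change in a8211be is not reproduced here. As a hypothetical illustration only, a subclass that forwards extra positional arguments to its parent's `call` can end up handing `training` over twice, while declaring `training` explicitly and forwarding it by keyword avoids the collision. `BaseLayer`, `BrokenSubclass` and `FixedSubclass` are made-up names:

```python
class BaseLayer:
    """Hypothetical parent layer: `training` is the second parameter of call()."""

    def call(self, input_tensor, training=False):
        return "train" if training else "infer"


class BrokenSubclass(BaseLayer):
    """Forwards *args positionally and also passes `training` by keyword."""

    def call(self, input_tensor, *args, training=False):
        # A positional `training` lands in *args and is forwarded into the
        # parent's `training` slot, colliding with the keyword argument.
        return super().call(input_tensor, *args, training=training)


class FixedSubclass(BaseLayer):
    """Declares `training` explicitly and forwards it by keyword only."""

    def call(self, input_tensor, training=False):
        # Positional or keyword, `training` binds to one parameter here
        # and reaches the parent exactly once.
        return super().call(input_tensor, training=training)


x = [0.0]
print(FixedSubclass().call(x, True))            # train
print(FixedSubclass().call(x, training=True))   # train
print(BrokenSubclass().call(x, training=True))  # train
try:
    BrokenSubclass().call(x, True)
except TypeError as err:
    print(err)  # message mentions 'training'
```

Serialization code that re-invokes `call` with a mix of positional and keyword arguments would trip over `BrokenSubclass` but not over `FixedSubclass`.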

JavierOrjuela commented 2 years ago

Hello Janis, Thank you for your quick reply. Unfortunately, I'm still having a similar error:

    10/10 [==============================] - ETA: 0s - loss: 1.1198 - sparse_categorical_accuracy: 0.6933
    Epoch 1: val_loss improved from inf to 0.69315, saving model to my_model/cp-0001.ckpt

    /usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
         65     except Exception as e:  # pylint: disable=broad-except
         66       filtered_tb = _process_traceback_frames(e.__traceback__)
    ---> 67       raise e.with_traceback(filtered_tb) from None
         68     finally:
         69       del filtered_tb

    /usr/lib/python3.7/contextlib.py in __exit__(self, type, value, traceback)
        117         if type is None:
        118             try:
    --> 119                 next(self.gen)
        120             except StopIteration:
        121                 return False

TypeError: chebyshev_layer_call_and_return_conditional_losses(input_tensor, training) got two values for 'training'.

With model.save_weights, the error appears when loading: model.load_weights(latest). I am running both in a local conda env and on Google Colab, and I get the same error in both environments.
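The checkpoint names in this thread come from the `filepath` template in the original `ModelCheckpoint` callback, `"cp-{epoch:04d}.ckpt"`, which Keras fills with the 1-based epoch number (hence `my_model/cp-0001.ckpt` after epoch 1 in the log). For weights saved with model.save_weights in the TF checkpoint format, a value like `latest` is usually obtained with `tf.train.latest_checkpoint(checkpoint_dir)`. The pure-Python sketch below only illustrates the naming scheme; `checkpoint_name` and `latest_by_epoch` are hypothetical helpers, not TensorFlow APIs:

```python
import re

TEMPLATE = "cp-{epoch:04d}.ckpt"  # filepath template from the callback in this issue


def checkpoint_name(epoch):
    """Hypothetical helper: name produced for a 1-based epoch number."""
    return TEMPLATE.format(epoch=epoch)


def latest_by_epoch(names):
    """Hypothetical helper: return the name with the highest epoch, or None."""
    pattern = re.compile(r"^cp-(\d+)\.ckpt$")
    best_name, best_epoch = None, -1
    for name in names:
        match = pattern.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name


print(checkpoint_name(1))  # cp-0001.ckpt
print(latest_by_epoch(["cp-0002.ckpt", "cp-0010.ckpt", "checkpoint"]))  # cp-0010.ckpt
```

With `save_best_only=True`, only epochs that improve `val_loss` are written, so gaps in the numbering are expected. Weights-only TF-format checkpoints also appear on disk with extra suffixes (for example `.index`), which is one reason to rely on `tf.train.latest_checkpoint` rather than matching file names by hand.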

jafluri commented 2 years ago

I see, thanks for the report.

Which version of TensorFlow are you running?

jafluri commented 2 years ago

Anyway, I tried something new in 555e28e; let me know if it works.

JavierOrjuela commented 2 years ago

It works now! I have tested this new version with model.save and model.save_weights, along with model.load and model.load_weights, and everything seems to be working. Thank you very much @jafluri for your support. Best, Javier

JavierOrjuela commented 2 years ago

To answer your question about which version of TensorFlow I am running: TF 2.8.0.