HenriquesLab / ZeroCostDL4Mic

ZeroCostDL4Mic: A Google Colab based no-cost toolbox to explore Deep-Learning in Microscopy
MIT License

CARE_2D training termination #261

Closed by BelES123 1 year ago

BelES123 commented 1 year ago

Hello, I encountered this problem in step 4.2. The training does not run through all 200 epochs and is terminated prematurely. Here is the message I get. The previous steps did not give me any warnings or errors.

Epoch 1/200
WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.. at 0x7f953ef3b310> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.. at 0x7f953ef3b310>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names. The matching definitions were:
Match 0: lambda x: K.mean(x, axis=-1)
Match 1: lambda x: x
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function _mean_or_not.. at 0x7f953ef3b310> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.. at 0x7f953ef3b310>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names. The matching definitions were:
Match 0: lambda x: K.mean(x, axis=-1)
Match 1: lambda x: x
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.. at 0x7f94b0114af0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.. at 0x7f94b0114af0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names. The matching definitions were:
Match 0: lambda x: K.mean(x, axis=-1)
Match 1: lambda x: x
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function _mean_or_not.. at 0x7f94b0114af0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.. at 0x7f94b0114af0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names. The matching definitions were:
Match 0: lambda x: K.mean(x, axis=-1)
Match 1: lambda x: x
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.. at 0x7f94b0114dc0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.. at 0x7f94b0114dc0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names. The matching definitions were:
Match 0: lambda x: K.mean(x, axis=-1)
Match 1: lambda x: x
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function _mean_or_not.. at 0x7f94b0114dc0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.. at 0x7f94b0114dc0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names. The matching definitions were:
Match 0: lambda x: K.mean(x, axis=-1)
Match 1: lambda x: x

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2/293 [..............................] - ETA: 56s - loss: 22278137476229890048.0000 - mse: inf - mae: 20077735433096658944.0000
Batch 2: Invalid loss, terminating training
3/3 [==============================] - 1s 7ms/step
293/293 [==============================] - 39s 42ms/step - loss: nan - mse: inf - mae: 13385156588893896704.0000 - val_loss: nan - val_mse: nan - val_mae: nan - lr: 4.0000e-04

Loading network weights from 'weights_last.h5'.
Training, done.
Time elapsed: 0.0 hour(s) 0.0 min(s) 49 sec(s)
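The line "Batch 2: Invalid loss, terminating training" matches the wording used by Keras's TerminateOnNaN callback, which stops training as soon as a batch loss is NaN or infinite. The snippet below is only a simplified, dependency-free sketch of that kind of check, not the library code; the function name `first_invalid_batch` and the per-batch loss values are hypothetical (the log above shows running averages, not individual batch losses).

```python
import math
from typing import Iterable, Optional


def first_invalid_batch(batch_losses: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for batch_index, loss in enumerate(batch_losses):
        if not math.isfinite(loss):
            return batch_index
    return None


# Hypothetical per-batch losses: very large but finite values, then a NaN.
losses = [2.1e19, 2.3e19, float("nan"), 1.0]
stop_at = first_invalid_batch(losses)
if stop_at is not None:
    # Mirrors the wording of the message seen in the log.
    print(f"Batch {stop_at}: Invalid loss, terminating training")
```

Read this way, the AutoGraph warnings are unrelated to the early stop; the training ended because the loss itself became invalid.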

BelES123 commented 1 year ago

I figured it out and was able to train my model. Training was terminated because the loss became very high and then invalid (NaN), which was caused by some images in the training set that contained very few cells.
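For anyone hitting the same problem, one way to look for such images before training is to measure how much of each image is above a background threshold. This is a minimal sketch under stated assumptions, not what was actually done here: the helper names, the threshold of 100, and the 25% cut-off are all hypothetical and would need tuning for a real dataset. It uses plain nested lists so it runs without extra packages; real images would typically be loaded as arrays.

```python
from typing import Dict, List, Sequence

Image = Sequence[Sequence[float]]  # 2D image as rows of pixel values


def foreground_fraction(image: Image, threshold: float) -> float:
    """Return the fraction of pixels strictly brighter than `threshold`."""
    total = 0
    bright = 0
    for row in image:
        for value in row:
            total += 1
            if value > threshold:
                bright += 1
    return bright / total if total else 0.0


def find_sparse_images(images: Dict[str, Image], threshold: float,
                       min_fraction: float) -> List[str]:
    """Return the names of images whose foreground fraction is below `min_fraction`."""
    return sorted(name for name, img in images.items()
                  if foreground_fraction(img, threshold) < min_fraction)


# Toy data (hypothetical): one image with signal in half of its pixels, one nearly empty.
images = {
    "dense.tif": [[0, 200, 210, 0], [190, 0, 0, 220]],
    "sparse.tif": [[0, 0, 0, 0], [0, 0, 0, 180]],
}
sparse = find_sparse_images(images, threshold=100, min_fraction=0.25)
print(sparse)  # → ['sparse.tif']
```

Whether flagged images should be removed, cropped, or kept depends on the dataset; the sketch only helps to find candidates for inspection.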

guijacquemet commented 1 year ago

Hi @BelES123, thanks for reaching out, and apologies for the slow answer. Glad you found out what the issue was! Cheers, Guillaume