Closed dathudeptrai closed 3 years ago
single_gpu got the same problem
Disabling mixed precision will fix your problem.
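For reference, a minimal sketch of disabling mixed precision, assuming TensorFlow 2.x with the Keras mixed precision API (older TF versions use `tf.keras.mixed_precision.experimental.set_policy` instead):

```python
import tensorflow as tf

# Force full float32 training instead of the "mixed_float16" policy,
# which avoids the float16 overflow/underflow that can produce NaN loss.
tf.keras.mixed_precision.set_global_policy("float32")
```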
thanks
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I just found that `tf.keras.layers.experimental.SyncBatchNormalization` can sometimes yield NaN loss on multi-GPU. There is already an issue about it on GitHub (https://github.com/tensorflow/tensorflow/issues/41980). The workaround right now is to simply remove the postnet in FastSpeech/FastSpeech2 (if you get NaN before the model converges :D; in my case I got NaN after convergence, so everything is fine :D). It doesn't hurt the performance. We will make the base_trainer handle NaN loss later :D.
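One common way a trainer can "handle" NaN loss is to simply skip the optimizer update for that batch. A minimal framework-agnostic sketch (the `safe_update` helper and `apply_gradients` callback are hypothetical names, not the actual base_trainer API):

```python
import math

def safe_update(loss, apply_gradients):
    """Skip the optimizer step when the loss is NaN or Inf.

    Sketch of a NaN guard: drop the offending batch instead of
    letting a non-finite gradient poison the model weights.
    Returns True if the update was applied, False if skipped.
    """
    if not math.isfinite(loss):
        return False
    apply_gradients()
    return True
```

In a real training loop, `loss` would be the scalar loss for the current batch and `apply_gradients` the optimizer step; logging skipped batches also helps spot when the NaN starts appearing.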