Closed dathudeptrai closed 3 years ago
single_gpu got the same problem
Disabling mixed precision will fix your problem.
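For reference, a minimal sketch of disabling mixed precision, assuming TensorFlow 2.x with the Keras mixed precision API (older TF versions use `tf.keras.mixed_precision.experimental.set_policy` instead):

```python
import tensorflow as tf

# Force full float32 training instead of the "mixed_float16" policy,
# which avoids the float16 overflow/underflow that can produce NaN loss.
tf.keras.mixed_precision.set_global_policy("float32")
```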
thanks
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I just found that `tf.keras.layers.experimental.SyncBatchNormalization` can sometimes yield NaN loss on multi-GPU. There is already an issue about it on GitHub (https://github.com/tensorflow/tensorflow/issues/41980). The workaround right now is to simply remove the postnet in FastSpeech/FastSpeech2 (if you get NaN before the model converges :D; in my case I got NaN after convergence, so everything is fine :D). It doesn't hurt the performance. We will make the base_trainer handle NaN loss later :D.
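One common way a trainer can "handle" NaN loss is to simply skip the optimizer update for that batch. A minimal framework-agnostic sketch (the `safe_update` helper and `apply_gradients` callback are hypothetical names, not the actual base_trainer API):

```python
import math

def safe_update(loss, apply_gradients):
    """Skip the optimizer step when the loss is NaN or Inf.

    Sketch of a NaN guard: drop the offending batch instead of
    letting a non-finite gradient poison the model weights.
    Returns True if the update was applied, False if skipped.
    """
    if not math.isfinite(loss):
        return False
    apply_gradients()
    return True
```

In a real training loop, `loss` would be the scalar loss for the current batch and `apply_gradients` the optimizer step; logging skipped batches also helps spot when the NaN starts appearing.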