NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Gradient overflow with Mixed Precision Training #63

Open MinHyung-Kang opened 4 years ago

MinHyung-Kang commented 4 years ago

Hello! I am trying to train on the train-clean-100 subset of LibriTTS. I have resampled all of the audio to 22 kHz (example), and I am reusing the filelists provided in the original repository. I was able to run training successfully for about 15,000 iterations on a T4 GPU (g4dn.xlarge instance on AWS).
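
Roughly, the resampling step looks like this (a minimal sketch, not my exact script; the paths, the 22050 Hz target rate, and the librosa/soundfile dependencies are assumptions):

```python
# Resample LibriTTS wavs (originally 24 kHz) down to 22050 Hz.
# Paths are placeholders; librosa and soundfile are assumed to be installed.
import glob
import os

import librosa
import soundfile as sf

SRC_DIR = "LibriTTS/train-clean-100"      # original wavs (placeholder path)
DST_DIR = "LibriTTS_22k/train-clean-100"  # resampled output (placeholder path)
TARGET_SR = 22050                         # sampling rate expected by the hparams

for src_path in glob.glob(os.path.join(SRC_DIR, "**", "*.wav"), recursive=True):
    # librosa resamples on load when an explicit sr is given
    audio, _ = librosa.load(src_path, sr=TARGET_SR)
    dst_path = os.path.join(DST_DIR, os.path.relpath(src_path, SRC_DIR))
    os.makedirs(os.path.dirname(dst_path), exist_ok=True)
    sf.write(dst_path, audio, TARGET_SR)
```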

When I turned on mixed precision training by setting fp16_run=True on the same instance, it runs for a few iterations and then hits gradient overflows. It keeps halving the loss scale until it reaches roughly 1e-100 (at which point I stopped it). The loss is NaN rather than inf, which, according to an Apex GitHub issue, I should not be observing.
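
For context on why halving the scale never helps: if the unscaled loss is already NaN, Apex's dynamic loss scaler cannot recover by shrinking the scale, since the overflow is not caused by the scale factor. Below is an illustrative sketch of the usual Apex training-step pattern with an explicit finiteness check (not the repo's actual train.py; it assumes the model and optimizer were already wrapped with amp.initialize, and the parse_batch/criterion names follow a Tacotron 2-style training loop):

```python
# Debugging sketch (illustrative, not the repo's code): check whether the loss
# is already NaN/inf before loss scaling ever gets involved.
import math

from apex import amp


def training_step(model, criterion, optimizer, batch):
    x, y = model.parse_batch(batch)
    y_pred = model(x)
    loss = criterion(y_pred, y)

    # If this fires, the problem is upstream of loss scaling
    # (e.g. a bad sample, or an fp16 overflow/underflow inside the model),
    # so repeatedly halving the loss scale will never fix it.
    if not math.isfinite(loss.item()):
        raise RuntimeError("Loss is NaN/inf before scaling; skipping the step won't help")

    optimizer.zero_grad()
    # Standard Apex pattern: backward on the scaled loss; on overflow the
    # scaler skips the step and halves the scale automatically.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
    return loss.item()
```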

Wondering if anyone has an idea why this might be happening. [screenshot of the training log showing the overflow messages]

I am also wondering how many iterations the uploaded LibriTTS and LJS models were trained for; is that information the team could share?

Thank you in advance!

richardburleigh commented 4 years ago

Try this pull request: #15