bshall / hifigan

A 16kHz implementation of HiFi-GAN for soft-vc.
https://bshall.github.io/soft-vc/
MIT License

NaN during training when using own dataset #4

Open cjay42 opened 1 year ago

cjay42 commented 1 year ago

Fine-tuning works as expected, but regular training on a dataset other than LJSpeech eventually produces a NaN loss. The culprit appears to be the following line, which divides by zero when wav contains perfect silence (the peak wav.abs().max() is then exactly 0):

https://github.com/bshall/hifigan/blob/374a4569eae5437e2c80d27790ff6fede9fc1c46/hifigan/dataset.py#L106

I'm not sure what the best solution would be. As a quick fix, I simply clipped the divisor so it can't reach zero:

wav = flip * gain * wav / max([wav.abs().max(), 0.001])
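
For anyone who wants to reproduce this outside the training loop, here is a minimal sketch of the failure and of the same floor-on-the-divisor idea written with a tensor clamp. The helper name normalize_and_scale and the eps parameter are hypothetical and not part of the repository, and gain and flip are assumed to be plain scalars, since the issue does not show how dataset.py computes them.

```python
import torch


def normalize_and_scale(wav: torch.Tensor, gain: float = 1.0,
                        flip: float = 1.0, eps: float = 1e-3) -> torch.Tensor:
    """Peak-normalize wav, then apply gain and a polarity flip.

    eps is a hypothetical floor on the peak: an all-zero clip is divided
    by eps instead of 0, so the result stays finite.
    """
    peak = wav.abs().max().clamp(min=eps)
    return flip * gain * wav / peak


# A clip of perfect silence: its peak is exactly 0.
silence = torch.zeros(1, 8)

# Unguarded normalization computes 0 / 0, which is NaN in floating point.
unguarded = silence / silence.abs().max()

# Guarded normalization divides by eps and returns zeros.
guarded = normalize_and_scale(silence, gain=0.5, flip=-1.0)

print(torch.isnan(unguarded).all().item())
print(torch.isnan(guarded).any().item())
```

One consequence of this choice: a clip whose peak is below eps is no longer scaled to full range, only multiplied by gain / eps. Skipping silent segments when sampling would be another option, but that changes which data the model sees.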

joan126 commented 1 year ago

I ran into the same issue!