NervanaSystems / deepspeech

DeepSpeech neon implementation
Apache License 2.0

Sampling Rate Mismatch leading to nan cost #23

Closed Laqshay closed 7 years ago

Laqshay commented 7 years ago

I have compiled a training dataset of roughly 3 million .flac files, amounting to 2400 hours of training data. The files are 16-bit, mono, but they have a sampling rate of 22.05 kHz. When I started training, the cost quickly became NaN. Is it possible that the cost diverged because of the sampling rate? Is it compulsory for the files to have a sampling rate of 16 kHz? If so, is there a program or tool that can change the sampling rate and can be called from the command line?

tyler-nervana commented 7 years ago

We haven't tried 22.05 kHz ourselves, though I see no reason why it shouldn't work. Aeon supports any sample rate, but you have to specify the rate in the dataloader config dictionary. Could you post your dataloader configuration? That would really help us figure this out.
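
On the command-line question above: SoX is one widely used tool that can resample audio, with a call of the form `sox in.flac -r 16000 out.flac`. The Python sketch below only builds those commands for a directory tree so they can be inspected or piped to a shell; it is not part of this repository. The function names and the `data_22k` / `data_16k` directories are hypothetical, and it assumes SoX is installed with FLAC support.

```python
import shlex
from pathlib import Path


def build_resample_cmd(src, dst, rate_hz=16000):
    """Return the argv list for a SoX call that resamples src to rate_hz."""
    # SoX applies format options to the file that follows them,
    # so "-r" placed before dst sets the output sample rate.
    return ["sox", str(src), "-r", str(rate_hz), str(dst)]


def plan_resampling(src_dir, dst_dir, rate_hz=16000):
    """Build one SoX command per .flac file, mirroring the directory layout."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    cmds = []
    for src in sorted(src_dir.rglob("*.flac")):
        dst = dst_dir / src.relative_to(src_dir)
        cmds.append(build_resample_cmd(src, dst, rate_hz))
    return cmds


if __name__ == "__main__":
    # Hypothetical directories; prints the commands instead of running them.
    for cmd in plan_resampling("data_22k", "data_16k"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch side-effect free; with millions of files, the output could be fed to a parallel job runner once the destination directories exist.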