NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

having trouble reproducing DS2 result #327

Closed byan23 closed 5 years ago

byan23 commented 5 years ago

I was trying to reproduce the claimed result for DS2 but could not. I was only able to get down to 9.0 WER with 8 GPUs after 50 epochs (108k iterations). Environment: 8× 1080 Ti GPUs. I used the original config file ds2_large_8gpus_mp.py.

I noticed that https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition/deepspeech2.html says I need to train for 200 epochs, which conflicts with the config file. Can I get a 6.71 WER with --num_epochs=200 using ds2_large_8gpus_mp.py?

I've also noticed that the checkpoint provided at the above link is the result of only 54k iterations. Assuming a single node with 8 GPUs and batch_size_per_gpu=16 (from ds2_large_8gpus_mp.py), some quick math: 54k × 8 × 16 / 280k utterances ≈ 24 epochs. To match the claimed 200 epochs, it sounds like the checkpoint was actually produced on 8 nodes (200 / 24 ≈ 8)?
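The epoch arithmetic above can be checked with a few lines of Python (the 280k utterance count is the poster's approximation for the LibriSpeech training set):

```python
# Sanity-check the epoch estimate implied by the checkpoint's step count.
iterations = 54_000          # training steps in the provided checkpoint
batch_size_per_gpu = 16      # from ds2_large_8gpus_mp.py
num_gpus = 8                 # single node with 8 GPUs
utterances = 280_000         # approximate size of the training set

utterances_seen = iterations * batch_size_per_gpu * num_gpus
epochs = utterances_seen / utterances
print(round(epochs, 1))      # ~24.7, i.e. roughly 24 epochs on one 8-GPU node
```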

These don't sound right to me. Could any of you help out on this? Thanks.

borisgin commented 5 years ago

DS2 was trained on 1 node with 8 GPUs (V100, 16 GB). It took 200 epochs (you can change this in the config file or override it from the command line). Please note that the 1080 Ti doesn't fully support mixed precision, so I would suggest using float32 precision on those cards.
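Both changes Boris suggests live in the config's base_params dict. A minimal sketch of the relevant overrides in ds2_large_8gpus_mp.py (key names follow OpenSeq2Seq's config convention; other entries in the dict are omitted here):

```python
# Relevant overrides in ds2_large_8gpus_mp.py for 1080 Ti cards.
# Only the keys being changed are shown; the real config has many more.
base_params = {
    "num_epochs": 200,            # train for the full 200 epochs
    "dtype": "float32",           # 1080 Ti: use float32 instead of mixed precision
    "batch_size_per_gpu": 16,
    "num_gpus": 8,
}
```

Alternatively, num_epochs can be overridden from the command line with --num_epochs=200 without editing the file.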

byan23 commented 5 years ago

Thanks Boris. Should I expect to get the same WER using float32?

borisgin commented 5 years ago

Yes, the WER should be the same.

byan23 commented 5 years ago

I was able to reproduce the result. Thanks for the help!