The training time for these models varies with the distribution of utterance lengths in the training dataset. The LibriSpeech corpus has relatively long utterances compared to, say, the WSJ dataset distributed by the LDC.
We trained the model on the 1000 hours of speech in the LibriSpeech corpus using Maxwell Titan X GPUs. Training on a single GPU takes about 141 hours (roughly 6 days), and the best-performing model required 16 epochs.
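For a rough sense of the per-epoch cost implied by those numbers, here is a back-of-envelope sketch using only the figures quoted above (the per-epoch breakdown is derived, not separately measured):

```python
# Back-of-envelope estimate from the figures quoted above:
# 1000 hours of LibriSpeech audio, 16 epochs, ~141 wall-clock hours on a single Titan X.
total_gpu_hours = 141.0   # single-GPU wall-clock time for the full run
epochs = 16

hours_per_epoch = total_gpu_hours / epochs
print(f"~{hours_per_epoch:.1f} GPU-hours per epoch")            # ~8.8
print(f"~{total_gpu_hours / 24:.1f} days on a single Titan X")  # ~5.9
```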
Neon does indeed support training on multiple GPUs; training the model on 4 GPUs takes about 46 hours. However, the multi-GPU setup is not part of our open source release. It is available as part of the Nervana Cloud platform, which offers enterprise-grade deep learning solutions.
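The same kind of quick arithmetic gives the effective scaling for the 4-GPU figure (again, derived from the quoted wall-clock times, not an additional measurement):

```python
# Derived from the quoted wall-clock times only (not additional measurements).
single_gpu_hours = 141.0
four_gpu_hours = 46.0
num_gpus = 4

speedup = single_gpu_hours / four_gpu_hours   # ~3.07x over a single GPU
efficiency = speedup / num_gpus               # ~0.77, i.e. ~77% scaling efficiency
print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```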
Hi,
the README says that the model can take up to one week of training to reach respectable performance. Would it be possible to mention what type of GPU setup one should expect for those training times? Also, does Neon/Deep Speech support a multi-GPU setup, and if so, how can it be enabled when training the Deep Speech model? Similarly, could you describe the GPU setup and training times used to generate the pretrained model (1000 hours of speech, 16 epochs, and a CER of 14%) as a reference for understanding the computational resource requirements when training a similar model from scratch?
Really cool that you have open-sourced your Deep Speech implementation. My fingers are itching to try out Neon :) Thank you.