NervanaSystems / deepspeech

DeepSpeech neon implementation
Apache License 2.0

Training Time - Architecture #36

Closed adarshupadhyay43 closed 7 years ago

adarshupadhyay43 commented 7 years ago

1. Is the training time given (6 days for 1000 hours of Librispeech data on a single GPU) for a single epoch or for 16 epochs?
2. How does the training time vary with the number of hours of training data on a single-GPU system, assuming the utterance lengths are similar to those in Librispeech (say, 3000 or 5000 hours)?
3. If I choose to train on a new dataset, is there a way to build on the provided pre-trained model and train only on the new data, or do I need to build a whole training set (including Librispeech) and start training from scratch?

Neuroschemata commented 7 years ago

Refer to #5 for a discussion about training times.

How training time scales with the distribution of utterance lengths is less straightforward, but a good initial guess is that training time will scale at least linearly with the longest utterance in your dataset. Keep in mind that the memory footprint of the model also scales almost linearly with the length of the longest utterance, in which case you'd have to take into account the issues discussed in #10.
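To make that back-of-envelope reasoning concrete, here is a rough sketch (not from the repo): it assumes wall-clock time scales roughly linearly with total audio hours, using the ~6 days per 1000 hours of Librispeech on a single GPU quoted above as the baseline, with an optional correction for how much longer your longest utterance is than Librispeech's.

```python
# Back-of-envelope only: linear scaling in dataset size and in the
# longest-utterance ratio is an assumption, not a measured result.
BASELINE_HOURS = 1000.0   # Librispeech hours behind the quoted figure
BASELINE_DAYS = 6.0       # reported single-GPU wall-clock time

def estimate_training_days(dataset_hours, longest_utterance_ratio=1.0):
    """Rough single-GPU training-time projection under the linear-scaling assumption."""
    return BASELINE_DAYS * (dataset_hours / BASELINE_HOURS) * longest_utterance_ratio

for hours in (1000, 3000, 5000):
    print("{} hours -> ~{:.0f} days".format(hours, estimate_training_days(hours)))
```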

If you choose to use transfer learning, you could use the pre-trained model as a starting point (see https://www.intelnervana.com/transfer-learning-using-neon/ for an example of how you might go about doing this).
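For concreteness, here is a minimal sketch of what warm-starting from the pre-trained weights might look like in neon. The helpers `build_model()`, `make_loader()`, and `make_cost()`, as well as the file names, are hypothetical placeholders for the repo's actual network definition, data loader, CTC cost, and released parameter file; see the blog post above and the repo's training script for the real pieces.

```python
# Hypothetical sketch only -- build_model(), make_loader(), make_cost() and the
# file names are placeholders, not actual names from this repo.
from neon.backends import gen_backend
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.callbacks.callbacks import Callbacks

be = gen_backend(backend='gpu', batch_size=32)

# Rebuild the same topology as the pre-trained network, then load the released
# parameters instead of starting from random initialization.
model = Model(layers=build_model())              # placeholder layer definition
model.load_params('pretrained_librispeech.prm')  # placeholder weights file

# Fine-tune on the new dataset, typically with a reduced learning rate.
train_set = make_loader('new_dataset_manifest.csv')   # placeholder data loader
opt = GradientDescentMomentum(learning_rate=1e-4, momentum_coef=0.9)
cost = make_cost()                                     # placeholder CTC cost
model.fit(train_set, cost=cost, optimizer=opt, num_epochs=5,
          callbacks=Callbacks(model))
```

The key idea is simply to call `load_params` on a model with the same topology before `fit`, so training starts from the released weights rather than from scratch.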

adarshupadhyay43 commented 7 years ago

Thanks a lot. That is a great help.