NervanaSystems / deepspeech

DeepSpeech neon implementation
Apache License 2.0

How many epochs does DeepSpeech2 need to converge on LibriSpeech #42

Closed Some-random closed 7 years ago

Some-random commented 7 years ago

The test-other CER on LibriSpeech reported in the DeepSpeech2 paper was 0.1325. I'm wondering whether anyone has come close to this number. And if so, how many epochs did you need to get there?

Neuroschemata commented 7 years ago

The DS2 paper reported training a model on ~12k hours of speech data (trained in under 25 epochs), and then testing on various benchmark datasets, e.g. Librispeech. So one wouldn't expect to get similar performance if one trained on a much smaller dataset.

beatthem commented 7 years ago

The README at https://github.com/NervanaSystems/deepspeech says "We have used this code to train models on both the Wall Street Journal (81 hours) and Librispeech (1000 hours) datasets." And the results shown in the table below that sentence look promising. What was the CER on Wall Street Journal? Does this mean that training on audiobooks with a smaller lexicon needs fewer hours to reach a lower error rate?

Some-random commented 7 years ago

@Neuroschemata Yeah you're right. I just realized the DS2 training set is a lot larger.

Neuroschemata commented 7 years ago

In general, you can always expect to get better WER with larger datasets as these models tend to learn an implicit language model. The CER for WSJ was about 8.7%. Keep in mind that WSJ and Librispeech are not representative of the sources of speech that one finds in the "wild".
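For readers comparing the CER numbers quoted in this thread: CER is conventionally defined as the character-level edit (Levenshtein) distance between the reference transcript and the hypothesis, normalized by the reference length. A minimal sketch (function names here are illustrative, not from the repo's code):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hyp` into `ref`."""
    m, n = len(ref), len(hyp)
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

print(cer("speech", "speach"))  # one substitution over six characters
```

WER is computed the same way, just over word tokens instead of characters, which is why the two metrics can diverge on the same test set.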