iamxiaoyubei opened this issue 5 years ago
Hi @iamxiaoyubei ,
I have tried training on LibriSpeech 960h using the libri960h_example.yaml config (with a much smaller batch size, since I didn't have enough memory) and without the RNN language model, and I got around 24% WER on dev-clean and test-clean.
By tweaking some of the parameters (especially the sampling rates and the number of LSTM cells), I got it down to 14%.
Note that I clearly didn't train the network for as many epochs as I should have (I trained for ~15 epochs instead of 80 or 100), so maybe that's why my WER is so high.
Small erratum: while the training did not last as long as it should have, the curve shows that the model has reached a point of stagnation, where WER improves by less than 0.05 per epoch.
Hello @Youyoun ,
Can you share the specific tweaks you made to the sampling rate and number of lstm cells? I would really appreciate it since I am about to train the model myself. Thanks a lot in advance!
Hey @miraodasilva !
Sorry for the delay. I basically tried to follow the model introduced in the SpecAugment paper: 4 LSTM layers with 1024 units each in the encoder, and 1 LSTM layer with 1024 units in the decoder.
If you want to keep the pyramidal structure of the encoder, use sampling rates 1 2 2 1.
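For anyone unsure what the sampling rates do: in a pyramidal (pLSTM-style) encoder, each layer with rate > 1 concatenates that many consecutive frames before feeding the next LSTM, shrinking the time axis. A minimal NumPy sketch of how rates 1 2 2 1 reduce a 100-frame input by 4x (function names and shapes are illustrative, not the repo's actual code):

```python
import numpy as np

def downsample(frames, rate):
    """Concatenate `rate` consecutive frames into one (pyramidal reduction).

    frames: (time, feat) array; trailing frames that don't fill a
    full group of `rate` are dropped."""
    if rate == 1:
        return frames
    t, f = frames.shape
    t = (t // rate) * rate  # drop leftover frames
    return frames[:t].reshape(t // rate, f * rate)

rates = [1, 2, 2, 1]          # per-layer sampling rates from the comment above
x = np.zeros((100, 40))       # e.g. 100 frames of 40-dim filterbank features
for r in rates:
    x = downsample(x, r)      # in the real encoder an LSTM runs between these
print(x.shape)                # (25, 160): time reduced 4x, feature dim 4x wider
```

In the actual model the feature dimension is mapped back down by each LSTM layer; this sketch only shows the time-axis reduction that the rate list controls.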
Hello @Youyoun ,
Ok, thanks a lot for the info!
Hi, can you share what sampling rate you used?
Has anyone trained it on the full LibriSpeech training sets (train-clean-100, train-clean-360, train-other-500)? Could you share the WER you got when training on them? Thank you!