flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

What is a good learning rate and batch size for training conv_glu (wav2letter) 2016? #832

Open phamvandan opened 3 years ago

phamvandan commented 3 years ago

I have trained conv_glu (wav2letter) 2016 with features extracted from a wav2vec model. I chose a learning rate of 1.0 and a batch size of 36 on a dataset of over 500 hours of audio, but the WER didn't converge. What is a good learning rate and batch size for training conv_glu (wav2letter) 2016 with features extracted from a wav2vec model?

padentomasello commented 3 years ago

Hi @phamvandan, that's a tough question to answer without running some experiments :). What optimizer are you using? Is the training loss not converging either?

If you're using SGD, I would run experiments lowering the learning rate by factors of 10, i.e. 0.1, 0.01, 0.001, 0.0001, and so on. Also, in my experience, the Adam optimizer with its default parameters is a good starting point for new experiments.
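As a concrete illustration of that sweep, here is a minimal PyTorch sketch. It is not wav2letter's actual training pipeline (wav2letter training is configured through flags on its C++ binaries); the model, features, and targets below are stand-ins, and the point is just the factor-of-10 SGD sweep with an Adam-defaults alternative:

```python
# Sketch of the suggested hyperparameter search: sweep the SGD learning rate
# down by factors of 10, or fall back to Adam with its default parameters.
# The model and data here are stand-ins; substitute your conv_glu network
# and a loader for the precomputed wav2vec features.
import torch
import torch.nn as nn

def make_optimizer(params, kind, lr):
    if kind == "sgd":
        return torch.optim.SGD(params, lr=lr)
    # Adam with default parameters (lr=1e-3, betas=(0.9, 0.999)) is a
    # reasonable starting point for new experiments.
    return torch.optim.Adam(params)

dummy_model = nn.Linear(512, 32)        # stand-in for the acoustic model
features = torch.randn(36, 512)         # stand-in for a batch of wav2vec features
targets = torch.randint(0, 32, (36,))   # stand-in targets
loss_fn = nn.CrossEntropyLoss()

for lr in [0.1, 0.01, 0.001, 0.0001]:
    opt = make_optimizer(dummy_model.parameters(), "sgd", lr)
    opt.zero_grad()
    loss = loss_fn(dummy_model(features), targets)
    loss.backward()
    opt.step()
    print(f"lr={lr}: loss={loss.item():.4f}")
```

In a real sweep you would train each setting for enough updates to compare training loss and dev WER curves, not a single step as above.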

tlikhomanenko commented 3 years ago

One more thing: are you fine-tuning the wav2vec features together with the whole network, or not? Start with frozen wav2vec features first.
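For reference, a minimal PyTorch sketch of what "frozen wav2vec features" means in practice. The `wav2vec_model` below is only a placeholder module standing in for however you load the pretrained checkpoint; the actual wav2vec model and loading code are not shown here:

```python
# Freeze a pretrained feature extractor so only the downstream acoustic
# model is trained on top of its outputs.
import torch
import torch.nn as nn

# Placeholder standing in for the pretrained wav2vec feature extractor.
wav2vec_model = nn.Sequential(nn.Conv1d(1, 512, kernel_size=10, stride=5), nn.GELU())

for p in wav2vec_model.parameters():
    p.requires_grad = False   # frozen: no gradients flow into wav2vec
wav2vec_model.eval()          # disable dropout / normalization updates

with torch.no_grad():
    waveform = torch.randn(1, 1, 16000)   # stand-in for 1 s of 16 kHz audio
    features = wav2vec_model(waveform)    # features fed to the conv_glu model
print(features.shape)
```

Equivalently, you can precompute and cache the features once and train conv_glu on the cached features only.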

phamvandan commented 3 years ago

I followed Mr Mai Long, who reproduced the setup using wav2vec features as input: https://github.com/mailong25/vietnamese-speech-recognition

tlikhomanenko commented 3 years ago

Then I think it's better to ask Mr Mai Long directly how he reproduced it. As far as I know, the original paper uses frozen wav2vec features.