facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Speech recognition reproducibility #1139

Open Bobrosoft98 opened 5 years ago

Bobrosoft98 commented 5 years ago

Hi,

I am having trouble reproducing the speech recognition results. With the default settings, the model stagnates at 25% train accuracy. By switching to a different optimizer, increasing the batch size, and tuning the learning rate, I was able to reach 8% WER, but that is still far from the reported 5%, which was supposedly achievable without tuning.

Could you please provide additional info about your configuration (the model and number of GPUs, the total batch size), or even better: logs and/or model checkpoints?

Thank you.

huihuifan commented 5 years ago

@okhonko

carlosep93 commented 5 years ago

Hi,

I'm having similar results on 1 GPU for a different dataset. Could you share with us the parameters you used to improve the results?

Thank you

alexbie98 commented 5 years ago

Hi, I was having similar issues, but was able to do better with the default settings on one GPU by simulating the larger batch size with --update-freq 16

Bobrosoft98 commented 5 years ago

@alexbie98 I actually used this parameter when training on 1 GPU, and it didn't help. Can you elaborate on "do better"? Did you replicate the paper's WER?

@carlosep93 My parameters were: --optimizer adam --lr 5e-4 --fp16 --memory-efficient-fp16 --warmup-updates 2500 --update-freq 4

I also changed the batching logic to pack as much data onto each GPU as possible, resulting in an average batch size of 670 across all 8 GPUs. Only after that did it start training properly.
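For anyone trying to reproduce this, here is roughly what the full command would look like with those flags. Note this is a sketch, not my exact command: the task/arch/criterion flags and the paths are placeholders taken from the examples/speech_recognition README, and the thread doesn't pin down which --lr-scheduler was used with the warmup.

```sh
# Rough sketch only: base flags and paths are assumptions (from the ASR example README);
# only the optimizer/lr/fp16/warmup/update-freq flags are the ones quoted above.
python train.py $PREPROCESSED_DATA_DIR \
    --user-dir examples/speech_recognition \
    --task speech_recognition \
    --arch vggtransformer_2 \
    --criterion cross_entropy_acc \
    --max-tokens 5000 \
    --optimizer adam --lr 5e-4 --warmup-updates 2500 \
    --fp16 --memory-efficient-fp16 \
    --update-freq 4 \
    --save-dir $MODEL_DIR
```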

alexbie98 commented 5 years ago

Right now it's at 96% train acc / 91.7% valid acc after training for 5 days (epoch 31). I haven't matched the reported WER yet, getting 9.9 on the current checkpoint. The loss/acc plateaus for a bit before dropping quite low.

https://i.imgur.com/XBL1TZo.png

Bobrosoft98 commented 5 years ago

Wow, that looks nice! What batch size do you have? Also, could you share the accuracy plot?

alexbie98 commented 5 years ago

https://i.imgur.com/dKadcXq.png

The effective batch size is 80k. My training command is the same as the one in the repo with --update-freq 16

Bobrosoft98 commented 5 years ago

Thanks for providing the plot! Are you sure about 80k? I think the whole LibriSpeech train set has around 200k utterances, which would mean only about 3 batches per epoch in your case.

alexbie98 commented 5 years ago

Sorry, 80k tokens*. Using the default command's --max-tokens 5000 with --update-freq 16, the average number of sentences per batch is around 60.
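Spelled out (just the arithmetic behind the numbers above, assuming a single GPU):

```sh
# effective batch per optimizer step on one GPU:
#   --max-tokens 5000  x  --update-freq 16  =  80,000 tokens per update
# with ~60 sentences per 5000-token forward pass, that's roughly
#   60 x 16 = ~1,000 sentences per effective batch
```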

edosyhptra commented 3 years ago

> https://i.imgur.com/dKadcXq.png
>
> The effective batch size is 80k. My training command is the same as the one in the repo with --update-freq 16

Sorry for the OOT reply, but could you share how you plot the training accuracy?

alexbie98 commented 3 years ago

> Could you share how you plot the training accuracy?

If I recall correctly, specifying a directory to --tensorboard-logdir will generate these plots, viewable from TensorBoard. I haven't used this in a while though.
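If memory serves, the workflow was roughly the following (the log directory is just a placeholder, and older fairseq versions needed tensorboardX installed for this to work):

```sh
# train with tensorboard logging enabled (path is a placeholder)
python train.py $DATA_DIR ... --tensorboard-logdir checkpoints/tb_logs

# then point tensorboard at the same directory and open the printed URL
tensorboard --logdir checkpoints/tb_logs
```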

stale[bot] commented 3 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!