Alexander-H-Liu / End-to-end-ASR-Pytorch

This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with PyTorch, the well-known deep learning toolkit.
MIT License

Config file for Librispeech 960h #37

Open Youyoun opened 5 years ago

Youyoun commented 5 years ago

Hi,

Does anyone have a config file that works well for training the ASR model on LibriSpeech 960h? I can't seem to reach the ~4% WER reported by many research papers; my best so far is above 10%. With the tools provided by this repository, there must surely be a way to get close to that number.

Edresson commented 4 years ago

@Youyoun Were you able to improve your WER?

Youyoun commented 4 years ago

Hey,

It's been a while since I worked on speech-to-text. From memory, the best WER I ever reached was around 7%, but I don't remember the exact model parameters. I mainly based my setup on the SpecAugment paper.
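
If it helps, the core of SpecAugment is just masking random frequency and time bands of the log-mel spectrogram. This isn't my exact code, only a minimal sketch of that masking (the function name and default mask widths are illustrative, not the repo's):

```python
import torch

def spec_augment(spec, max_freq_width=27, max_time_width=100,
                 n_freq_masks=2, n_time_masks=2):
    """Zero out random frequency and time bands of a (time, freq) spectrogram."""
    spec = spec.clone()
    n_frames, n_bins = spec.shape
    for _ in range(n_freq_masks):
        width = torch.randint(0, max_freq_width + 1, (1,)).item()
        start = torch.randint(0, max(1, n_bins - width), (1,)).item()
        spec[:, start:start + width] = 0.0   # frequency mask
    for _ in range(n_time_masks):
        width = torch.randint(0, max_time_width + 1, (1,)).item()
        start = torch.randint(0, max(1, n_frames - width), (1,)).item()
        spec[start:start + width, :] = 0.0   # time mask
    return spec

# Example: 1000 frames x 80 mel bins
augmented = spec_augment(torch.randn(1000, 80))
```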

What really helped bring the WER below 10% was larger batches. To make that work I used gradient accumulation (I think my batch size was 32 on 1 GPU, and with gradient accumulation I took the effective batch size to 512). It's pretty easy to implement.
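
The accumulation loop looks roughly like this. This is a toy, self-contained sketch, not the repository's actual training loop; the model, optimizer, and data below are dummies just to make the pattern runnable:

```python
import torch
import torch.nn as nn

# Dummy setup so the accumulation pattern runs end-to-end.
model = nn.Linear(80, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
data = [(torch.randn(32, 80), torch.randint(0, 10, (32,))) for _ in range(64)]

accum_steps = 16  # effective batch = 32 * 16 = 512

optimizer.zero_grad()
for step, (feats, labels) in enumerate(data):
    loss = criterion(model(feats), labels)
    (loss / accum_steps).backward()      # scale so gradients average over the large batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()                 # one optimizer update per effective batch
        optimizer.zero_grad()
```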

It took me about 2 weeks to train on a single GPU.

Hope this helps.