flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Missing dict.lst #13

Closed benob closed 6 years ago

benob commented 6 years ago

The dict.lst file required for rescoring the librispeech output with a language model seems to be missing from the repository.

I tried to recreate one with the following command:

zcat 3-gram.pruned.3e-7.arpa.gz | perl -ne 'chomp;$_=lc;@a=split /\t/;if(/^\\1-grams:/.../^$/){$w=$a[1]; $w=~s/(.)(\1+)/$1.length($2)/e; print "$a[1] $w\n"}' | grep -v "<\|^ *$\|[3-9]" > dict.lst

But I get a WER of 6.73% on dev-clean after rescoring. I would have expected something in the 4-5% range, as reported in the paper.

luajit ./wav2letter/decode.lua ./models/ dev-clean -show -letters ./data/librispeech-proc/letters-rep.lst  -words ./dict.lst -lm ./models/3-gram.pruned.3e-7.bin -lmweight 3.1639 -beamsize 25000 -beamscore 40 -nthread 10 -smearing max -show
...
[Memory usage: 411.62 Mb]
[Decoded 2703 sequences in 3258.00 s (actual: 29223.70 s)]
[WER on dev-clean = 6.73%, LER = 2.42%]
VitaliyLi commented 6 years ago

Hi,

dict.lst is produced by the ~/wav2letter/data/utils/convert-arpa.lua script (the README describes how to run it).

The WER reported in the paper is based on a 4-gram language model. The following parameters should produce 4.3% WER on LibriSpeech dev-clean:

-lmweight 3.1639 -silweight -0.37491 -beamsize 25000 -beamscore 40
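Plugged into the decode invocation from earlier in this thread, that would look something like the following (a sketch, not verified; the 4-gram model path is a placeholder, since only the 3-gram binary appears above):

```shell
luajit ./wav2letter/decode.lua ./models/ dev-clean -show \
  -letters ./data/librispeech-proc/letters-rep.lst \
  -words ./dict.lst \
  -lm ./models/<4-gram-model>.bin \
  -lmweight 3.1639 -silweight -0.37491 \
  -beamsize 25000 -beamscore 40 \
  -nthread 10 -smearing max
```

Note the added -silweight flag relative to the original command; leaving it at its default is one plausible reason for the WER gap.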