flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Missing dict.lst #13

Closed benob closed 6 years ago

benob commented 6 years ago

The dict.lst file required for rescoring the librispeech output with a language model seems to be missing from the repository.

I tried to recreate one with the following command:

zcat 3-gram.pruned.3e-7.arpa.gz | perl -ne 'chomp;$_=lc;@a=split /\t/;if(/^\\1-grams:/.../^$/){$w=$a[1]; $w=~s/(.)(\1+)/$1.length($2)/e; print "$a[1] $w\n"}' | grep -v "<\|^ *$\|[3-9]" > dict.lst

But I get a WER of 6.73% on dev-clean after rescoring. I would have expected something in the 4-5% range, as reported in the paper.

luajit ./wav2letter/decode.lua ./models/ dev-clean -show -letters ./data/librispeech-proc/letters-rep.lst  -words ./dict.lst -lm ./models/3-gram.pruned.3e-7.bin -lmweight 3.1639 -beamsize 25000 -beamscore 40 -nthread 10 -smearing max -show
...
[Memory usage: 411.62 Mb]
[Decoded 2703 sequences in 3258.00 s (actual: 29223.70 s)]
[WER on dev-clean = 6.73%, LER = 2.42%]
VitaliyLi commented 6 years ago

Hi,

dict.lst is produced by the ~/wav2letter/data/utils/convert-arpa.lua script (the README describes how to run it).

The WER reported in the paper is based on a 4-gram language model. The following parameters should produce 4.3% WER on LibriSpeech dev-clean:

-lmweight 3.1639 -silweight -0.37491 -beamsize 25000 -beamscore 40
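Plugged into the decode invocation from earlier in this thread, that would look something like the following (a sketch, not verified; the 4-gram model path is a placeholder, since only the 3-gram binary appears above):

```shell
luajit ./wav2letter/decode.lua ./models/ dev-clean -show \
  -letters ./data/librispeech-proc/letters-rep.lst \
  -words ./dict.lst \
  -lm ./models/<4-gram-model>.bin \
  -lmweight 3.1639 -silweight -0.37491 \
  -beamsize 25000 -beamscore 40 \
  -nthread 10 -smearing max
```

Note the added -silweight flag relative to the original command; leaving it at its default is one plausible reason for the WER gap.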