Closed BaPannier closed 2 years ago
What is your question?
How to reproduce the WER improvement obtained by using the proposed Transformer LM instead of Viterbi decoding?
What have you tried?
Files used:
- Letter dictionary: here from https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md
- Wav2vec model: here from https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md
- Transformer LM: here from https://github.com/facebookresearch/wav2letter/tree/master/recipes/sota/2019
- LM dict: here + upper-case processing from https://github.com/facebookresearch/wav2letter/tree/master/recipes/sota/2019 (dict.txt placed in the same directory as lm_librispeech_word_transformer.pt)

```
head -3 dict.txt
THE 49059384
AND 26362574
OF 24795903
```
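The "upper-case processing" of the LM dict mentioned above can be sketched as follows. This is a minimal illustration (the helper name and sample lines are mine, not from the recipe), assuming each dict.txt line is a "word count" pair:

```python
def uppercase_lm_dict(lines):
    # Upper-case only the word column of each "word count" line,
    # leaving the count column untouched.
    out = []
    for line in lines:
        word, count = line.split(" ", 1)
        out.append(f"{word.upper()} {count}")
    return out

# Illustrative lines in the same shape as dict.txt:
sample = ["the 49059384", "and 26362574", "of 24795903"]
print(uppercase_lm_dict(sample))
# → ['THE 49059384', 'AND 26362574', 'OF 24795903']
```

In practice you would stream dict.txt through this and write the result next to lm_librispeech_word_transformer.pt.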
Command used:
python examples/speech_recognition/infer.py /path/to/librispeech --task audio_pretraining --nbest 1 --path /path/to/wav2vec2_vox_960h.pt --gen-subset dev_clean --results-path outputdir --w2l-decoder fairseqlm --lm-model /path/to/lm_librispeech_word_transformer.pt --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000
This produces a WER > 50, while using Viterbi gives ~2 WER. When using this lexicon file (from #2734) by adding the argument
--lexicon /path/to/librispeech_lexicon.lst
I get a ~6 WER.
What’s your environment?
- fairseq 0.10.0 (latest stable release)
- wav2letter branch v0.2 for python bindings + patch from this issue facebookresearch/wav2letter#775 (otherwise imports from w2l_decoder.py will fail due to missing LexiconFreeDecoder)
I don’t know what I did wrong. Thank you for your answer!
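For reference, the lexicon file passed via --lexicon maps each word to its letter spelling terminated by the "|" word-boundary token used by the letter targets. A minimal sketch of building such entries (the helper name is hypothetical, and a real lexicon may list alternative spellings per word):

```python
def word_to_lexicon_entry(word):
    # One wav2letter-style lexicon line: WORD<TAB>W O R D |
    # The trailing "|" marks the word boundary in the CTC letter labels.
    return f"{word}\t{' '.join(word)} |"

for w in ["THE", "AND", "OF"]:
    print(word_to_lexicon_entry(w))
# → THE	T H E |
#   AND	A N D |
#   OF	O F |
```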
Did you get any solution for this? @BaPannier
Please replace lines 416 and 417 with the code below. Once done, it may work.
examples/speech_recognition/w2l_decoder.py
```python
414  word_idx = self.worddict.index(word)
415  _, score = self.lm.score(start_state, word_idx, no_cache=True)
416  for spelling in [list(word + "|")]:
417      if word != "
```
Hi, to recreate the results, I noticed that the LM weight, word insertion penalty, and beam size also play an important role. The authors used a variety of values depending on the fine-tuning data, the Transformer/KenLM model, and the set being decoded. Please refer to the paper; the ablations section lists the values they used for the different experiments. I followed this for the 1h BASE fine-tuned model with both the 4-gram KenLM and the Transformer LM, and I got their results. With an LM weight of 2 and the default word insertion penalty, I was getting around 10-15% higher WER.
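Since the best values differ per setup, a small grid search over the decoder hyper-parameters is a practical way to recover the paper's numbers. A sketch that just enumerates the infer.py invocations to run (the value grids are illustrative, not the paper's, and the remaining flags are elided with "..."):

```python
import itertools

def sweep_commands(lm_weights, word_scores,
                   base="python examples/speech_recognition/infer.py /path/to/librispeech"):
    # One infer.py invocation per (lm-weight, word-score) pair;
    # keep the other flags from the command above fixed.
    return [
        f"{base} ... --lm-weight {lw} --word-score {ws}"
        for lw, ws in itertools.product(lm_weights, word_scores)
    ]

for cmd in sweep_commands([1.0, 1.5, 2.0], [-1.0, 0.0]):
    print(cmd)
```

Decode the dev set with each command and keep the pair giving the lowest WER before touching the test set.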
This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!