quhonglin opened this issue 2 years ago
You need to specify your lexicon file in the command, with --lexicon lexicon.txt in your case:
subset=test_clean
CUDA_VISIBLE_DEVICES=1 python /home/quhongling/fairseq-main/examples/speech_recognition/infer.py \
/Data/QuHonglin/datasets/wav2vec2/Librispeech/evaluate/100h \
--task audio_finetuning \
--nbest 1 --path /Data/QuHonglin/pre-trained-models/wav2vec_small_100h.pt \
--gen-subset $subset --results-path /Data/QuHonglin/experiments/wav2vec2/Librispeech/evaluate/100h/4-gram-lm/test_clean \
--w2l-decoder kenlm --lm-model /Data/QuHonglin/pre-trained-models/lm_librispeech_kenlm_word_4g_200kvocab.bin \
--lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter --lexicon lexicon.txt
Use the same lexicon.txt that matches the KenLM model; it should look like this:
EVERY E V E R Y |
WORD W O R D |
THAT T H A T |
EXISTS E X I S T S |
IN I N |
YOUR Y O U R |
LABEL L A B E L |
OR O R |
TRANSCRIPTION T R A N S C R I P T I O N |
FILE F I L E |
WILL W I L L |
WRITE W R I T E |
DOWN D O W N |
LIKE L I K E |
THIS T H I S |
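A lexicon in this format can be generated directly from your transcripts. A minimal sketch (the function name and the word list are illustrative, not part of fairseq):

```python
def make_lexicon(words):
    """Map each unique word to its space-separated letters plus the '|' word boundary,
    matching the WORD -> W O R D | format shown above."""
    return {w: " ".join(w) + " |" for w in sorted(set(words))}

# Example: build entries from a few transcript words.
words = "EVERY WORD THAT EXISTS".split()
for word, spelling in make_lexicon(words).items():
    print(f"{word} {spelling}")  # e.g. "EVERY E V E R Y |"
```

Writing the output to lexicon.txt, one entry per line, gives a file in the format the decoder expects.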
@Abdullah955 Thanks for your reply. But if I want lexicon-free decoding, how should I do that?
@quhonglin I think you need a unit LM, i.e. a language model built with characters as its units.
@quhonglin You need to create your own language model, or use a pre-trained one, with KenLM. This tutorial should help you:
https://huggingface.co/blog/wav2vec2-with-ngram
You only need plain text to train the model.
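Before training with KenLM, the text should be normalized to match the model's letter vocabulary; for the lexicon-free (unit LM) route, the corpus is spelled out character by character instead. A rough sketch, assuming LibriSpeech-style uppercase transcripts (the function names are placeholders):

```python
import re

def normalize(line):
    # Uppercase and keep only letters, apostrophes, and spaces, so the
    # corpus uses only characters present in the model's dict.ltr.txt.
    line = re.sub(r"[^A-Z' ]+", " ", line.upper())
    return " ".join(line.split())

def to_units(line):
    # Character-level corpus for a unit LM: each word is spelled out,
    # followed by the '|' word-boundary token.
    return " ".join(" ".join(w) + " |" for w in line.split())

print(normalize("Hello, world!"))   # HELLO WORLD
print(to_units("HELLO WORLD"))      # H E L L O | W O R L D |

# The normalized corpus can then be passed to KenLM's tools, e.g.:
#   lmplz -o 4 < corpus.txt > lm.arpa
#   build_binary lm.arpa lm.bin
```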
Thanks, everyone. I no longer use this setup and have also forgotten some details of the issue. Maybe I'll try again when I have time in the future.
@Abdullah955 I followed the right steps, and my lexicon format is exactly as above, but when execution reaches "self.lm = KenLM(cfg.lmpath, self.word_dict)",
Segmentation fault (core dumped) occurs.
Can you help me with this? Thanks.
The model is the data2vec base model.
@quhonglin I have the same question: when I run the command, the hypotheses are all empty, resulting in a WER of 100%. Have you solved this problem? If so, could you help me? Thanks a lot!
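Empty hypotheses often come from a mismatch between the lexicon spellings and the model's token dictionary (dict.ltr.txt): if a spelling uses a symbol the model cannot emit, the decoder can prune every path. A hypothetical sanity check (the function name and sample data are illustrative):

```python
def find_bad_entries(lexicon_lines, dict_symbols):
    """Return lexicon words whose spelling uses a token absent from dict.ltr.txt."""
    bad = []
    for line in lexicon_lines:
        word, *spelling = line.split()
        if any(tok not in dict_symbols for tok in spelling):
            bad.append(word)
    return bad

# Toy dictionary and lexicon: "HE" is flagged because "E" is missing.
dict_symbols = {"H", "I", "|"}
lexicon = ["HI H I |", "HE H E |"]
print(find_bad_entries(lexicon, dict_symbols))  # ['HE']
```

In practice, load dict_symbols from the first column of dict.ltr.txt and lexicon_lines from the lexicon file passed to --lexicon.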
❓ Questions and Help
What is your question?
When I evaluate a CTC model with wav2vec 2.0 according to fairseq/examples/wav2vec/README.md, I encounter the following error.
Code
Here is the code I'm executing:
And here is the error log:
What have you tried?
When I try --w2l-decoder viterbi, it works fine. When I try to add --unit-lm, or --unit-lm --kenlm-model=/Data/QuHonglin/pre-trained-models/lm_librispeech_kenlm_word_4g_200kvocab.bin, it runs, but the hypotheses are all empty, resulting in a WER of 100%. So how do I correctly use a language model to decode the wav2vec 2.0 CTC model?
What's your environment?