facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

wav2vec pretrained model inference with fairseq and wav2letter python bindings fails #3747

Open taiman9 opened 3 years ago

taiman9 commented 3 years ago

πŸ› Bug

I am trying to run inference on sample flac files (in a directory) using the `infer.py` script in fairseq.

To Reproduce

I installed fairseq using the following commands:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

I installed wav2letter python dependencies as instructed in: wav2letter python bindings

I installed the wav2letter python bindings with CUDA support (env variable USE_CUDA=1) using the following commands after installing their dependencies:

cd wav2letter/bindings/python
pip install -e .

Then I ran the following command to create the manifest of my sample flac files in fairseq:

python examples/wav2vec/wav2vec_manifest.py /path/to/flacs --dest /manifest/path --ext flac --valid-percent 0

The above command creates the manifest of the flac files (10 to 30 secs in length each) in a directory called dev-other. It created a .tsv file for my flacs in the manifest path, which I named dev-other.tsv. I also included the letter dictionary file dict.ltr.txt in the manifest path.
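For reference, the manifest format that `wav2vec_manifest.py` writes can be sketched as below: the first line is the audio root directory, and each following line is a tab-separated relative path and frame count (the real script obtains frame counts with `soundfile`). The file names and frame counts here are hypothetical, for illustration only.

```python
from pathlib import Path

def write_manifest(root, entries, dest):
    """Write a fairseq-style .tsv manifest.

    First line: the audio root directory.
    Each following line: `relative_path<TAB>num_samples`.
    `entries` maps relative flac paths to their frame counts.
    """
    dest = Path(dest)
    lines = [str(root)] + [f"{rel}\t{frames}" for rel, frames in entries.items()]
    dest.write_text("\n".join(lines) + "\n")
    return dest

# Hypothetical file names and frame counts:
write_manifest(
    "/path/to/flacs",
    {"116-288045-0000.flac": 160000, "116-288045-0001.flac": 480000},
    "dev-other.tsv",
)
```

The `--gen-subset dev-other` flag in the inference command below tells `infer.py` to look for a file named `dev-other.tsv` in the manifest directory.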

Then I ran inference using the Wav2Vec 2.0 Large (LV-60 + CV + SWBD + FSH) pre-trained model from: https://github.com/pytorch/fairseq/tree/master/examples/wav2vec

I ran inference in fairseq using the following command:

python examples/speech_recognition/infer.py /path/to/manifest/ --task audio_pretraining --nbest 1 --path /path/to/w2v_large_lv_fsh_swbd_cv.pt --gen-subset dev-other --results-path /path/to/dev-results --w2l-decoder viterbi --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000

I got the following error message upon running the inference command:

INFO:__main__:| decoding with criterion ctc
INFO:__main__:| loading model(s) from /home/tsiddiqui/data/w2v_large_lv_fsh_swbd_cv.pt
INFO:fairseq.data.audio.raw_audio_dataset:loaded 17, skipped 0 samples
INFO:__main__:| /home/tsiddiqui/manifest/ dev-other 17 examples
Traceback (most recent call last):                                              
  File "examples/speech_recognition/infer.py", line 428, in <module>
    cli_main()
  File "examples/speech_recognition/infer.py", line 424, in cli_main
    main(args)
  File "examples/speech_recognition/infer.py", line 349, in main
    hypos = task.inference_step(generator, models, sample, prefix_tokens)
  File "/home/tsiddiqui/fairseq/fairseq/tasks/fairseq_task.py", line 456, in inference_step
    models, sample, prefix_tokens=prefix_tokens, constraints=constraints
  File "/home/tsiddiqui/fairseq/examples/speech_recognition/w2l_decoder.py", line 79, in generate
    emissions = self.get_emissions(models, encoder_input)
  File "/home/tsiddiqui/fairseq/examples/speech_recognition/w2l_decoder.py", line 87, in get_emissions
    emissions = models[0].get_normalized_probs(encoder_out, log_probs=True)
  File "/home/tsiddiqui/fairseq/fairseq/models/fairseq_model.py", line 62, in get_normalized_probs
    return self.get_normalized_probs_scriptable(net_output, log_probs, sample)
  File "/home/tsiddiqui/fairseq/fairseq/models/fairseq_model.py", line 85, in get_normalized_probs_scriptable
    raise NotImplementedError
NotImplementedError

Could someone please explain what is causing this error message and how to resolve it?

P.S. I also tried running inference with a language model but got a similar error message. If you could provide the full command to run inference using a kenlm or fairseqlm model, it would be appreciated.

Environment

Help fixing this issue would be highly appreciated! I need to resolve it as soon as possible!

jubick1337 commented 3 years ago

Hi @taiman9, I think this model isn't a fine-tuned one. Could you try using a fine-tuned one?
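This suggestion matches the traceback: `w2l_decoder.py` asks the model for normalized CTC emissions, and fairseq's base model class raises `NotImplementedError` when the loaded checkpoint has no output head that overrides `get_normalized_probs`. A pretraining-only wav2vec checkpoint like `w2v_large_lv_fsh_swbd_cv.pt` has no CTC head. The toy sketch below illustrates the mechanism (these are not fairseq's actual classes, just a simplified analogy using plain lists instead of tensors):

```python
import math

class BaseModel:
    """Analogy for fairseq's base model: no output head, so this raises,
    exactly as in the traceback above."""
    def get_normalized_probs(self, net_output, log_probs=True):
        raise NotImplementedError

class CtcFineTunedModel(BaseModel):
    """Analogy for a fine-tuned model: a CTC head emits per-frame logits
    that can be normalized with a (log-)softmax."""
    def get_normalized_probs(self, net_output, log_probs=True):
        exps = [math.exp(x) for x in net_output]
        total = sum(exps)
        probs = [e / total for e in exps]
        return [math.log(p) for p in probs] if log_probs else probs
```

So the fix is to pass `infer.py` a fine-tuned checkpoint (e.g. one of the "960h" fine-tuned wav2vec 2.0 models), not the pretraining-only one.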

taiman9 commented 3 years ago

Hi @jubick1337,

Thanks for replying. I just ran one of the fine-tuned models using the following command:

python examples/speech_recognition/infer.py /path/to/manifest/ --task audio_pretraining --nbest 1 --path /path/to/wav2vec2_vox_960h_new.pt --gen-subset dev-other --results-path /path/to/dev-results --w2l-decoder viterbi --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 --post-process letter

Then I got the following error message:

Traceback (most recent call last):
  File "examples/speech_recognition/infer.py", line 428, in <module>
    cli_main()
  File "examples/speech_recognition/infer.py", line 424, in cli_main
    main(args)
  File "examples/speech_recognition/infer.py", line 237, in main
    state=model_state,
  File "/home/tsiddiqui/fairseq/fairseq/checkpoint_utils.py", line 269, in load_model_ensemble
    state,
  File "/home/tsiddiqui/fairseq/fairseq/checkpoint_utils.py", line 304, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/home/tsiddiqui/fairseq/fairseq/checkpoint_utils.py", line 238, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "/home/tsiddiqui/fairseq/fairseq/checkpoint_utils.py", line 493, in _upgrade_state_dict
    state["cfg"] = convert_namespace_to_omegaconf(state["args"])
  File "/home/tsiddiqui/fairseq/fairseq/dataclass/utils.py", line 351, in convert_namespace_to_omegaconf
    composed_cfg = compose("config", overrides=overrides, strict=False)
TypeError: compose() got an unexpected keyword argument 'strict'

Would you know what the issue is?

jubick1337 commented 3 years ago

I don't know exactly, but it seems like a model-building error. Try previous versions of fairseq, as fairseq has changed its configs/builders a lot.
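The `TypeError: compose() got an unexpected keyword argument 'strict'` usually points to a hydra-core version mismatch rather than the model itself: Hydra 1.1 removed the `strict` keyword from `compose()`, while the fairseq code in the traceback still passes it. Pinning an older hydra-core (e.g. `pip install hydra-core==1.0.7`) or updating fairseq typically resolves it. As a sketch of the incompatibility, a hypothetical shim like `compose_compat` below would pass `strict` only when the installed `compose()` still accepts it:

```python
import inspect

def compose_compat(compose_fn, config_name, overrides=None, strict=None):
    """Hypothetical shim around a Hydra-style compose() call.

    Hydra >= 1.1 removed the `strict` keyword; passing it there raises
    the TypeError seen above. This wrapper forwards `strict` only if the
    given compose function still declares that parameter.
    """
    kwargs = {"overrides": overrides or []}
    if strict is not None and "strict" in inspect.signature(compose_fn).parameters:
        kwargs["strict"] = strict
    return compose_fn(config_name, **kwargs)
```

This is only to illustrate why the call site breaks across versions; the practical fix is aligning the hydra-core and fairseq versions.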