facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Unable to do inference #3994

Open pratikkumar018 opened 2 years ago

pratikkumar018 commented 2 years ago

I am unable to run inference. I am running the command:

python infer.py dataset_dec_dev-other/ --task audio_pretraining --nbest 1 \
    --path wav2vec_vox_960h_pl.pt --gen-subset train \
    --results-path results_decode_dev-other --w2l-decoder viterbi \
    --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
    --post-process letter

The command gives an error:

self.criterion_type = CriterionType.CTC
NameError: name 'CriterionType' is not defined

Also the image shows the detailed error. Is it necessary to have flashlight to run decoding?

image

chen7515 commented 2 years ago

I think you need to install flashlight python bindings to do inference, even if just extracting raw numbers with the viterbi option. The w2l-decoder is from the wav2letter repo which was merged into flashlight python bindings. You have a warning in your screenshot requesting that you install it.
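
One way to confirm this before launching infer.py is to check whether the decoder bindings can be located at all. The sketch below is a minimal check under stated assumptions: the module paths `flashlight.lib.text.decoder` and `wav2letter.decoder` are guesses at what the installed w2l_decoder.py imports (this depends on the fairseq revision, so check the imports at the top of that file), and `find_available` is a hypothetical helper, not part of fairseq.

```python
import importlib.util

# Assumed module paths; which one applies depends on the fairseq revision.
CANDIDATE_MODULES = ("flashlight.lib.text.decoder", "wav2letter.decoder")


def find_available(modules):
    """Return the module names from `modules` that can be located for import."""
    found = []
    for name in modules:
        try:
            # For dotted names, find_spec imports the parent package first and
            # raises ModuleNotFoundError when that parent is missing.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    available = find_available(CANDIDATE_MODULES)
    if available:
        print("Decoder bindings found:", ", ".join(available))
    else:
        print("No decoder bindings found; the w2l decoders will likely fail.")
```

If nothing is found, installing the bindings as the warning suggests is the first thing to try.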

stale[bot] commented 2 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

202015004 commented 2 years ago

Unable to perform Evaluation / Inference in wav2vec2.0 model

I'm still facing this issue. I have tried several approaches; the major attempts are listed below. Installation and setup were done using the following link setup:

1. Using STT

CODE

from stt import Transcriber

transcriber = Transcriber(
    pretrain_model='baseline_trial/Pre-Trained_model/wav2vec_small.pt',
    finetune_model='outputs/2022-02-27/11-29-48/checkpoints/checkpoint_best.pt',
    dictionary='baseline_trial/dictionary/dict.ltr.txt',
    lm_type='kenlm',
    lm_lexicon='lm/lexicon.txt',
    lm_model='lm/lm.bin',
    lm_weight=1.5,
    word_score=-1,
    beam_size=50,
)
hypos = transcriber.transcribe([
    '/home/speechlab/Desktop/Shreya/TLT_2021/ETLT2021_CAMBRIDGE_EN_baseline/ETLT2021_ETS_EN/audio/dev/1000000000018212-VE648280.wav',
    '/home/speechlab/Desktop/Shreya/TLT_2021/ETLT2021_CAMBRIDGE_EN_baseline/ETLT2021_ETS_EN/audio/dev/1000000000018212-VE654794.wav',
])
print(hypos)

ERROR

/home/speechlab/self-supervised-speech-recognition/libs/fairseq/examples/speech_recognition/w2l_decoder.py:42: UserWarning: wav2letter python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/wav2letter/wiki/Python-bindings
  "wav2letter python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/wav2letter/wiki/Python-bindings"
Traceback (most recent call last):
  File "/home/speechlab/self-supervised-speech-recognition/testing1.py", line 12, in <module>
    transcriber = Transcriber( pretrain_model = 'baseline_trial/Pre-Trained_model/wav2vec_small.pt', finetune_model='outputs/2022-02-27/11-29-48/checkpoints/checkpoint_best.pt', dictionary = 'baseline_trial/dictionary/dict.ltr.txt', lm_type = 'kenlm', lm_lexicon = 'lm/lexicon.txt', lm_model = 'lm/lm.bin',lm_weight = 1.5, word_score = -1, beam_size = 50)
  File "/home/speechlab/self-supervised-speech-recognition/stt.py", line 254, in __init__
    self.transcribe([sample_audio_path])
  File "/home/speechlab/self-supervised-speech-recognition/stt.py", line 359, in transcribe
    generator = build_generator(args)
  File "/home/speechlab/self-supervised-speech-recognition/stt.py", line 347, in build_generator
    return W2lKenLMDecoder(args, task.target_dictionary)
  File "/home/speechlab/self-supervised-speech-recognition/libs/fairseq/examples/speech_recognition/w2l_decoder.py", line 133, in __init__
    super().__init__(args, tgt_dict)
  File "/home/speechlab/self-supervised-speech-recognition/libs/fairseq/examples/speech_recognition/w2l_decoder.py", line 56, in __init__
    self.criterion_type = CriterionType.CTC
NameError: name 'CriterionType' is not defined

COMMENT As suggested in https://github.com/flashlight/flashlight/issues/416#issuecomment-761728139, flashlight is installed, but the modifications described there for the bindings do not match the script I received after installing fairseq.
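
The UserWarning at line 42 followed by the NameError at line 56 is consistent with a guarded import: if the import fails and the handler only warns, the name is never bound, and the first use fails much later. The following is a self-contained sketch of that pattern, not the actual fairseq source; `hypothetical_decoder_bindings` is a made-up module name chosen so the import fails on any machine, and `Decoder` and `try_build` are illustrative names.

```python
import warnings

try:
    # Hypothetical module name (an assumption), so this import always fails.
    from hypothetical_decoder_bindings import CriterionType
except ImportError:
    # Only a warning is issued, so CriterionType is never defined.
    warnings.warn("decoder bindings are required to use this functionality")


class Decoder:
    def __init__(self):
        # The first use of the missing name raises NameError here, far from
        # the import that actually failed.
        self.criterion_type = CriterionType.CTC


def try_build():
    """Return the NameError message, or None if construction succeeded."""
    try:
        Decoder()
    except NameError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_build())
```

Under this reading, the NameError is a symptom: the import that the script actually attempts is the thing to fix, and it may differ from the package name that was installed.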

2. Using checkpoint_utils

CODE

import torch
import fairseq

cp_path = '/home/speechlab/self-supervised-speech-recognition/outputs/2022-02-27/11-29-48/checkpoints/checkpoint_last.pt'
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])
model = model[0]
model.eval()

wav_input_16khz = torch.randn(1,10000)
z = model.feature_extractor(wav_input_16khz)
c = model.feature_aggregator(z)

ERROR

AttributeError                            Traceback (most recent call last)
in
      8
      9 wav_input_16khz = torch.randn(1,10000)
---> 10 z = model.feature_extractor(wav_input_16khz)
     11 c = model.feature_aggregator(z)

~/anaconda3/envs/shreya_ssl/lib/python3.6/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1176                 return modules[name]
   1177         raise AttributeError("'{}' object has no attribute '{}'".format(
-> 1178             type(self).__name__, name))
   1179
   1180     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

AttributeError: 'Wav2VecCtc' object has no attribute 'feature_extractor'

**COMMENT** I can't figure out how to solve this error. The installation was done properly. The above script was executed inside the main fairseq directory.
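
The AttributeError says that `feature_extractor` is not a direct child of the loaded `Wav2VecCtc` object; a fine-tuned checkpoint may hold it inside nested submodules. One hedged way to find out where it lives is to walk the module tree. In the sketch below, `find_submodules_with` is a hypothetical helper that relies only on `named_modules()` and `named_children()`, both of which `torch.nn.Module` provides. `Node` is a stand-in so the example runs without torch, and the nested layout `w2v_encoder.w2v_model.feature_extractor` is an assumption for illustration that should be verified against the real model.

```python
def find_submodules_with(model, attr_name):
    """List dotted paths at which a child module named `attr_name` exists."""
    paths = []
    for path, module in model.named_modules():
        if attr_name in dict(module.named_children()):
            paths.append(f"{path}.{attr_name}" if path else attr_name)
    return paths


class Node:
    """Stand-in for torch.nn.Module with the two methods the helper uses."""

    def __init__(self, **children):
        self._children = children

    def named_children(self):
        return iter(self._children.items())

    def named_modules(self, prefix=""):
        yield prefix, self
        for name, child in self._children.items():
            child_prefix = f"{prefix}.{name}" if prefix else name
            yield from child.named_modules(child_prefix)


# Assumed nesting, for illustration only.
demo_model = Node(w2v_encoder=Node(w2v_model=Node(feature_extractor=Node())))

if __name__ == "__main__":
    print(find_submodules_with(demo_model, "feature_extractor"))
```

With a real checkpoint, `find_submodules_with(model, "feature_extractor")` can be called on the loaded model directly. An empty result would mean that the architecture has no submodule with that name, which may also apply to `feature_aggregator`.
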
3. infer.py

CODE

$subset=dev_other
python libs/fairseq/examples/speech_recognition/infer.py /home/speechlab/self-supervised-speech-recognition/manifest --task speech_to_text \
    --nbest 1 --path outputs/2022-02-27/11-29-48/checkpoints/checkpoint_last.pt --results-path baseline_trial/test_result/out_sclite --w2l-decoder kenlm \
    --lm-model libs/kenlm/build/bin --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --max-tokens 4000000 \
    --post-process letter

COMMENT I referred to various resources but could not figure out the following:
+ The task: which task should be defined for evaluation from wav2vec fine-tuned checkpoints?
+ The 2nd argument (here /home/speechlab/self-supervised-speech-recognition/manifest), which needs to be the raw dataset: I can't understand exactly in which format the data should be prepared. The error showed that a test.json file is needed, but I didn't find any information on what this file should contain.
+ For the same setup, what should the post-process be?
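
On the data format question: assuming the `audio_pretraining` task from the original command is used (rather than `speech_to_text`), the wav2vec examples in fairseq prepare a `.tsv` manifest whose first line is the audio root directory and whose remaining lines hold a tab-separated relative path and sample count, plus a `.ltr` file with space-separated letters and `|` as the word boundary. The sketch below only illustrates those two layouts. The helper names and sample values are hypothetical, and the exact files required should be checked against the fairseq wav2vec README.

```python
def to_letter_labels(transcript):
    """Convert a transcript to letter labels with '|' as the word boundary."""
    return " ".join(list(transcript.replace(" ", "|"))) + " |"


def build_manifest(root_dir, entries):
    """Build .tsv manifest text: root directory, then 'path<TAB>samples' rows."""
    lines = [root_dir]
    lines.extend(f"{rel_path}\t{num_samples}" for rel_path, num_samples in entries)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file name and sample count, for illustration only.
    print(build_manifest("/data/audio", [("utt1.wav", 16000)]), end="")
    print(to_letter_labels("HELLO WORLD"))
```

In this layout the label file for a subset has one line per manifest row, in the same order, and the file name stem matches the value passed to `--gen-subset`.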

benam2 commented 2 years ago

I am facing this error in the training phase! Any idea how to fix it please?

- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[Featurizer for <class 's3prl.upstream.wav2vec2_hug.expert.UpstreamExpert'>] - The input upstream is only for initialization and not saved in this nn.Module
[Featurizer for <class 's3prl.upstream.wav2vec2_hug.expert.UpstreamExpert'>] - Take a list of 13 features and weighted sum them.
ASR dataset train: 100%| [00:11<00:00, 2394.55it/s]
/home/ai/fairseq/examples/speech_recognition/w2l_decoder.py:42: UserWarning: wav2letter python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/wav2letter/wiki/Python-bindings
  "wav2letter python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/wav2letter/wiki/Python-bindings"
Traceback (most recent call last):
  File "run_downstream.py", line 203, in <module>
    main()
  File "run_downstream.py", line 198, in main
    runner = Runner(args, config)
  File "/home/ai/conda_envs/env/lib/python3.7/site-packages/s3prl/downstream/runner.py", line 47, in __init__
    self.downstream = self._get_downstream()
  File "/home/ai/conda_envs/env/lib/python3.7/site-packages/s3prl/downstream/runner.py", line 116, in _get_downstream
    **vars(self.args)
  File "/home/ai/conda_envs/sgoudarzvand_base/lib/python3.7/site-packages/s3prl/downstream/asr/expert.py", line 100, in __init__
    self.decoder = get_decoder(decoder_args, self.train_dataset.dictionary)
  File "/home/ai/conda_envs/env/lib/python3.7/site-packages/s3prl/downstream/asr/expert.py", line 37, in get_decoder
    return W2lKenLMDecoder(decoder_args, dictionary)
  File "/home/ai/fairseq/examples/speech_recognition/w2l_decoder.py", line 133, in __init__
    super().__init__(args, tgt_dict)
  File "/home/ai/fairseq/examples/speech_recognition/w2l_decoder.py", line 56, in __init__
    self.criterion_type = CriterionType.CTC
NameError: name 'CriterionType' is not defined

srksaurabh1 commented 1 year ago

Bump