NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Error when running eval_beamsearch_ngram.py #2526

Closed arunvenkatesan-nv closed 3 years ago

arunvenkatesan-nv commented 3 years ago

Hi, I started looking into language modeling with NeMo. I was able to create a KenLM model using the train_kenlm.py script. Then I tried to run eval_beamsearch_ngram.py with a fine-tuned model we trained, the KenLM model, and a test manifest, but got an error. Could you tell us what to do to fix this? By the way, no log file was created, and we didn’t modify those NVIDIA scripts. Thank you so much!

I followed: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html

python eval_beamsearch_ngram.py --nemo_model_file ../../models/model_2021-07-19_12-01-41_4GPU.nemo \
    --input_manifest ../../manifests/test.jsonl \
    --kenlm_model_file kenlm_model_2021-07-19_12-01-41 \
    --acoustic_batch_size 32 \
    --preds_output_folder output \
    --decoding_mode beamsearch_ngram \
    --beam_width 64 128 \
    --beam_alpha 1.0 \
    --beam_beta 1.0 0.5

Traceback (most recent call last):
  File "eval_beamsearch_ngram.py", line 345, in <module>
    main()
  File "eval_beamsearch_ngram.py", line 217, in main
    asr_model = nemo_asr.models.EncDecCTCModelBPE.restore_from(
  File "/NeMo/nemo/core/classes/modelPT.py", line 479, in restore_from
    return cls._default_restore_from(restore_path, override_config_path, map_location, strict, return_config)
  File "/NeMo/nemo/core/classes/modelPT.py", line 430, in _default_restore_from
    instance = cls.from_config_dict(config=conf)
  File "/NeMo/nemo/core/classes/common.py", line 471, in from_config_dict
    instance = cls(cfg=config)
  File "/NeMo/nemo/collections/asr/models/ctc_bpe_models.py", line 98, in __init__
    raise ValueError("cfg must have tokenizer config to create a tokenizer !")
ValueError: cfg must have tokenizer config to create a tokenizer !

titu1994 commented 3 years ago

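What type of model is model_2021-07-19_12-01-41_4GPU.nemo? A QuartzNet or a Citrinet? If it's a QuartzNet, then you need to use EncDecCTCModel.restore_from() instead of the EncDecCTCModelBPE.restore_from() call shown in the traceback. QuartzNet is a character-based model, so its config has no tokenizer section.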
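If you are not sure which kind of checkpoint you have, something like the following might help. It is only a rough sketch, not what eval_beamsearch_ngram.py does: it assumes the .nemo file is a tar archive containing a model_config.yaml, and it uses a plain text check for a top-level `tokenizer:` key rather than a real YAML parser. The helper names `pick_ctc_model_class` and `restore_asr_model` are made up for this example.

```python
import os
import tarfile


def pick_ctc_model_class(nemo_path):
    """Return the name of the CTC model class that matches the checkpoint config.

    Hypothetical helper: assumes the .nemo file is a tar archive that
    contains a model_config.yaml file.
    """
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            if os.path.basename(member.name) == "model_config.yaml":
                config_text = archive.extractfile(member).read().decode("utf-8")
                break
        else:
            raise FileNotFoundError("model_config.yaml not found in " + nemo_path)

    # Heuristic (an assumption, not a YAML parse): BPE models carry a
    # top-level "tokenizer:" section, character-based models do not.
    has_tokenizer = any(
        line.startswith("tokenizer:") for line in config_text.splitlines()
    )
    return "EncDecCTCModelBPE" if has_tokenizer else "EncDecCTCModel"


def restore_asr_model(nemo_path):
    """Restore the checkpoint with whichever class the config calls for."""
    # Imported lazily so the helper above can be used without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model_class = getattr(nemo_asr.models, pick_ctc_model_class(nemo_path))
    return model_class.restore_from(restore_path=nemo_path)
```

Calling `pick_ctc_model_class("../../models/model_2021-07-19_12-01-41_4GPU.nemo")` would tell you which of the two classes to pass to `restore_from()` until the script handles this itself.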

VahidooX commented 3 years ago

Created a PR to fix that. They may follow Som's suggestion until it is merged.

VahidooX commented 3 years ago

The PR is merged into main: https://github.com/NVIDIA/NeMo/pull/2530