Closed triumph9989 closed 1 year ago
An `asr_decoder_ts` instance must be created to perform ASR with diarization; it should not be `None`. For this, you need to provide a NeMo-based ASR model. QuartzNet, Citrinet, and Conformer-CTC based ASR models are currently supported. Setting `asr.model_path` to `"???"` raises an error because no ASR model is provided. Please train or download a NeMo-based ASR model from NGC and set `asr.model_path` to the `.nemo` file path.
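For example, the override can be passed on the command line via Hydra; this is a sketch only, and both paths below are placeholders, not real files:

```shell
# Sketch: point the diarization + ASR script at a local .nemo checkpoint
# via Hydra overrides (placeholder paths).
python offline_diar_with_asr_infer.py \
    diarizer.manifest_filepath=/path/to/manifest.json \
    diarizer.asr.model_path=/path/to/your_model.nemo
```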
@tango4j
Thanks for looking into my issue. I added my Conformer-CTC model (`EncDecCTCModel`) path directly in `diar_infer_meeting.yaml`:
```yaml
asr:
  model_path: /home/face/NeMo/examples/asr/exp/Conformer-CTC-Char-Aishell-100ep-lr0.9/2022-12-01_09-43-05/checkpoints/Conformer-CTC-Char-Aishell-100ep-lr0.9.nemo
```
Error log:

```
[NeMo I 2022-12-07 14:03:52 speaker_utils:92] Number of files to diarize: 1
[NeMo E 2022-12-07 14:04:05 common:505] Model instantiation failed!
Target class: nemo.collections.asr.models.ctc_models.EncDecCTCModel
Error(s): `cfg` must have `tokenizer` config to create a tokenizer !
Traceback (most recent call last):
  File "/home/face/NeMo/nemo/core/classes/common.py", line 484, in from_config_dict
    instance = imported_cls(cfg=config, trainer=trainer)
  File "/home/face/NeMo/nemo/collections/asr/models/ctc_bpe_models.py", line 44, in __init__
    raise ValueError("`cfg` must have `tokenizer` config to create a tokenizer !")
ValueError: `cfg` must have `tokenizer` config to create a tokenizer !
Error executing job with overrides: []
Traceback (most recent call last):
  File "offline_diar_with_asr_infer.py", line 54, in main
    asr_model = asr_decoder_ts.set_asr_model()
  File "/home/face/NeMo/nemo/collections/asr/parts/utils/decoder_timestamps_utils.py", line 358, in set_asr_model
    asr_model = self.encdec_class.restore_from(restore_path=self.ASR_model_name)
  File "/home/face/NeMo/nemo/core/classes/modelPT.py", line 316, in restore_from
    instance = cls._save_restore_connector.restore_from(
  File "/home/face/NeMo/nemo/core/connectors/save_restore_connector.py", line 235, in restore_from
    loaded_params = self.load_config_and_state_dict(
  File "/home/face/NeMo/nemo/core/connectors/save_restore_connector.py", line 158, in load_config_and_state_dict
    instance = calling_cls.from_config_dict(config=conf, trainer=trainer)
  File "/home/face/NeMo/nemo/core/classes/common.py", line 506, in from_config_dict
    raise e
  File "/home/face/NeMo/nemo/core/classes/common.py", line 498, in from_config_dict
    instance = cls(cfg=config, trainer=trainer)
  File "/home/face/NeMo/nemo/collections/asr/models/ctc_bpe_models.py", line 44, in __init__
    raise ValueError("`cfg` must have `tokenizer` config to create a tokenizer !")
ValueError: `cfg` must have `tokenizer` config to create a tokenizer !
```
Also, I tried the pre-trained model `stt_en_conformer_ctc_large` from NGC and another model I trained before, `/home/face/NeMo/examples/asr/exp/QuartzNet15x5-lr2.2-ep100.nemo`, and both ran without errors.
@triumph9989 Your model does not have a tokenizer, and `decoder_timestamps_utils` cannot function without one. Add a tokenizer to your model and check if it works.
Also make sure your model has the same class structure as the NeMo ASR model class; otherwise, it won't work with NeMo modules.
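For reference, a sub-word (BPE) model's training config typically carries a tokenizer section like the sketch below; the directory path is a placeholder, and the tokenizer itself can be built with NeMo's `process_asr_text_tokenizer.py` script:

```yaml
# Illustrative tokenizer section for an EncDecCTCModelBPE training config.
# The dir value is a placeholder; the tokenizer directory is produced by
# NeMo's process_asr_text_tokenizer.py script.
model:
  tokenizer:
    dir: /path/to/tokenizer_dir
    type: bpe
```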
@tango4j Thank you for being so helpful. I'm training with a tokenizer now. However, according to NeMo's documentation, using a tokenizer means using sub-word encoding. So, does that mean `decoder_timestamps_utils` can't work with a character-based ASR model?
@triumph9989 Since `decoder_timestamps_utils` can use QuartzNet, I suppose a char-based tokenizer can be used. However, be careful when replacing the NeMo ASR model class, since the diarization + ASR framework assumes a NeMo ASR class.
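As a quick sanity check, a `.nemo` checkpoint is a tar archive containing a `model_config.yaml`, so a small stdlib-only sketch can reveal whether a checkpoint declares a tokenizer. The `has_tokenizer` helper below is hypothetical (not part of NeMo), and the top-level key scan is a rough check rather than a full YAML parse:

```python
# Rough check: does a .nemo checkpoint's model_config.yaml declare a
# top-level "tokenizer" section? (.nemo files are tar archives.)
# has_tokenizer is a hypothetical helper, not part of NeMo itself.
import tarfile

def has_tokenizer(nemo_path: str) -> bool:
    with tarfile.open(nemo_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.name.endswith("model_config.yaml"):
                text = tar.extractfile(member).read().decode("utf-8")
                # Top-level YAML keys start at column 0; a crude line scan
                # avoids a PyYAML dependency.
                return any(line.startswith("tokenizer:")
                           for line in text.splitlines())
    raise FileNotFoundError("model_config.yaml not found in " + nemo_path)
```

If this returns `False` for a checkpoint, that would be consistent with the `ValueError` in the log above, where the class constructor insists on a `tokenizer` config.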
Hi, I want to plug my ASR model into `offline_diar_with_asr_infer.py` and use `asr_based_vad`, but the following error appears. If `cfg.diarizer.asr.model_path=???`, it shows another error: ``ValueError: `cfg` must have `tokenizer` config to create a tokenizer !``
I think this is about an inconsistency between my ASR class and this task, but I'm not sure. Is there something I haven't noticed?
**Steps/Code to reproduce bug**
- `offline_diar_with_asr_infer.py`
- `diar_infer_meeting.yaml`

**Environment details**
- GPU: GeForce RTX 2080 Ti