NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
12.11k stars 2.52k forks

Key 'num_classes' is not in struct #8608

Closed. aidos-aiforiatech closed this issue 8 months ago

aidos-aiforiatech commented 8 months ago

Describe the bug

I am trying to train fastconformer_hybrid_transducer_ctc_bpe_streaming, but I get the following error:

omegaconf.errors.ConfigKeyError: Key 'num_classes' is not in struct
    full_key: decoder.num_classes
    object_type=dict

Steps/Code to reproduce bug

OC_CAUSE=1 HYDRA_FULL_ERROR=1 python ${NEMO_ROOT}/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py --config-path=${PWD}/conf/conformer/ --config-name=fastconformer_hybrid_transducer_ctc_bpe_streaming_kazakh exp_manager.name="KAZ_MCV_ISSAI_V1" exp_manager.exp_dir=results/ ++model.encoder.conv_norm_type=layer_norm

Expected behavior

Training should start without errors.

Additional context

The part of the config that defines num_classes:

  # The section which would contain the decoder and decoding configs of the auxiliary CTC decoder
  aux_ctc:
    ctc_loss_weight: 0.3 # the weight used to combine the CTC loss with the RNNT loss
    use_cer: false
    ctc_reduction: 'mean_batch'
    decoder:
      _target_: nemo.collections.asr.modules.ConvASRDecoder
      feat_in: null
      num_classes: -1
      vocabulary: []
    decoding:
      strategy: "greedy"

titu1994 commented 8 months ago

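Your config is for a hybrid (transducer + CTC) model, but speech_to_text_ctc_bpe.py trains a CTC-only model, so it looks up decoder.num_classes, which the hybrid config keeps under aux_ctc.decoder. You'll need to use the hybrid training script instead: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py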
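
To see why the lookup fails, it may help to check where num_classes actually sits in the config. The error reports full_key: decoder.num_classes, while the excerpt in the issue places the key under aux_ctc.decoder. The sketch below is only an illustration: it uses plain Python dicts as a stand-in for the OmegaConf config, the helper find_key_paths is hypothetical (not part of NeMo or OmegaConf), and the content of the top-level decoder entry is an assumption because the issue does not show it.

```python
def find_key_paths(node, key, prefix=""):
    """Return the dotted path of every occurrence of `key` in a nested dict."""
    paths = []
    for name, value in node.items():
        path = f"{prefix}.{name}" if prefix else name
        if name == key:
            paths.append(path)
        if isinstance(value, dict):
            paths.extend(find_key_paths(value, key, path))
    return paths


# Plain-dict stand-in for the `model` section of the hybrid config.
model_cfg = {
    # Assumption: the top-level decoder is not shown in the issue; all that
    # matters here is that it has no `num_classes` entry.
    "decoder": {"_target_": "<transducer decoder, not shown in the issue>"},
    # Copied from the config excerpt in the issue.
    "aux_ctc": {
        "ctc_loss_weight": 0.3,
        "use_cer": False,
        "ctc_reduction": "mean_batch",
        "decoder": {
            "_target_": "nemo.collections.asr.modules.ConvASRDecoder",
            "feat_in": None,
            "num_classes": -1,
            "vocabulary": [],
        },
        "decoding": {"strategy": "greedy"},
    },
}

paths = find_key_paths(model_cfg, "num_classes")
has_top_level = "num_classes" in model_cfg["decoder"]

print("num_classes found at:", paths)
print("decoder.num_classes present:", has_top_level)
```

Under these assumptions the only match is aux_ctc.decoder.num_classes, so a script that reads decoder.num_classes finds nothing there, which is consistent with the "not in struct" error above.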