NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Config settings for building a custom decoder using a pretrained Conformer-CTC Hindi model as the encoder #5581

Closed. manjuke closed this issue 1 year ago.

manjuke commented 1 year ago

Hi, I am trying to use the pretrained conformer-ctc-medium Hindi model as the encoder and add a conformer-transducer decoder, initialized with random weights, on top of it.
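In weight terms, the goal above amounts to copying only the encoder's parameters from the CTC checkpoint into a new model, while the decoder (prediction network) and joint keep their random initialization. The sketch below shows that idea on plain dicts standing in for PyTorch state dicts; the helper `merge_encoder_weights` and all parameter names are hypothetical. With real modules, the analogous PyTorch step would be something like `new_model.encoder.load_state_dict(ctc_model.encoder.state_dict())`, which assumes both encoders were built from the same encoder config.

```python
def merge_encoder_weights(new_state, pretrained_state, prefix="encoder."):
    """Return a copy of new_state whose `prefix` entries come from pretrained_state.

    Entries outside `prefix` (e.g. decoder and joint) keep their fresh values.
    Raises KeyError if the pretrained weights lack an encoder entry the new model expects.
    """
    missing = [k for k in new_state if k.startswith(prefix) and k not in pretrained_state]
    if missing:
        raise KeyError(f"pretrained weights are missing encoder keys: {missing}")
    merged = dict(new_state)
    for key in new_state:
        if key.startswith(prefix):
            merged[key] = pretrained_state[key]
    return merged


# Hypothetical parameter names; strings stand in for tensors.
new_rnnt = {"encoder.layers.0.w": "random", "decoder.prediction.w": "random", "joint.w": "random"}
ctc = {"encoder.layers.0.w": "pretrained", "decoder.decoder_layers.0.w": "pretrained"}
merged = merge_encoder_weights(new_rnnt, ctc)
print(merged)
# {'encoder.layers.0.w': 'pretrained', 'decoder.prediction.w': 'random', 'joint.w': 'random'}
```

The CTC checkpoint's own decoder weights are simply ignored, since the transducer's prediction network has a different structure anyway.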

I tried the following, but none of it worked:

  1. I used conformer_transducer_bpe.yaml, expecting the decoder parameters to be loaded from that config and an "RNNTDecoder" to be added.
  2. I tried setting the decoder in Python, but that did not work either:

         conf_ctc_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="stt_hi_conformer_ctc_medium")
         conf_ctc_model.cfg.decoder._target_ = 'nemo.collections.asr.modules.RNNTDecoder'

  3. I tried adding the decoder config in Python:

         RNN_decoder = dict(
             _target_='nemo.collections.asr.modules.RNNTDecoder',
             normalization_mode=None, random_state_sampling=False, blank_as_pad=True,
             prednet=dict(pred_hidden=640, pred_rnn_layers=1, t_max=None, dropout=0.2),
         )
         params['model']['decoder'] = RNN_decoder

  4. In addition to RNN_decoder, I tried initializing the joint, decoding, loss and variational_noise config parameters, but that did not help.
  5. I tried calling change_vocabulary both before and after loading the config parameters.

In all cases, the decoder was still shown as ConvASRDecoder, the loss as CTCLoss and the encoder as ConformerEncoder, both during training and in the hparams.yaml file.
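One plausible explanation for this, offered as an assumption rather than a confirmed diagnosis: NeMo models appear to instantiate their encoder, decoder and loss once, in the constructor, and a CTC model class always uses a CTC loss, so editing `cfg` on an already-loaded model would not swap in an RNNTDecoder. The toy below is plain Python, not NeMo code; `ToyCTCModel`, `REGISTRY` and the two decoder classes are hypothetical stand-ins.

```python
class ConvDecoder:
    """Stand-in for a CTC decoder such as ConvASRDecoder."""


class RNNTDecoder:
    """Stand-in for a transducer prediction network."""


# Hypothetical registry mapping a config's "_target_" string to a class.
REGISTRY = {"ConvDecoder": ConvDecoder, "RNNTDecoder": RNNTDecoder}


class ToyCTCModel:
    def __init__(self, cfg):
        self.cfg = cfg
        # Modules are instantiated once, from the config as it is at construction time.
        self.decoder = REGISTRY[cfg["decoder"]["_target_"]]()
        # The loss is fixed by the model class itself, not read from the config.
        self.loss = "CTCLoss"


model = ToyCTCModel({"decoder": {"_target_": "ConvDecoder"}})

# Editing the stored config afterwards does not rebuild the existing module.
model.cfg["decoder"]["_target_"] = "RNNTDecoder"
print(type(model.decoder).__name__)  # ConvDecoder

# A new instance picks up the edited decoder, but this class still uses a CTC loss.
rebuilt = ToyCTCModel(model.cfg)
print(type(rebuilt.decoder).__name__, rebuilt.loss)  # RNNTDecoder CTCLoss
```

If that reading is right, the transducer would have to be built through a transducer model class (NeMo provides EncDecRNNTBPEModel) from a complete transducer config, rather than by editing the CTC model's config in place.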

Please suggest how to use the pretrained Conformer-CTC model with a custom decoder, which could be a Conformer or a Transformer. Alternatively, could I use the encoder-decoder based "QuartzNet" Hindi pretrained model to accomplish the same thing?

Or can I use pyCTCdecoder for decoding? Thanks.
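For reference, the simplest way to turn a CTC model's per-frame output into text is greedy decoding: take the most likely token in each frame, merge consecutive repeats, then drop blanks. External beam-search decoders consume the same per-frame scores but search over many hypotheses (optionally with a language model) instead of taking the argmax. The sketch below is plain Python; the toy vocabulary and placing the blank at index `len(vocab)` are assumptions of this example.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id):
    """Greedy CTC decoding: per-frame argmax, merge repeats, then drop blanks."""
    best_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    tokens = []
    prev = None
    for idx in best_ids:
        # Emit a token only when it differs from the previous frame's token
        # and is not the blank; a blank between two equal tokens keeps both.
        if idx != prev and idx != blank_id:
            tokens.append(vocab[idx])
        prev = idx
    return "".join(tokens)


# Toy vocabulary; in this example the blank is the extra index after the vocabulary.
vocab = ["h", "i"]
blank_id = len(vocab)
scores = [
    [0.9, 0.05, 0.05],  # h
    [0.8, 0.1, 0.1],    # h (repeat, merged)
    [0.1, 0.1, 0.8],    # blank
    [0.1, 0.7, 0.2],    # i
    [0.2, 0.6, 0.2],    # i (repeat, merged)
    [0.1, 0.1, 0.8],    # blank
]
print(ctc_greedy_decode(scores, vocab, blank_id))  # hi
```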

manjuke commented 1 year ago

Please suggest @jbalam-nv @titu1994

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as stale.