NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

problem of load ASR lstm trained model #4514

Closed: sarasesar closed this issue 2 years ago

sarasesar commented 2 years ago

Hello. I trained an "LSTM-Transducer-BPE" model a few days ago, saving it via a checkpoint, and I used the default config for training. Afterwards, when I tried to load the model with the EncDecRNNTBPEModel class, I got an error and could not use the model:

RuntimeError: Error(s) in loading state_dict for EncDecRNNTBPEModel:
    Missing key(s) in state_dict: "decoder.prediction.dec_rnn.lstm.weight_hr_l0", "decoder.prediction.dec_rnn.lstm.weight_hr_l1". 
    size mismatch for decoder.prediction.dec_rnn.lstm.weight_ih_l0: copying a param with shape torch.Size([2560, 640]) from checkpoint, the shape in current model is torch.Size([8192, 640]).
    size mismatch for decoder.prediction.dec_rnn.lstm.weight_hh_l0: copying a param with shape torch.Size([2560, 640]) from checkpoint, the shape in current model is torch.Size([8192, 640]).
    size mismatch for decoder.prediction.dec_rnn.lstm.bias_ih_l0: copying a param with shape torch.Size([2560]) from checkpoint, the shape in current model is torch.Size([8192]).
    size mismatch for decoder.prediction.dec_rnn.lstm.bias_hh_l0: copying a param with shape torch.Size([2560]) from checkpoint, the shape in current model is torch.Size([8192]).
    size mismatch for decoder.prediction.dec_rnn.lstm.weight_ih_l1: copying a param with shape torch.Size([2560, 640]) from checkpoint, the shape in current model is torch.Size([8192, 640]).
    size mismatch for decoder.prediction.dec_rnn.lstm.weight_hh_l1: copying a param with shape torch.Size([2560, 640]) from checkpoint, the shape in current model is torch.Size([8192, 640]).
    size mismatch for decoder.prediction.dec_rnn.lstm.bias_ih_l1: copying a param with shape torch.Size([2560]) from checkpoint, the shape in current model is torch.Size([8192]).
    size mismatch for decoder.prediction.dec_rnn.lstm.bias_hh_l1: copying a param with shape torch.Size([2560]) from checkpoint, the shape in current model is torch.Size([8192]).

Please help me. Thank you.

titu1994 commented 2 years ago

Could you post the code you use to load the model (I assume a .nemo model)? It seems the checkpoint has decoder LSTM weights sized for hidden dim 640 (4 * 640 = 2560 rows), but they are being loaded into a model whose decoder LSTM has hidden dim 2048 (4 * 2048 = 8192 rows). As I recall, LSTM weight matrices are 4*hidden_dim tall, so the new model has a much larger hidden dim.
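
For reference, a minimal PyTorch sketch of this shape rule (the 640 size is taken from the error above):

import torch

# PyTorch stacks the four gate matrices vertically, so weight_ih_l0 has shape (4*hidden_size, input_size).
lstm = torch.nn.LSTM(input_size=640, hidden_size=640)
print(lstm.weight_ih_l0.shape)  # torch.Size([2560, 640]) -> 4 * 640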

Fyi @VahidooX

sarasesar commented 2 years ago

I tried to load the model with the nemo.collections.asr.models.EncDecRNNTBPEModel.restore_from() method. I also tried loading the model with several versions of NeMo (1.7.2, 1.8.2, and 1.9.0), but I could not load it.

I trained the model with this config:

  encoder:
    _target_: nemo.collections.asr.modules.RNNEncoder
    feat_in: ${model.preprocessor.features} # resolves to 80
    n_layers: 8
    d_model: 2048
    proj_size: ${model.model_defaults.pred_hidden} # resolves to 640
    rnn_type: "lstm" # can be lstm, gru, or rnn
    bidirectional: false

How can I fix it? Thank you.

titu1994 commented 2 years ago

The config seems fine, but this isn't the decoder config. Can you post the full config? And, if possible, the code/script being used to load the model.

sarasesar commented 2 years ago

Thank you. Sure, of course. The decoder config was:

  decoder:
    _target_: nemo.collections.asr.modules.RNNTDecoder
    normalization_mode: null 
    random_state_sampling: false 
    blank_as_pad: true
    prednet:
      pred_hidden: ${model.model_defaults.pred_hidden} # resolves to 640
      pred_rnn_layers: 2
      t_max: null
      dropout: 0.2
      rnn_hidden_size: 2048

  joint:
    _target_: nemo.collections.asr.modules.RNNTJoint
    log_softmax: null 
    preserve_memory: false  

    fuse_loss_wer: true
    fused_batch_size: 16

    jointnet:
      joint_hidden: ${model.model_defaults.joint_hidden} # resolves to 640
      activation: "relu"
      dropout: 0.2

  decoding:
    strategy: "greedy" 
    # greedy strategy config
    greedy:
      max_symbols: 10

    # beam strategy config
    beam:
      beam_size: 50
      return_best_hypothesis: True
      score_norm: true
      tsd_max_sym_exp: 50 
      alsd_max_target_len: 2.0  

And my code to load the model is:

import nemo.collections.asr as nemo_asr
model = nemo_asr.models.EncDecRNNTBPEModel.restore_from('lstm_mymodel.nemo')

titu1994 commented 2 years ago

OK, somehow your config did not update properly, so the restore is failing. Your rnn_hidden_size: 2048 should be 640, not 2048 (640 * 4 = 2560 rows for the four gates of the LSTM cell).

You can follow the steps here - https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/core/core.html#restore-with-modified-config - to change the value inside your trained model's config to 640, and then you can restore it.
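
A minimal sketch of that modified-config restore, reusing the lstm_mymodel.nemo filename from the snippet above:

import nemo.collections.asr as nemo_asr

# Extract only the config stored inside the .nemo archive.
cfg = nemo_asr.models.EncDecRNNTBPEModel.restore_from('lstm_mymodel.nemo', return_config=True)

# Correct the stale value so it matches the trained weights (4 * 640 = 2560 rows).
cfg.decoder.prednet.rnn_hidden_size = 640

# Restore again, overriding the embedded config with the corrected one.
model = nemo_asr.models.EncDecRNNTBPEModel.restore_from('lstm_mymodel.nemo', override_config_path=cfg)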

sarasesar commented 2 years ago

Oh, thank you. It loaded and worked.

But I used your repository config: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/lstm/lstm_transducer_bpe.yaml, and if you check it, rnn_hidden_size is set to 2048 there. So please update the config of the LSTM model.

titu1994 commented 2 years ago

@VahidooX can you change this line to read from the model_defaults (https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/lstm/lstm_transducer_bpe.yaml#L117)?

@sarasesar did you train the model without modifying the config? I just tried it locally for a few dozen steps, and it saves a model with the correct 2048 hidden dimension.

sarasesar commented 2 years ago

I didn't modify the config file. I checked the 1.10.0 tag of your repository, it was the same as the main branch, and that's what I used.

titu1994 commented 2 years ago

Hmm that's a problem. We'll look into what's going on.

sarasesar commented 2 years ago

Yes, thank you very much.

titu1994 commented 2 years ago

@VahidooX can you take a look at the LSTM-encoder-based RNNT code? The decoder takes a new arg, but checkpoint loading is affected.

VahidooX commented 2 years ago

The default config is decoder.prednet.pred_hidden=640 and decoder.prednet.rnn_hidden_size=2048, which means the LSTMs have a hidden size of 2048 with 640-dim projections between them to reduce computation time. It looks like somehow your decoder.prednet.rnn_hidden_size was set to 640 during training. I will try to reproduce it.
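
For concreteness, a minimal PyTorch sketch of the two decoder LSTM weight shapes involved (the 640/2048 sizes come from the configs above):

import torch

# Projected LSTM as in the default config: hidden size 2048, projection size 640.
lstm = torch.nn.LSTM(input_size=640, hidden_size=2048, num_layers=2, proj_size=640)
print(lstm.weight_ih_l0.shape)  # torch.Size([8192, 640]) -> 4 * 2048
print(lstm.weight_hr_l0.shape)  # torch.Size([640, 2048]), the projection weights

# Plain LSTM with hidden size 640, which is what the checkpoint apparently contains.
lstm_small = torch.nn.LSTM(input_size=640, hidden_size=640, num_layers=2)
print(lstm_small.weight_ih_l0.shape)  # torch.Size([2560, 640]) -> 4 * 640
# lstm_small has no weight_hr_l0 at all, matching the "Missing key(s)" in the error.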
