Could you post the code used to load the model (I assume a .nemo model)? It seems the checkpoint has weights for an encoder with hidden dim 2560 but is trying to load into a model with hidden dim 8192. As I recall, LSTM weight matrices have 4 * hidden_dim rows, so it seems the new model has a much larger hidden dim.
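(For anyone hitting the same shape mismatch: the 4x factor is easy to verify with a minimal plain-PyTorch sketch. The sizes below are the ones from the error above, not NeMo's actual module.)

```python
# Minimal sketch: a PyTorch LSTM's input-hidden weight has 4 * hidden_size rows,
# one block per gate, so the reported dims map back to the two hidden sizes.
import torch

lstm_small = torch.nn.LSTM(input_size=80, hidden_size=640)
print(lstm_small.weight_ih_l0.shape)  # torch.Size([2560, 80]) -> 4 * 640

lstm_large = torch.nn.LSTM(input_size=80, hidden_size=2048)
print(lstm_large.weight_ih_l0.shape)  # torch.Size([8192, 80]) -> 4 * 2048
```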
Fyi @VahidooX
I tried to load the model with the nemo.collections.asr.EncDecRNNTBPEModel.restore_from() method, and I also tried several versions of NeMo (1.7.2, 1.8.2, and 1.9.0), but I could not load the model.
I trained the model with this config:
encoder:
  _target_: nemo.collections.asr.modules.RNNEncoder
  feat_in: ${model.preprocessor.features}  # = 80
  n_layers: 8
  d_model: 2048
  proj_size: ${model.model_defaults.pred_hidden}  # = 640
  rnn_type: "lstm"  # it can be lstm, gru or rnn
  bidirectional: false
How can I fix it? Thank you.
The config seems fine, but this isn't the decoder config. Can you post the full config? And, if possible, the code/script being used to load the model.
Thank you, of course. The decoder config was:
decoder:
  _target_: nemo.collections.asr.modules.RNNTDecoder
  normalization_mode: null
  random_state_sampling: false
  blank_as_pad: true
  prednet:
    pred_hidden: ${model.model_defaults.pred_hidden}  # = 640
    pred_rnn_layers: 2
    t_max: null
    dropout: 0.2
    rnn_hidden_size: 2048

joint:
  _target_: nemo.collections.asr.modules.RNNTJoint
  log_softmax: null
  preserve_memory: false
  fuse_loss_wer: true
  fused_batch_size: 16
  jointnet:
    joint_hidden: ${model.model_defaults.joint_hidden}  # = 640
    activation: "relu"
    dropout: 0.2

decoding:
  strategy: "greedy"
  # greedy strategy config
  greedy:
    max_symbols: 10
  # beam strategy config
  beam:
    beam_size: 50
    return_best_hypothesis: true
    score_norm: true
    tsd_max_sym_exp: 50
    alsd_max_target_len: 2.0
And my code to load the model is:
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.EncDecRNNTBPEModel.restore_from('lstm_mymodel.nemo')
OK, somehow your config did not update properly, so the restore is failing. Your rnn_hidden_size should be 640, not 2048 (4 * 640 = 2560 rows for the 4 gates of the LSTM cell).
You can follow the steps here - https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/core/core.html#restore-with-modified-config - to change the value inside your trained model's config to 640, and then you can restore it.
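(A minimal sketch of those steps, assuming the filename from the snippet above, 'lstm_mymodel.nemo'; return_config and override_config_path are the mechanisms the linked docs describe:)

```python
from omegaconf import OmegaConf, open_dict
import nemo.collections.asr as nemo_asr

# Extract only the config stored inside the .nemo archive.
cfg = nemo_asr.models.EncDecRNNTBPEModel.restore_from(
    'lstm_mymodel.nemo', return_config=True)

# Correct the value to match the weights the model was actually trained with.
OmegaConf.set_struct(cfg, True)
with open_dict(cfg):
    cfg.decoder.prednet.rnn_hidden_size = 640

# Restore the checkpoint against the corrected config.
model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(
    'lstm_mymodel.nemo', override_config_path=cfg)
```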
Oh, thank you. It loaded and worked.
But I used the config from your repository:
https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/lstm/lstm_transducer_bpe.yaml
and if you check it, rnn_hidden_size is set to 2048. So please update the config of the LSTM model.
@VahidooX can you change this line to read from the model_defaults (https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/lstm/lstm_transducer_bpe.yaml#L117)?
@sarasesar did you train the model without modifying the config? I just tried it locally for a few dozen steps, and it saves a model with the correct 2048 hidden dimension.
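(A quick way to compare, sketched with the same return_config mechanism as above, so you can see which rnn_hidden_size your .nemo file actually recorded:)

```python
import nemo.collections.asr as nemo_asr

# Read the config out of the saved archive without building the model.
cfg = nemo_asr.models.EncDecRNNTBPEModel.restore_from(
    'lstm_mymodel.nemo', return_config=True)
print(cfg.decoder.prednet.rnn_hidden_size)  # 2048 expected with the stock config
```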
I didn't modify the config file. I checked the 1.10.0 tag of your repository, it was the same as the main branch, and that is what I used.
Hmm, that's a problem. We'll look into what's going on.
Yes, thank you very much.
@VahidooX can you take a look at the LSTM encoder based RNNT code? The decoder takes a new arg, but checkpoint loading is affected.
The default config has decoder.prednet.pred_hidden=640 and decoder.prednet.rnn_hidden_size=2048, which means the LSTMs are going to have a hidden size of 2048 with 640-dim projections between them to reduce the computation time. It looks like somehow your decoder LSTM ended up with a hidden size of 640 (pred_hidden) instead of rnn_hidden_size=2048 during training. I will try to reproduce it.
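(For illustration only, in plain PyTorch rather than NeMo's module: proj_size is how an LSTM keeps a 2048-dim hidden state while exposing 640-dim projected outputs between layers.)

```python
import torch

# hidden_size=2048 internal state, proj_size=640 projected outputs/recurrence.
lstm = torch.nn.LSTM(input_size=640, hidden_size=2048, proj_size=640)
print(lstm.weight_ih_l0.shape)  # torch.Size([8192, 640]): 4 gates * 2048 rows
print(lstm.weight_hr_l0.shape)  # torch.Size([640, 2048]): projection down to 640
```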
This issue is stale because it has been open for 60 days with no activity.
Hello. I trained an "LSTM-Transducer-BPE" model a few days ago and saved it via a checkpoint, using the default config for training. After that, when I try to load the model with the
EncDecRNNTBPEModel
method, I get an error and cannot use the model.
Please help me. Thank you.