NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

NeMo v1.4.0: RuntimeError: Error(s) in loading state_dict for EncDecCTCModelBPE: size mismatch for preprocessor.featurizer.window: copying a param with shape torch.Size([40]) from checkpoint, the shape in current model is torch.Size([400]). #3167

Closed: apurva1to3 closed this issue 2 years ago

apurva1to3 commented 3 years ago

I am training a Conformer model (medium) on a Hindi dataset of about 501 hours. After training for a few epochs I am getting ?? in the predicted transcriptions. When I try to load a checkpoint for inference I get a size mismatch error for the preprocessor. Can someone help or share any insights?

[Screenshot of the error attached: Screen Shot 2021-11-11 at 6 36 03 PM]
VahidooX commented 3 years ago

> after training for few epoch I am getting ?? in the predicted transcription...

Are you using fp32 for training? You may need to reduce the lr to avoid such divergences.

> ... I am trying to load checkpoint for inference...

It is strongly recommended to use the .nemo files, not the PTL checkpoints, if .nemo files are available.

> I am getting size mismatch for preprocessor error...

It looks like a mismatch between the config you used for creating that model and the one you used for loading it. More specifically, the preprocessor section looks to be different. Would you please share the commands you used to load the model?
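
For reference, a minimal sketch of the two loading paths (the file names are placeholders):

```python
import nemo.collections.asr as nemo_asr

# Preferred: a .nemo file bundles the model config, tokenizer and weights together.
nemo_model = nemo_asr.models.EncDecCTCModelBPE.restore_from("conformer_medium_128_hi.nemo")

# A PTL .ckpt only carries the weights plus the hyperparameters saved at training time,
# so it is easier to end up with a config / state_dict mismatch like the one above.
ckpt_model = nemo_asr.models.EncDecCTCModelBPE.load_from_checkpoint("path/to/checkpoint.ckpt")
```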


apurva1to3 commented 3 years ago

@VahidooX I am using the text_to_speech_bpe.py file for loading the model and doing transfer learning; I have also attached the config file. In the checkpoints I only see .ckpt files, no .nemo files. Is there any way to save a .nemo file from the checkpoints in v1.4.0, since I would like to check inference in between training? Is it recommended to use fp16 to avoid divergence? I have checked the preprocessor config and set all values according to the config file provided by NeMo. I get the size mismatch error when I use model = nemo_asr.models.EncDecCTCModelBPE.load_from_checkpoint(<path to checkpoint>). Is there something else I need to check? Please let me know if I am missing something; I am not able to resolve the size mismatch error. Will I be able to do inference only after the whole training is completed?

```python
import copy

import pytorch_lightning as pl
from omegaconf import OmegaConf

from nemo.collections.asr.models.ctc_bpe_models import EncDecCTCModelBPE
from nemo.core.config import hydra_runner
from nemo.utils import logging
from nemo.utils.exp_manager import exp_manager


@hydra_runner(config_path="config/", config_name="conformer_bpe_128_medium.yaml")
def main(cfg):
    logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}')

    trainer = pl.Trainer(**cfg.trainer)
    exp_manager(trainer, cfg.get("exp_manager", None))
    asr_model = EncDecCTCModelBPE(cfg=cfg.model, trainer=trainer)
    asr_model = EncDecCTCModelBPE.from_pretrained(model_name="stt_en_conformer_ctc_medium")
    asr_model.change_vocabulary(new_tokenizer_dir=cfg.model.tokenizer.dir, new_tokenizer_type=cfg.model.tokenizer.type)
    default_cfg = copy.deepcopy(asr_model.cfg)
    new_preprocessor_config = copy.deepcopy(cfg.model.preprocessor)
    new_preprocessor = asr_model.from_config_dict(new_preprocessor_config)
    asr_model.preprocessor = new_preprocessor
    cfg.model.preprocessor = new_preprocessor_config
    asr_model.cfg = default_cfg
    asr_model.setup_training_data(train_data_config=cfg.model.train_ds)
    asr_model.setup_validation_data(val_data_config=cfg.model.validation_ds)
    asr_model.setup_optimization(optim_config=cfg.model.optim)
    new_spec_augment_config = copy.deepcopy(cfg.model.spec_augment)
    asr_model.cfg.spec_augment = new_spec_augment_config
    asr_model.spec_augment = asr_model.from_config_dict(asr_model.cfg.spec_augment)
    # Initialize the weights of the model from another model, if provided via config
    # asr_model.maybe_init_from_pretrained_checkpoint(cfg)

    trainer.fit(asr_model)
    asr_model.save_to('conformer_medium_128_hi.nemo')

    if hasattr(cfg.model, 'test_ds') and cfg.model.test_ds.manifest_filepath is not None:
        gpu = 1 if cfg.trainer.gpus != 0 else 0
        test_trainer = pl.Trainer(
            gpus=gpu,
            precision=trainer.precision,
            amp_level=trainer.accelerator_connector.amp_level,
            amp_backend=cfg.trainer.get("amp_backend", "native"),
        )
        if asr_model.prepare_test(test_trainer):
            test_trainer.test(asr_model)


if __name__ == '__main__':
    main()
```

Config file:

```yaml
name: "Conformer-CTC-BPE"

model:
  sample_rate: 16000
  log_prediction: true
  ctc_reduction: 'mean_batch'

  train_ds:
    manifest_filepath: /addon/all/shuf_train_manifest_final_single.json
    sample_rate: 16000
    batch_size: 4
    shuffle: true
    num_workers: 32
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 16.7
    min_duration: 0.1

  validation_ds:
    manifest_filepath: /addon/all/validate_manifest_final_single.json
    sample_rate: 16000
    batch_size: 4
    shuffle: false
    num_workers: 32
    pin_memory: true
    use_start_end_token: false

  test_ds:
    manifest_filepath: /addon/all/test_manifest_final_single.json
    sample_rate: 16000
    batch_size: 4
    shuffle: false
    num_workers: 32
    pin_memory: true
    use_start_end_token: false

  tokenizer:
    dir: tokenizer/tokenizer_spe_bpe_v128/
    type: bpe

  preprocessor:
    _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
    sample_rate: 16000
    normalize: "per_feature"
    window_size: 0.025
    window_stride: 0.01
    window: "hann"
    features: 80
    n_fft: 512
    log: true
    frame_splicing: 1
    dither: 0.00001
    pad_to: 0
    pad_value: 0.0

  spec_augment:
    _target_: nemo.collections.asr.modules.SpectrogramAugmentation
    freq_masks: 2
    time_masks: 5
    freq_width: 27
    time_width: 0.05

  encoder:
    _target_: nemo.collections.asr.modules.ConformerEncoder
    feat_in: 80
    feat_out: -1
    n_layers: 18
    d_model: 256
    subsampling: striding
    subsampling_factor: 4
    subsampling_conv_channels: -1
    ff_expansion_factor: 4
    self_attention_model: rel_pos
    n_heads: 4
    att_context_size: [-1, -1]
    xscaling: true
    untie_biases: true
    pos_emb_max_len: 5000
    conv_kernel_size: 31
    dropout: 0.1
    dropout_emb: 0.0
    dropout_att: 0.1

  decoder:
    _target_: nemo.collections.asr.modules.ConvASRDecoder
    feat_in: null
    num_classes: -1
    vocabulary: []

  optim:
    name: adamw
    lr: 5.0
    betas: [0.9, 0.98]
    weight_decay: 1e-3
    sched:
      name: NoamAnnealing
      d_model: 256
      warmup_steps: 10000
      warmup_ratio: null
      min_lr: 1e-6
      max_steps: 2575500

trainer:
  gpus: [2,3]
  num_nodes: 1
  max_epochs: 50
  max_steps: null
  val_check_interval: 1.0
  accelerator: ddp
  amp_backend: native
  accumulate_grad_batches: 1
  gradient_clip_val: 0.0
  amp_level: O0
  precision: 32
  log_every_n_steps: 10
  progress_bar_refresh_rate: 10
  resume_from_checkpoint: null
  num_sanity_val_steps: 0
  check_val_every_n_epoch: 1
  sync_batchnorm: true
  checkpoint_callback: false
  logger: false

exp_manager:
  exp_dir: 'experiments_hi/ctc_medium_128'
  name: "ASR-Model-Language-hi_2/"
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: "val_wer"
    mode: "min"
    save_top_k: 10
    always_save_nemo: true
  resume_if_exists: false
  resume_ignore_no_checkpoint: false
  create_wandb_logger: false
  wandb_logger_kwargs:
    name: null
    project: null
```
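
As a rough sanity check on where the two window shapes in the error message come from, assuming the AudioToMelSpectrogramPreprocessor allocates its window buffer with window_size * sample_rate samples:

```python
# Rough sanity check (assumption: the preprocessor's window buffer length is
# int(window_size * sample_rate), which is what the shapes in the error correspond to).
sample_rate = 16000   # from the preprocessor config above
window_size = 0.025   # seconds, from the preprocessor config above

print(int(window_size * sample_rate))  # 400 -> the "current model" shape torch.Size([400])

# A checkpoint reporting torch.Size([40]) was therefore saved with a preprocessor whose
# window_size * sample_rate product was 40, i.e. a different preprocessor config.
```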

titu1994 commented 2 years ago

For Conformers, use FP32 only. Review the fine-tuning tutorials in the ASR collection; most of the discussion applies to Conformers.

BenoitWang commented 2 years ago

Hi @apurva1to3, did you find a solution?

piraka9011 commented 2 years ago

For anyone coming across this, here's how I solved my size mismatch issue. I was fine-tuning from a pretrained model, but the config I used during training didn't match the pretrained model exactly (the size of the last layer in the encoder was different from the pretrained model).

In order to restore from a .nemo file, I had to extract the file:

```bash
tar -xf model.nemo
```

Then I had to manually edit the model_config.yaml file (using your favorite editor) and adjust the incorrect parameter. Finally, I re-tarred the extracted files:

```bash
tar -cf model.nemo model_config.yaml model_weights.ckpt ...
```

This way I could load the model properly.

titu1994 commented 2 years ago

You can use restore_from to extract the config only, modify that, and then restore from that modified config to achieve the same effect. However, is this still an issue in NeMo 1.7? Seems unlikely.

piraka9011 commented 2 years ago

> You can use restore from to extract the config only, modify that, and then restore from that modified config to achieve the same effect.

I could not, because restore_from() also tries to load the weights.

titu1994 commented 2 years ago

Set this to true to just get back the config - https://github.com/NVIDIA/NeMo/blob/b09d851ea9d3a924601f5f8c60d791cf50e1a768/nemo/core/classes/modelPT.py#L361

Then update the config and pass it here https://github.com/NVIDIA/NeMo/blob/b09d851ea9d3a924601f5f8c60d791cf50e1a768/nemo/core/classes/modelPT.py#L443
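
For concreteness, a minimal sketch of that workflow (the model class, file names, and the edited key are illustrative; return_config and override_config_path are the arguments linked above):

```python
from omegaconf import OmegaConf, open_dict
import nemo.collections.asr as nemo_asr

# 1) Get back only the config packaged inside the .nemo file (no weights are loaded).
cfg = nemo_asr.models.EncDecCTCModelBPE.restore_from("model.nemo", return_config=True)

# 2) Fix the mismatched section, e.g. a preprocessor parameter (illustrative value).
with open_dict(cfg):
    cfg.preprocessor.window_size = 0.025

# 3) Restore the model again, overriding the packaged config with the modified one.
OmegaConf.save(cfg, "modified_config.yaml")
model = nemo_asr.models.EncDecCTCModelBPE.restore_from(
    "model.nemo", override_config_path="modified_config.yaml"
)
```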

titu1994 commented 2 years ago

It's there in the docs too - https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/core/core.html#nemo.core.ModelPT.restore_from

Autocomplete in IDEs should show these options as well; it is the preferred way to modify the config before loading the model in exactly this kind of case.

piraka9011 commented 2 years ago

Good to know, should've read the docs first as always. Thanks!

titu1994 commented 2 years ago

No worries, I've updated the docs for the next release to explicitly have this syntax as a part of the "restore with modified config" workflow.

appledora commented 1 year ago

Hey, can this overriding be replicated when running from a shell script too? For example, if I use:

```bash
python ${NEMO_GIT_FOLDER}/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \
    ... \
    ++init_from_pretrained_model=<hfrepo>
```

How can I modify the config in this case? What I have identified is that the pretrained model's tokenizer has 32000 tokens, while my custom tokenizer has ~9k+, and this is causing a decoder shape mismatch error.
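
For reference, command-line overrides of the model config use the same Hydra dotted-key syntax as the config file. A rough sketch, with placeholder values based on the config earlier in the thread (this only illustrates the override syntax, not a confirmed fix for the decoder mismatch):

```bash
# Illustrative only: dotted keys follow the Hydra override syntax used by the example scripts;
# the tokenizer path and <hfrepo> are placeholders.
python ${NEMO_GIT_FOLDER}/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \
    model.tokenizer.dir=<path/to/custom_tokenizer> \
    model.tokenizer.type=bpe \
    ++init_from_pretrained_model=<hfrepo>
```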