Closed: apurva1to3 closed this issue 2 years ago
After training for a few epochs I am getting ?? in the predicted transcription... Are you using fp32 for training? You may need to reduce the learning rate to avoid such divergences.
... I am trying to load a checkpoint for inference... It is strongly recommended to use the .nemo files rather than the PTL checkpoints, if .nemo files are available.
I am getting a size mismatch error for the preprocessor... It looks like a mismatch between the config you used for creating that model and the one you used for loading it. More specifically, the preprocessor sections look different. Could you please share the commands you used to load the model?
@VahidooX I am using the text_to_speech_bpe.py file for loading the model and doing transfer learning; I have also attached the config file. In the checkpoints directory I only see .ckpt files, no .nemo files. Is there any way to save a .nemo file from the checkpoints in v1.4.0? I would like to check inference in between training. Is it recommended to use fp16 to avoid divergence? I have checked the preprocessor config and set all the values according to the config file provided by NeMo. I get the size mismatch error when I use `model = nemo_asr.models.EncDecCTCModelBPE.load_from_checkpoint(<path to checkpoint>)`.
Is there something else I need to check? Please let me know if I am missing something; I am not able to resolve the size mismatch error. Will I be able to do inference only after the whole training is completed?
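For reference, a minimal sketch of one way to get a .nemo file from an intermediate .ckpt for offline inference, assuming the checkpoint was produced by this same training run so the hyperparameters stored in it match the model; the checkpoint and audio paths are placeholders:

```python
from nemo.collections.asr.models.ctc_bpe_models import EncDecCTCModelBPE

# Placeholder path to an intermediate Lightning checkpoint written by exp_manager.
ckpt_path = "experiments_hi/ctc_medium_128/checkpoints/some_checkpoint.ckpt"

# load_from_checkpoint rebuilds the model from the hyperparameters stored inside
# the .ckpt, so it only works if those match the architecture that wrote it.
model = EncDecCTCModelBPE.load_from_checkpoint(ckpt_path)

# Package config, weights, and tokenizer into a single .nemo file.
model.save_to("conformer_medium_128_hi_intermediate.nemo")

# Quick sanity check on a couple of audio files (placeholder paths).
print(model.transcribe(["sample_0.wav", "sample_1.wav"]))
```

(The exp_manager section of the config below also sets `always_save_nemo: true`, which should produce a .nemo alongside the saved checkpoints.)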
Training script:

```python
import copy

import pytorch_lightning as pl
from omegaconf import OmegaConf

from nemo.collections.asr.models.ctc_bpe_models import EncDecCTCModelBPE
from nemo.core.config import hydra_runner
from nemo.utils import logging
from nemo.utils.exp_manager import exp_manager


@hydra_runner(config_path="config/", config_name="conformer_bpe_128_medium.yaml")
def main(cfg):
    logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}')

    trainer = pl.Trainer(**cfg.trainer)
    exp_manager(trainer, cfg.get("exp_manager", None))

    # Build a model from the local config, then replace it with the pretrained
    # checkpoint and switch to the custom tokenizer.
    asr_model = EncDecCTCModelBPE(cfg=cfg.model, trainer=trainer)
    asr_model = EncDecCTCModelBPE.from_pretrained(model_name="stt_en_conformer_ctc_medium")
    asr_model.change_vocabulary(new_tokenizer_dir=cfg.model.tokenizer.dir, new_tokenizer_type=cfg.model.tokenizer.type)

    # Rebuild the preprocessor from the local config.
    default_cfg = copy.deepcopy(asr_model.cfg)
    new_preprocessor_config = copy.deepcopy(cfg.model.preprocessor)
    new_preprocessor = asr_model.from_config_dict(new_preprocessor_config)
    asr_model.preprocessor = new_preprocessor
    cfg.model.preprocessor = new_preprocessor_config
    asr_model.cfg = default_cfg

    asr_model.setup_training_data(train_data_config=cfg.model.train_ds)
    asr_model.setup_validation_data(val_data_config=cfg.model.validation_ds)
    asr_model.setup_optimization(optim_config=cfg.model.optim)

    # Rebuild spec augmentation from the local config.
    new_spec_augment_config = copy.deepcopy(cfg.model.spec_augment)
    asr_model.cfg.spec_augment = new_spec_augment_config
    asr_model.spec_augment = asr_model.from_config_dict(asr_model.cfg.spec_augment)

    # Initialize the weights of the model from another model, if provided via config
    # asr_model.maybe_init_from_pretrained_checkpoint(cfg)

    trainer.fit(asr_model)
    asr_model.save_to('conformer_medium_128_hi.nemo')

    if hasattr(cfg.model, 'test_ds') and cfg.model.test_ds.manifest_filepath is not None:
        gpu = 1 if cfg.trainer.gpus != 0 else 0
        test_trainer = pl.Trainer(
            gpus=gpu,
            precision=trainer.precision,
            amp_level=trainer.accelerator_connector.amp_level,
            amp_backend=cfg.trainer.get("amp_backend", "native"),
        )
        if asr_model.prepare_test(test_trainer):
            test_trainer.test(asr_model)


if __name__ == '__main__':
    main()
```
Config file:

```yaml
name: "Conformer-CTC-BPE"

model:
  sample_rate: 16000
  log_prediction: true
  ctc_reduction: 'mean_batch'
  train_ds:
    manifest_filepath: /addon/all/shuf_train_manifest_final_single.json
    sample_rate: 16000
    batch_size: 4
    shuffle: true
    num_workers: 32
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 16.7
    min_duration: 0.1
  validation_ds:
    manifest_filepath: /addon/all/validate_manifest_final_single.json
    sample_rate: 16000
    batch_size: 4
    shuffle: false
    num_workers: 32
    pin_memory: true
    use_start_end_token: false
  test_ds:
    manifest_filepath: /addon/all/test_manifest_final_single.json
    sample_rate: 16000
    batch_size: 4
    shuffle: false
    num_workers: 32
    pin_memory: true
    use_start_end_token: false
  tokenizer:
    dir: tokenizer/tokenizer_spe_bpe_v128/
    type: bpe
  preprocessor:
    _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
    sample_rate: 16000
    normalize: "per_feature"
    window_size: 0.025
    window_stride: 0.01
    window: "hann"
    features: 80
    n_fft: 512
    log: true
    frame_splicing: 1
    dither: 0.00001
    pad_to: 0
    pad_value: 0.0
  spec_augment:
    _target_: nemo.collections.asr.modules.SpectrogramAugmentation
    freq_masks: 2
    time_masks: 5
    freq_width: 27
    time_width: 0.05
  encoder:
    _target_: nemo.collections.asr.modules.ConformerEncoder
    feat_in: 80
    feat_out: -1
    n_layers: 18
    d_model: 256
    subsampling: striding
    subsampling_factor: 4
    subsampling_conv_channels: -1
    ff_expansion_factor: 4
    self_attention_model: rel_pos
    n_heads: 4
    att_context_size: [-1, -1]
    xscaling: true
    untie_biases: true
    pos_emb_max_len: 5000
    conv_kernel_size: 31
    dropout: 0.1
    dropout_emb: 0.0
    dropout_att: 0.1
  decoder:
    _target_: nemo.collections.asr.modules.ConvASRDecoder
    feat_in: null
    num_classes: -1
    vocabulary: []
  optim:
    name: adamw
    lr: 5.0
    betas: [0.9, 0.98]
    weight_decay: 1e-3
    sched:
      name: NoamAnnealing
      d_model: 256
      warmup_steps: 10000
      warmup_ratio: null
      min_lr: 1e-6
      max_steps: 2575500

trainer:
  gpus: [2,3]
  num_nodes: 1
  max_epochs: 50
  max_steps: null
  val_check_interval: 1.0
  accelerator: ddp
  amp_backend: native
  accumulate_grad_batches: 1
  gradient_clip_val: 0.0
  amp_level: O0
  precision: 32
  log_every_n_steps: 10
  progress_bar_refresh_rate: 10
  resume_from_checkpoint: null
  num_sanity_val_steps: 0
  check_val_every_n_epoch: 1
  sync_batchnorm: true
  checkpoint_callback: false
  logger: false

exp_manager:
  exp_dir: 'experiments_hi/ctc_medium_128'
  name: "ASR-Model-Language-hi_2/"
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: "val_wer"
    mode: "min"
    save_top_k: 10
    always_save_nemo: true
  resume_if_exists: false
  resume_ignore_no_checkpoint: false
  create_wandb_logger: false
  wandb_logger_kwargs:
    name: null
    project: null
```
For Conformers, use FP32 only. Review the finetuning tutorials in the ASR collection; most of the discussion there applies to Conformers as well.
Hi @apurva1to3, did you find a solution?
For anyone coming across this, here's how I solved my size mismatch issue. I was finetuning from a pretrained model, but the config I used during training didn't match the pretrained model exactly (the size of the last layer in the encoder was different from the pretrained model).
In order to restore from a .nemo file, I had to extract it:
`tar -xf model.nemo`
Then I manually edited the `model_config.yaml` file (using your favorite editor) and adjusted the incorrect parameter. Finally, I re-tarred the extracted files:
`tar -cf model.nemo model_config.yaml model_weights.ckpt ...`
After that I could load the model properly.
You can use restore_from to extract the config only, modify it, and then restore with that modified config to achieve the same effect. However, is this still an issue in NeMo 1.7? Seems unlikely.
> You can use restore_from to extract the config only, modify it, and then restore with that modified config to achieve the same effect.

I could not, because restore_from() also tries to load the weights.
Set this to true to just get back the config - https://github.com/NVIDIA/NeMo/blob/b09d851ea9d3a924601f5f8c60d791cf50e1a768/nemo/core/classes/modelPT.py#L361
Then update the config and pass it here https://github.com/NVIDIA/NeMo/blob/b09d851ea9d3a924601f5f8c60d791cf50e1a768/nemo/core/classes/modelPT.py#L443
It's there in the docs too - https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/core/core.html#nemo.core.ModelPT.restore_from
Autocomplete in IDEs should show these options as well; this is the preferred way of modifying the config prior to loading the model in exactly cases like this (see the sketch below).
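For concreteness, a minimal sketch of that "restore with modified config" workflow, using the return_config and override_config_path arguments linked above; the .nemo path is a placeholder and the edited field is only an illustration:

```python
from nemo.collections.asr.models.ctc_bpe_models import EncDecCTCModelBPE

nemo_path = "model.nemo"  # placeholder path

# Step 1: pull out only the config stored inside the .nemo; no weights are loaded here.
cfg = EncDecCTCModelBPE.restore_from(nemo_path, return_config=True)

# Step 2: fix the offending entry so it matches the saved weights
# (illustrative field; edit whichever parameter is actually wrong).
cfg.preprocessor.features = 80

# Step 3: restore the model, overriding the packed config with the corrected one.
model = EncDecCTCModelBPE.restore_from(nemo_path, override_config_path=cfg)
```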
Good to know, should've read the docs first as always. Thanks!
No worries, I've updated the docs for the next release to explicitly have this syntax as a part of the "restore with modified config" workflow.
Hey, can this overriding be replicated when running via a shell script too? For example, if I use:
python ${NEMO_GIT_FOLDER}/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \
.
.
.
.
++init_from_pretrained_model=<hfrepo>
How can I modify the config in this case? What I have identified is that the pretrained model's tokenizer has 32000 tokens while my custom tokenizer has ~9k+, and this is causing a decoder shape mismatch error.
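One possible way around this (a sketch, not necessarily the only option): load the pretrained model in Python, swap in the custom tokenizer with change_vocabulary, which rebuilds the decoder for the new vocabulary size, save the result as a .nemo, and initialize training from that (e.g. with ++init_from_nemo_model instead of ++init_from_pretrained_model). The model name is the example used earlier in this thread and the paths are placeholders:

```python
from nemo.collections.asr.models.ctc_bpe_models import EncDecCTCModelBPE

# Load the pretrained model (example name from earlier in this thread; substitute your own).
model = EncDecCTCModelBPE.from_pretrained(model_name="stt_en_conformer_ctc_medium")

# Swap in the custom tokenizer; the decoder is rebuilt for the ~9k-token vocabulary,
# so its shape no longer clashes with the pretrained 32k-token decoder.
model.change_vocabulary(
    new_tokenizer_dir="tokenizer/my_custom_tokenizer/",  # placeholder path
    new_tokenizer_type="bpe",
)

# Save a .nemo that already has the right decoder size, then finetune from it.
model.save_to("pretrained_with_custom_tokenizer.nemo")
```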
I am training a Conformer model (size medium) on a Hindi dataset of about 501 hours. After training for a few epochs I am getting ?? in the predicted transcription. When I try to load a checkpoint for inference, I get a size mismatch error for the preprocessor. Can anyone help or share insights on this?