abarcovschi closed this issue 12 months ago.
This is an issue with the tokenizer. Are you sure your vocab size is 1024?
@nithinraok you are right, it is a problem with the tokenizer's vocab size causing the mismatch.
The problem was occurring because when I try to create a 1024-token tokenizer using the following command:
python process_asr_text_tokenizer.py --manifest="/workspace/datasets/cmu_hf/train_manifest.json" --data_root=tokenizers/cmu_hf926 --vocab_size=926 --tokenizer="spe" --spe_type="unigram" --log
I get a vocab size of 926. Does this mean my dataset is too small for the tokenizer to form 1024 tokens?
I tried to work around the problem by creating a tokenizer with --vocab_size=926, but I now get the following error when trying to fine-tune with the new 926-token tokenizer:
Traceback (most recent call last):
File "/workspace/projects/nemo_asr/speech_to_text_rnnt_bpe_custom.py", line 114, in main
asr_model.maybe_init_from_pretrained_checkpoint(cfg)
File "/usr/local/lib/python3.10/dist-packages/lightning_utilities/core/rank_zero.py", line 32, in wrapped_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/nemo/core/classes/modelPT.py", line 1219, in maybe_init_from_pretrained_checkpoint
self.load_state_dict(restored_model.state_dict(), strict=False)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2040, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for EncDecRNNTBPEModel:
size mismatch for decoder.prediction.embed.weight: copying a param with shape torch.Size([1025, 640]) from checkpoint, the shape in current model is torch.Size([927, 640]).
size mismatch for joint.joint_net.2.weight: copying a param with shape torch.Size([1025, 640]) from checkpoint, the shape in current model is torch.Size([927, 640]).
size mismatch for joint.joint_net.2.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([927]).
Does the model expect the tokenizer to have a vocabulary size of exactly 1024?
If you're loading the decoder weights then yes, the tokenizer needs to have a vocab size of 1024. You can change the algorithm from "unigram" to "bpe" and it will generate more merged tokens. (The 1025 and 927 in the traceback are the vocab sizes plus one extra entry for the RNNT blank token.)
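For example, a minimal re-run of the earlier tokenizer command with the BPE algorithm; the output directory name is illustrative, the other flags are unchanged:

# same flags as before, with --spe_type switched to "bpe" and --vocab_size back to 1024
python process_asr_text_tokenizer.py --manifest="/workspace/datasets/cmu_hf/train_manifest.json" --data_root=tokenizers/cmu_hf1024 --vocab_size=1024 --tokenizer="spe" --spe_type="bpe" --log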
Please follow the ASR CTC fine-tuning tutorial to see all the necessary steps.
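If you'd rather keep a smaller vocab, one pattern from NeMo's fine-tuning tutorials is to load the pretrained model first and then swap its tokenizer, so the decoder and joint layers are rebuilt at the new size instead of clashing with the checkpoint shapes. A minimal sketch, assuming the 926-token tokenizer directory produced by the earlier command (the exact subdirectory name is an assumption):

import nemo.collections.asr as nemo_asr

# Load the pretrained checkpoint (built for a 1024-token vocab).
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="stt_en_conformer_transducer_large"
)

# Swap in the new tokenizer; NeMo rebuilds the prediction-network embedding
# and the joint output layer at the new size (vocab + 1 for the RNNT blank),
# so those layers start from random init and must be trained.
asr_model.change_vocabulary(
    new_tokenizer_dir="tokenizers/cmu_hf926/tokenizer_spe_unigram_v926",  # assumed path
    new_tokenizer_type="bpe",
)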
@titu1994 thank you, this solved my problem!
Describe the bug
There is a size mismatch in the Decoder and Joint modules of EncDecRNNTBPEModel when trying to fine-tune a Conformer-Transducer Large model.
Steps/Code to reproduce bug
I installed NeMo inside a docker container using the following command:
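(For illustration, assuming the 23.03 image mentioned at the end of this report, a typical invocation looks like:)

# Hypothetical example: start an interactive NeMo container with GPU access
# and mount the host workspace used in the commands above.
docker run --gpus all -it --rm -v /workspace:/workspace --shm-size=8g nvcr.io/nvidia/nemo:23.03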
Inside the docker container I downloaded the Conformer-Transducer Large model using the following command:
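(For reference, one common way to fetch this checkpoint, assuming the NGC name for Conformer-Transducer Large, is:)

import nemo.collections.asr as nemo_asr

# Download (and cache) the pretrained checkpoint from NGC, then save a local
# .nemo file that a fine-tuning script can load from disk.
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="stt_en_conformer_transducer_large"
)
asr_model.save_to("stt_en_conformer_transducer_large.nemo")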
Then I tried to launch fine-tuning using the following command:
I get the following error:
My config file is as follows:
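(For the error above, the relevant knobs in a standard NeMo BPE config are model.tokenizer.dir and model.tokenizer.type, plus init_from_nemo_model, which triggers the maybe_init_from_pretrained_checkpoint call in the traceback. As Hydra overrides on the training script, with illustrative paths:)

# Illustrative overrides; the script name comes from the traceback above.
python speech_to_text_rnnt_bpe_custom.py \
    model.tokenizer.dir="tokenizers/cmu_hf1024/tokenizer_spe_bpe_v1024" \
    model.tokenizer.type="bpe" \
    +init_from_nemo_model="stt_en_conformer_transducer_large.nemo"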
Additional context
GPU: A6000
I ran this exact command and successfully fine-tuned the model a few months ago using the 23.03 image, but now I have no luck. Could anyone please help me get this fine-tuning working?