Closed: phvaha1 closed this issue 4 days ago
@nguyenhoanganh2002 Hello, thanks for your work. I formatted the training data in the format you provided and created a new vocab.json file based on that data.
But when fine-tuning the XTTS model from the pretrained checkpoint (the default checkpoint referenced in the documentation) with the new vocab.json file, I got the error below:
size mismatch for gpt.text_embedding.weight: copying a param with shape torch.Size([6681, 1024]) from checkpoint, the shape in current model is torch.Size([7767, 1024])
I think this is because the new vocabulary size is 7767 while the checkpoint's is 6681. Do you know how to fix this?
Could you provide the complete error traceback? As mentioned in the repository, it's crucial to adjust the configuration file. Specifically, please check the line referenced at: https://github.com/nguyenhoanganh2002/XTTSv2-Finetuning-for-New-Languages/blob/main/extend_vocab_config.py#L82
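For anyone hitting this: the adjustment in question is making the model's text-token count match the extended vocab. Below is a minimal sketch, assuming vocab.json follows the HuggingFace tokenizers layout and that the training recipe exposes the count through GPTArgs; the exact field names may differ from what the linked script uses, so treat this as illustration rather than the repo's code:

```python
import json

# Hypothetical path -- adjust to wherever your extended vocab.json lives.
VOCAB_PATH = "checkpoints/XTTS_v2.0_original_model_files/vocab.json"

# Assuming the HuggingFace tokenizers layout: {"model": {"vocab": {...}, "merges": [...]}}
with open(VOCAB_PATH, "r", encoding="utf-8") as f:
    vocab = json.load(f)
new_vocab_size = len(vocab["model"]["vocab"])

# The GPT config's text-token count should equal this extended vocab size,
# otherwise the embedding/head shapes will not match the tokenizer, e.g.:
# model_args = GPTArgs(
#     ...,
#     gpt_number_text_tokens=new_vocab_size,
# )
print("extended vocab size:", new_vocab_size)
```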
I'm currently getting the same error. I believe this could be because the model being loaded isn't the correct one? Unsure, will come back with any other clues I find! 🔍
It seems that since the tokenizer is expanded but the model isn't, we end up with this error. @nguyenhoanganh2002, are you sure your implementation isn't missing code that extends not just the tokenizer but also the model that consumes the tokens?
What's your tactic for extending the model? Just initializing the weights of the embedding layers from a Gaussian normal?
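For context, that kind of extension usually looks roughly like the following. This is a minimal PyTorch sketch, assuming the text embedding is a plain nn.Embedding; it is not taken from this repo:

```python
import torch
import torch.nn as nn

def extend_text_embedding(old_emb: nn.Embedding, new_num_tokens: int) -> nn.Embedding:
    """Copy the pretrained rows and initialize the added rows from a normal
    distribution matching the statistics of the existing embeddings."""
    old_num_tokens, dim = old_emb.weight.shape
    new_emb = nn.Embedding(new_num_tokens, dim)

    with torch.no_grad():
        # Keep the pretrained vectors for tokens that already existed.
        new_emb.weight[:old_num_tokens] = old_emb.weight
        # Initialize the new tokens around the mean of the old embeddings.
        mean = old_emb.weight.mean(dim=0)
        std = old_emb.weight.std(dim=0)
        new_emb.weight[old_num_tokens:] = mean + std * torch.randn(
            new_num_tokens - old_num_tokens, dim
        )
    return new_emb

# Example: grow a 6681-token embedding to 7767 tokens (sizes from the error above).
old = nn.Embedding(6681, 1024)
new = extend_text_embedding(old, 7767)
print(new.weight.shape)  # torch.Size([7767, 1024])
```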
@phvaha1 The issue is fixed for me. It wasn't a bug; I was simply using https://github.com/idiap/coqui-ai-TTS instead of the TTS included in this git repo. If you look at the commits you'll notice they modify the model parameter size in this one:
2fb571637b20718647f9080b189c4a3f646e2d1a
Simply download the whole repo / reference the custom TTS library provided here.
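If you want to double-check which TTS you are actually importing (the PyPI/idiap release vs the copy bundled in this repo), a quick sanity check is:

```python
import TTS

# If this prints a site-packages path, you are on the official release,
# not the modified TTS shipped with this repository.
print(TTS.__file__)
```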
Thank you for your feedback and investigation, @TugdualKerjan. I'll investigate and aim to fix it soon. I'll update this thread once I have more information or when the fix is implemented.
Hi @nguyenhoanganh2002, I experienced the same thing as @phvaha1. What do you mean by the adjust_config function? It only adds the language; it has nothing to do with the 'size mismatch' issue with the number of tokens.
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Xtts:
size mismatch for gpt.text_embedding.weight: copying a param with shape torch.Size([6681, 1024]) from checkpoint, the shape in current model is torch.Size([8174, 1024]).
size mismatch for gpt.text_head.weight: copying a param with shape torch.Size([6681, 1024]) from checkpoint, the shape in current model is torch.Size([8174, 1024]).
size mismatch for gpt.text_head.bias: copying a param with shape torch.Size([6681]) from checkpoint, the shape in current model is torch.Size([8174]).
When the checkpoint parameter is disabled (xtts_checkpoint=None), it works fine without any issues.
But your example code at https://github.com/nguyenhoanganh2002/XTTSv2-Finetuning-for-New-Languages/blob/main/train_gpt_xtts.py#L163 sets it to a non-None value and fine-tunes the original model with the vi language, and when following all the instructions in the README I'm still encountering the error mentioned above.
Is there a solution?
Thanks.
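As a generic workaround (not this repo's approach), one can also pad the mismatched tensors in the old checkpoint up to the new vocab size before loading. A sketch, assuming the checkpoint stores its state dict under a "model" key and using the tensor names from the traceback above; paths are hypothetical:

```python
import torch

# Hypothetical paths; adjust to your checkpoint layout.
ckpt = torch.load("model.pth", map_location="cpu")
state = ckpt["model"] if "model" in ckpt else ckpt

NEW_VOCAB = 8174  # size reported by the error above

for key in ("gpt.text_embedding.weight", "gpt.text_head.weight", "gpt.text_head.bias"):
    old = state[key]
    pad_rows = NEW_VOCAB - old.shape[0]
    if pad_rows <= 0:
        continue
    if old.dim() == 2:
        # New rows initialized at the mean of the existing ones.
        pad = old.mean(dim=0, keepdim=True).repeat(pad_rows, 1)
    else:
        pad = torch.zeros(pad_rows, dtype=old.dtype)
    state[key] = torch.cat([old, pad], dim=0)

torch.save(ckpt, "model_extended.pth")
```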
@falhafizh make sure the TTS you're using is the modified one in this library and not the official python release. The TTS referenced here has code that expands the embedder to take into account the new tokens you added.
Thanks @TugdualKerjan. I am indeed using all the code from this repo, but I'll start from scratch and create a new environment to make sure I'm really using it.