GeleiaComPepino opened 6 months ago
You can't add symbols by yourself, because step 3 uses SynthesizerTrn (the generator of SoVITS), whose weights are fixed by the pretrained model. The weights' shapes depend on the number of symbols.
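To illustrate the coupling, here is a minimal sketch (the names are simplified stand-ins for the actual GPT-SoVITS modules; only the shape relationship matters):

```python
import torch.nn as nn

# Stand-in for the real symbol list in text/symbols.py.
symbols = ["!", ",", "AA0", "a1", "zh"]

# The text encoder embeds each symbol id, so the embedding table's
# first dimension is len(symbols); 192 is the embedding dim seen in
# the size-mismatch errors later in this thread.
text_embedding = nn.Embedding(len(symbols), 192)
print(text_embedding.weight.shape)  # torch.Size([5, 192])

# A checkpoint saved with 322 symbols therefore cannot be loaded
# strictly into a model built with a different symbol count.
```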
So how do I put the IPA in the code?
I think you should either replace it with a pretrained model for Portuguese, or use the existing symbols to express Portuguese.
So why does English (CMUdict) work with the Chinese pretrained model, while Portuguese wouldn't?
The symbols used for training contain ARPAbet (for American English), consonants and vowels (for Chinese), and others (for Japanese). So currently you can train and infer in these three languages, and they will be pronounced correctly.
I am not sure whether the phonemes of Portuguese can be transferred to ARPAbet. If they can, you can write some code to do the transfer.
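For example, such a transfer could be a simple lookup table; the pairs below are illustrative guesses for Brazilian Portuguese, not a verified mapping:

```python
# Hypothetical, approximate mapping from (Brazilian) Portuguese IPA
# phonemes to the nearest ARPAbet symbols already in the symbol set.
# Unmapped phonemes fall back to UNK.
PT_IPA_TO_ARPABET = {
    "a": "AA0", "ɐ": "AH0", "ɛ": "EH0", "e": "EY0",
    "i": "IY0", "ɔ": "AO0", "o": "OW0", "u": "UW0",
    "ʒ": "ZH", "ʃ": "SH", "ɲ": "N Y", "ʎ": "L Y",
    "ɾ": "R", "s": "S", "z": "Z", "t": "T", "d": "D",
}

def ipa_to_arpabet(ipa_phonemes):
    """Map a list of IPA phonemes to ARPAbet; one-to-many is allowed."""
    out = []
    for ph in ipa_phonemes:
        out.extend(PT_IPA_TO_ARPABET.get(ph, "UNK").split())
    return out

print(ipa_to_arpabet(["ʒ", "ɐ", "ɾ", "a"]))  # ['ZH', 'AH0', 'R', 'AA0']
```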
I tried using the English ARPAbet symbols with the most similar phoneme sounds, but the accent is very different.
I was thinking about using a wav2vec model, because there is no Portuguese HuBERT model, only BERT.
Do you want to replace the HuBERT of step 2 to extract features and then feed them into SynthesizerTrn? I think it may be helpful for handling text in Portuguese, but I am not sure whether it works for pronunciation.
The language of the training dataset is not significant for HuBERT. In general, HuBERT training needs at least several thousand hours of speech data, so you can use an open-source pretrained HuBERT trained on English or Chinese.
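For reference, feature extraction with an open-source pretrained HuBERT could look roughly like this (a sketch using the Hugging Face transformers API and the chinese-hubert-base checkpoint that GPT-SoVITS already uses; 16 kHz mono input is assumed):

```python
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

# Any pretrained HuBERT works as a feature extractor here; the
# training language matters little because the features are acoustic.
MODEL = "TencentGameMate/chinese-hubert-base"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL)
hubert = HubertModel.from_pretrained(MODEL).eval()

def extract_features(waveform_16k):
    """waveform_16k: 1-D float array/tensor of 16 kHz mono audio."""
    inputs = feature_extractor(waveform_16k, sampling_rate=16000,
                               return_tensors="pt")
    with torch.no_grad():
        # (1, frames, 768) hidden states, one frame per ~20 ms
        return hubert(inputs.input_values).last_hidden_state
```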
But I need to use IPA for Portuguese and pronounce it with a Brazilian accent, so I need a HuBERT model that supports Portuguese, since there is no way to use Portuguese with these HuBERTs, only English or Chinese.
Do you know which pretrained model I would need to train to change the weight values?
I am following your advice, but I accidentally used the 'v_without_tone' symbols, so the symbol set size is 355 > 322. The error is as follows:

RuntimeError: Error(s) in loading state_dict for SynthesizerTrn:
	size mismatch for enc_p.text_embedding.weight: copying a param with shape torch.Size([322, 192]) from checkpoint, the shape in current model is torch.Size([355, 192]).

Now what is the fix? Do I need to redo my phoneme set with 322 symbols, or do you have a pretrained model trained with 355 symbols?
I tried changing the model pre-loading code, and it can run successfully; someone may want to have a try~~
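(For anyone hitting the same size mismatch: one way to change the pre-loading, sketched below under the assumption that the SoVITS checkpoint nests its weights under the "weight" key as in the prepare_datasets scripts, is to copy only the overlapping embedding rows and leave the new symbols randomly initialized.)

```python
import torch

def load_with_extra_symbols(vq_model, ckpt_path,
                            key="enc_p.text_embedding.weight"):
    # Assumption: the checkpoint dict nests weights under "weight",
    # as the GPT-SoVITS s2/prepare_datasets scripts expect.
    state = torch.load(ckpt_path, map_location="cpu")["weight"]
    old = state[key]
    new = vq_model.state_dict()[key].clone()
    if old.shape[0] != new.shape[0]:
        rows = min(old.shape[0], new.shape[0])
        new[:rows] = old[:rows]   # keep the pretrained rows
        state[key] = new          # extra symbol rows stay random-init
    vq_model.load_state_dict(state)
```

Whether the randomly initialized rows for the new symbols train well is a separate question; this only gets the checkpoint to load.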
@jiangyiqiao awesome, I will try it right away. Thanks!
I just found a multilingual HuBERT, mHuBERT-147, from Interspeech 2024; maybe you can give it a try.
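(If mHuBERT-147 is published in transformers format, it could in principle be dropped into the feature-extraction sketch above just by changing the model id; the hub id below is an assumption, so check the actual model card.)

```python
from transformers import HubertModel
# Hypothetical hub id inferred from the model's name; verify it exists.
mhubert = HubertModel.from_pretrained("utter-project/mHuBERT-147").eval()
```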
I added a new IPA dictionary for Portuguese in symbols.py and added portuguese.py to process text in IPA, but I'm receiving this error:

"/usr/bin/python3" GPT_SoVITS/prepare_datasets/3-get-semantic.py
['!', ',', '-', '.', '55', '?', 'AA', 'AA0', 'AA1', 'AA2', 'AE0', 'AE1', 'AE2', 'AH0', 'AH1', 'AH2', 'AO0', 'AO1', 'AO2', 'AW0', 'AW1', 'AW2', 'AY0', 'AY1', 'AY2', 'B', 'CH', 'D', 'DH', 'E1', 'E2', 'E3', 'E4', 'E5', 'EE', 'EH0', 'EH1', 'EH2', 'ER', 'ER0', 'ER1', 'ER2', 'EY0', 'EY1', 'EY2', 'En1', 'En2', 'En3', 'En4', 'En5', 'F', 'G', 'HH', 'I', 'IH', 'IH0', 'IH1', 'IH2', 'IY0', 'IY1', 'IY2', 'JH', 'K', 'L', 'M', 'N', 'NG', 'OO', 'OW0', 'OW1', 'OW2', 'OY0', 'OY1', 'OY2', 'P', 'R', 'S', 'SH', 'SP', 'SP2', 'SP3', 'T', 'TH', 'U', 'UH0', 'UH1', 'UH2', 'UNK', 'UW0', 'UW1', 'UW2', 'V', 'W', 'Y', 'Z', 'ZH', '_', 'a', 'a1', 'a2', 'a3', 'a4', 'a5', 'ai1', 'ai2', 'ai3', 'ai4', 'ai5', 'an1', 'an2', 'an3', 'an4', 'an5', 'ang1', 'ang2', 'ang3', 'ang4', 'ang5', 'ao1', 'ao2', 'ao3', 'ao4', 'ao5', 'b', 'by', 'c', 'ch', 'cl', 'd', 'dy', 'e', 'e1', 'e2', 'e3', 'e4', 'e5', 'ei1', 'ei2', 'ei3', 'ei4', 'ei5', 'en1', 'en2', 'en3', 'en4', 'en5', 'eng1', 'eng2', 'eng3', 'eng4', 'eng5', 'er1', 'er2', 'er3', 'er4', 'er5', 'f', 'g', 'gy', 'h', 'hy', 'i', 'i01', 'i02', 'i03', 'i04', 'i05', 'i1', 'i2', 'i3', 'i4', 'i5', 'ia1', 'ia2', 'ia3', 'ia4', 'ia5', 'ian1', 'ian2', 'ian3', 'ian4', 'ian5', 'iang1', 'iang2', 'iang3', 'iang4', 'iang5', 'iao1', 'iao2', 'iao3', 'iao4', 'iao5', 'ie1', 'ie2', 'ie3', 'ie4', 'ie5', 'in1', 'in2', 'in3', 'in4', 'in5', 'ing1', 'ing2', 'ing3', 'ing4', 'ing5', 'iong1', 'iong2', 'iong3', 'iong4', 'iong5', 'ir1', 'ir2', 'ir3', 'ir4', 'ir5', 'iu1', 'iu2', 'iu3', 'iu4', 'iu5', 'j', 'k', 'ky', 'l', 'm', 'my', 'n', 'ny', 'o', 'o1', 'o2', 'o3', 'o4', 'o5', 'ong1', 'ong2', 'ong3', 'ong4', 'ong5', 'ou1', 'ou2', 'ou3', 'ou4', 'ou5', 'p', 'py', 'q', 'r', 'ry', 's', 'sh', 't', 'ts', 'u', 'u1', 'u2', 'u3', 'u4', 'u5', 'ua1', 'ua2', 'ua3', 'ua4', 'ua5', 'uai1', 'uai2', 'uai3', 'uai4', 'uai5', 'uan1', 'uan2', 'uan3', 'uan4', 'uan5', 'uang1', 'uang2', 'uang3', 'uang4', 'uang5', 'ui1', 'ui2', 'ui3', 'ui4', 'ui5', 'un1', 'un2', 'un3', 'un4', 'un5', 'uo1', 'uo2', 'uo3', 'uo4', 'uo5', 'v', 'v1', 'v2', 'v3', 'v4', 'v5', 'van1', 'van2', 'van3', 'van4', 'van5', 've1', 've2', 've3', 've4', 've5', 'vn1', 'vn2', 'vn3', 'vn4', 'vn5', 'w', 'x', 'y', 'z', 'zh', 'õ', 'ü', 'ɐ', 'ɔ', 'ɛ', 'ɡ', 'ɾ', 'ʒ', '̃', '…']
/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Traceback (most recent call last):
  File "/content/GPT-SoVITS/GPT_SoVITS/prepare_datasets/3-get-semantic.py", line 62, in <module>
    vq_model.load_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SynthesizerTrn:
	size mismatch for enc_p.text_embedding.weight: copying a param with shape torch.Size([322, 192]) from checkpoint, the shape in current model is torch.Size([331, 192]).
I'm using the Chinese HuBERT model and the Chinese BERT model; I believe this is a model error. Can anyone help?