Open nivibilla opened 1 year ago
This model is used to solve Chinese prosodic problems. So, what is your goal?
I tried fine-tuning VITS on a dataset. The voice quality was good, but the tone and prosody were not. I then came across this model, which implements some features from NaturalSpeech to improve that, so I wanted to know whether it is possible to fine-tune it on an English dataset.
@MaxMax2016 I understand this repo was made to solve Chinese prosody problems, but can it be used with English data?
https://github.com/PlayVoice/vits_chinese/blob/master/train.py#L267
Maybe this is useful for you: https://github.com/heatz123/naturalspeech
@nivibilla I think an English BERT may be useful.
The characters used by BERT map to multiple pronunciation units: two in Chinese and n in English. The BERT vector of each character is copied and expanded according to the number of pronunciation units corresponding to that character.
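To illustrate the expansion described above, here is a minimal sketch in plain Python. It is not the repo's code: the function name `expand_char_embeds`, the toy vectors, and the per-character unit counts are all made up for illustration, and it assumes the silence/padding token counts as one unit while each Chinese character counts as two.

```python
from typing import List, Sequence


def expand_char_embeds(
    char_embeds: Sequence[Sequence[float]],
    unit_counts: Sequence[int],
) -> List[Sequence[float]]:
    """Repeat each character vector once per pronunciation unit.

    Hypothetical helper: char_embeds holds one vector per BERT character,
    unit_counts holds how many pronunciation units each character has.
    """
    if len(char_embeds) != len(unit_counts):
        raise ValueError(
            f"{len(char_embeds)} character vectors but {len(unit_counts)} counts"
        )
    expanded: List[Sequence[float]] = []
    for vec, count in zip(char_embeds, unit_counts):
        # Copy the same vector for every unit that belongs to this character.
        expanded.extend([vec] * count)
    return expanded


# Toy input: [PAD], two Chinese characters, [PAD] (assumed counts: 1, 2, 2, 1).
toy_embeds = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
toy_counts = [1, 2, 2, 1]
phone_level = expand_char_embeds(toy_embeds, toy_counts)
print(len(phone_level))
```

With PyTorch tensors, the same repetition can likely be done in one call with `torch.repeat_interleave(char_embeds, counts, dim=0)`, where `counts` is an integer tensor.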
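@MaxMax2016, I have a question. While trying to add English support, I ran into a problem where char_embeds.size(0) and len(length) are not equal: the tokens and the phone items do not line up. What is a good way to handle this?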
text: [PAD]unfriendly[PAD]
phone_items: ['sil', 'AH0', 'N', 'F', 'R', 'EH1', 'N', 'D', 'L', 'IY0', 'sil']
tokens: ['[PAD]', 'u', '##n', '##fr', '##ien', '##d', '##ly', '[PAD]']
char_embeds.size(0): 8
len(length): 11
Traceback (most recent call last):
File "vits_en_prepare2.py", line 125, in
If I tokenize first and then convert to phonemes, the pronunciation changes. For example, the first token of unfriendly is u, and the phoneme for u becomes 'Y UW1'.
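One possible workaround, sketched below, is to align at the word level instead of the token level: run G2P on the whole word so the pronunciation is unchanged, merge the WordPiece tokens of that word into a single vector, and repeat that vector once per phone. This is only an assumption about how it could be done, not something the repo provides; `group_wordpieces`, `mean_pool`, and `align_to_phones` are hypothetical names, mean pooling is one arbitrary choice, and the embeddings are toy values.

```python
from typing import List, Sequence


def group_wordpieces(tokens: Sequence[str]) -> List[List[int]]:
    """Group token indices into words; '##' marks a continuation piece."""
    groups: List[List[int]] = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average several equal-length vectors into one."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def align_to_phones(
    tokens: Sequence[str],
    token_embeds: Sequence[Sequence[float]],
    phones_per_word: Sequence[Sequence[str]],
) -> List[List[float]]:
    """Return one vector per phone by repeating each pooled word vector."""
    groups = group_wordpieces(tokens)
    if len(groups) != len(phones_per_word):
        raise ValueError(f"{len(groups)} words but {len(phones_per_word)} phone groups")
    aligned: List[List[float]] = []
    for idxs, phones in zip(groups, phones_per_word):
        word_vec = mean_pool([token_embeds[i] for i in idxs])
        aligned.extend([word_vec] * len(phones))
    return aligned


# Data from the report above; the embeddings are toy 2-d vectors.
tokens = ["[PAD]", "u", "##n", "##fr", "##ien", "##d", "##ly", "[PAD]"]
phones_per_word = [
    ["sil"],
    ["AH0", "N", "F", "R", "EH1", "N", "D", "L", "IY0"],  # G2P on the whole word
    ["sil"],
]
token_embeds = [[float(i), 0.0] for i in range(len(tokens))]
aligned = align_to_phones(tokens, token_embeds, phones_per_word)
print(len(aligned))
```

In this sketch the output length equals the number of phone items, so the two sizes from the report would match. Punctuation and words that the tokenizer and the G2P step split differently would still need separate handling.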
Hey,
Is it possible to adapt this model to train on English dataset?
Or should I just use normal VITS?