Closed leijue222 closed 3 years ago
Yes, it depends on the number of speakers, because the symbol table is duplicated for each speaker. This model does not support fine-tuning when you want to add new speakers. By the way I have tried other solutions such as changing the word embedding related to different speakers. However, the current solution is much more stable and runs well. So I have no idea about your requirements.
By the way I have tried other solutions such as changing the word embedding related to different speakers.
Since English phones are all uppercase letters + numbers and Chinese phones are all lowercase letters + numbers, the phones are distinguishable. Therefore, I'm trying to add the LJSpeech dataset as a new speaker and train from scratch.
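The case convention described above can be sketched as a merged symbol table. This is only an illustration under assumptions: the phone lists are small hypothetical samples, and `build_symbol_table` and `language_of` are made-up helper names, not functions from this repository.

```python
import re

# Hypothetical sample inventories; the real ones come from your lexicons.
ENGLISH_PHONES = ["AH0", "K", "IY1", "T"]
CHINESE_PHONES = ["a1", "k", "i4", "t"]


def build_symbol_table(*inventories):
    """Merge phone inventories into one symbol -> id table.

    Raises ValueError if the same symbol appears twice, which would mean
    the inventories are not actually distinguishable.
    """
    table = {"<pad>": 0}
    for inventory in inventories:
        for phone in inventory:
            if phone in table:
                raise ValueError(f"duplicate symbol: {phone}")
            table[phone] = len(table)
    return table


def language_of(phone):
    """Guess the language from letter case (uppercase = English)."""
    letters = re.sub(r"[^A-Za-z]", "", phone)
    return "en" if letters.isupper() else "zh"


symbols = build_symbol_table(ENGLISH_PHONES, CHINESE_PHONES)
```

Because symbols are compared case-sensitively, `K` and `k` get separate ids, so the two languages never share an entry in the table.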
So I have no idea about your requirements.
The model can speak both English and Chinese words. I also want to record a few hours of my own voice and train on it, so that I can fine-tune the model and add speakers.
Of course, this model supports a mixture of languages as long as it can distinguish the different symbols that you design yourself. As for different tones from speakers, that is still a huge challenge to implement.
Thanks for your help.
Your base model already has four speakers.
What will you do if you want to add the 5th speaker?
Solution 1: The drawback is that you have to start all over again every time, which is too time-consuming.
Solution 2: The number of speakers is related to the model's shape, so it will fail when reloading the base model.
Factors affecting model shape: the number of speakers and characters.
Could you give me some advice about it?
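One common workaround for the shape mismatch in Solution 2 is to grow the speaker-dependent weight before loading it: copy the trained rows and initialize only the new ones. The sketch below is framework-agnostic and assumes speaker identity is stored as a table with one row per speaker. That is an assumption, not something confirmed for this model; since the replies above say the speaker count affects the symbol table here, the same idea would have to be applied to whichever weights actually depend on the number of speakers. `extend_speaker_table` is a hypothetical helper name.

```python
import random


def extend_speaker_table(old_table, new_num_speakers, seed=0):
    """Return a table with new_num_speakers rows.

    The first len(old_table) rows are copied unchanged so the trained
    speakers keep their weights; extra rows get small random values.
    """
    if new_num_speakers < len(old_table):
        raise ValueError("cannot shrink the speaker table")
    dim = len(old_table[0])
    rng = random.Random(seed)
    new_table = [list(row) for row in old_table]
    for _ in range(new_num_speakers - len(old_table)):
        new_table.append([rng.gauss(0.0, 0.01) for _ in range(dim)])
    return new_table


# Hypothetical base model with four speakers and 2-dimensional rows.
base = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
extended = extend_speaker_table(base, 5)
```

With the resized table in place, the remaining weights can be loaded as usual and only the new speaker's row needs to be learned from scratch. Whether this trains stably for this particular model is untested.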