Solution about add more speakers midway.

begeekmyfriend / tacotron2

Forked from NVIDIA/tacotron2 and merged with Rayhane-mamah/Tacotron-2

BSD 3-Clause "New" or "Revised" License


Solution about add more speakers midway. #33

leijue222 closed this issue 3 years ago

leijue222 commented 3 years ago

Your base model already has four speakers.

What would you do if you wanted to add a 5th speaker?

1. Retrain the model from scratch.
2. Start from the 4-speaker model and train it on the 5th speaker's data.

Solution 1: The drawback is that you have to start over every time you add a speaker, which is too time-consuming.

Solution 2: The number of speakers determines the model's shape, so reloading the 4-speaker base model fails once a 5th speaker is added.

Two factors affect the model's shape: the number of speakers and the number of characters.

Could you give me some advice about it?

begeekmyfriend commented 3 years ago

Yes, the model shape definitely depends on the number of speakers, because the symbol table is duplicated for each speaker. This model does not support fine-tuning when you want to add new speakers. By the way I have tried other solutions such as changing the word embedding related to different speakers. However, the current solution is much more stable and runs well. So I have no idea about your requirements.
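
To illustrate why the speaker count changes the model's shape, here is a minimal sketch. It assumes the symbol table is simply repeated once per speaker, so that every (speaker, symbol) pair gets its own embedding row. The names `BASE_SYMBOLS`, `embedding_rows`, and `symbol_id` are hypothetical and are not taken from the repository; the real implementation may differ.

```python
# Hypothetical base symbol set; the repository's real table is different.
BASE_SYMBOLS = ["_", "~", "a1", "b", "AH0", "K"]


def embedding_rows(num_speakers: int) -> int:
    """Rows the text embedding needs if the table is repeated per speaker."""
    return len(BASE_SYMBOLS) * num_speakers


def symbol_id(symbol: str, speaker: int, num_speakers: int) -> int:
    """Map a (speaker, symbol) pair to a unique embedding row (assumed layout)."""
    if not 0 <= speaker < num_speakers:
        raise ValueError(f"speaker {speaker} is out of range for {num_speakers} speakers")
    return speaker * len(BASE_SYMBOLS) + BASE_SYMBOLS.index(symbol)


rows_4 = embedding_rows(4)  # size implied by a 4-speaker checkpoint
rows_5 = embedding_rows(5)  # size required once a 5th speaker is added
print(rows_4, rows_5)

# The 5th speaker (index 4) needs rows that a 4-speaker checkpoint does not contain.
new_id = symbol_id("b", speaker=4, num_speakers=5)
print(new_id >= rows_4)
```

Under this assumed layout the embedding grows with every added speaker, which is consistent with a 4-speaker checkpoint no longer matching the shape of a 5-speaker model.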

leijue222 commented 3 years ago

> By the way I have tried other solutions such as changing the word embedding related to different speakers.

Since English phones are all uppercase letters + numbers and Chinese phones are all lowercase letters + numbers, the two sets of phones are distinguishable. Therefore, I'm trying to add the LJSpeech dataset as a new speaker and train from scratch.
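
A rough sketch of that distinction is below. It assumes every phone is a run of letters optionally followed by digits, as described above; the function name `phone_language` and the sample phones are only illustrative.

```python
import re

# Assumed conventions from the description above:
# English phones are uppercase letters + optional digits (e.g. "AH0"),
# Chinese phones are lowercase letters + optional digits (e.g. "ao3").
ENGLISH_PHONE = re.compile(r"[A-Z]+[0-9]*")
CHINESE_PHONE = re.compile(r"[a-z]+[0-9]*")


def phone_language(phone: str) -> str:
    """Classify a phone token as 'en', 'zh', or 'other' by its casing."""
    if ENGLISH_PHONE.fullmatch(phone):
        return "en"
    if CHINESE_PHONE.fullmatch(phone):
        return "zh"
    return "other"


sample = ["HH", "AH0", "L", "OW1", "n", "i3", "h", "ao3", ","]
for phone in sample:
    print(phone, phone_language(phone))
```

Because the two patterns can never match the same token, English and Chinese phones can share one symbol table without colliding.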

> So I have no idea about your requirements.

The model can speak both English and Chinese words. I also want to record a few hours of my own voice and train on that data, with the goal of fine-tuning the model and adding speakers.

begeekmyfriend commented 3 years ago

Of course this model supports a mixture of languages, as long as it can distinguish the different symbols that you design on your own. As for different tones from different speakers, that is still a huge challenge to implement.

leijue222 commented 3 years ago

Thanks for your help.