ex3ndr / supervoice

VoiceBox neural network implementation

Text to unit training code #2

Open PranjalyaDS opened 4 months ago

PranjalyaDS commented 4 months ago

Hi, the work definitely looks promising, especially since, for some reason, no one else is really trying to figure out VoiceBox. So, yeah, thanks for your work!

I see that you are currently working on the vocoder part of it. Is the text-to-units part covered end-to-end in this repo, or is anything left to incorporate? I was just curious about that.

ex3ndr commented 4 months ago

I have opted for BigVSAN. I was really impressed by its quality: I wasn't able to spot any difference between synthesized and real audio on my datasets. I have published an easy-to-use library, and you can check the evaluation notebook, which requires only PyTorch.

https://github.com/ex3ndr/supervoice-vocoder

I am in the process of training. I had to restart from scratch because I replaced the vocoder, but it was already impressive before that. The vocoder was the problem for me, and now it is not.

PranjalyaDS commented 4 months ago

Looks good!

rishikksh20 commented 4 months ago

@ex3ndr how many hours of data are you training the new model on? I am also planning to train on my own dataset if your training goes well.

ex3ndr commented 4 months ago

I am training on a quite small dataset: LibriTTS-R + VCTK. They contain only high-quality voice recordings, but I want to try pre-training on a much bigger dataset to cover many languages and phonemes, and then fine-tune on a higher-quality one.