How to transfer speech style from another reference audio?

NVIDIA / radtts

Provides training, inference, and voice conversion recipes for RADTTS and RADTTS++: flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.

MIT License

How to transfer speech style from another reference audio? #5

Open jiamingkong opened 2 years ago

jiamingkong commented 2 years ago

Hi, thank you for providing the pretrained weights. I can now synthesize speech from text. My question is: how do I replicate the example of conditioning on another reference audio for the pitch and F0 information (as shown in the rap example on the project page)?

Thanks!

rafaelvalle commented 2 years ago

Please take a look at the Inference Voice Conversion demo in the README.

skyler14 commented 2 years ago

For reference, what additional training or fine-tuning should generally be performed for this portion of the demo? That is, if you are just editing the alignment of some source audio using your voice, do you need to do certain types of training in relation to both voices?

Syed044 commented 1 year ago

emotion = 'other' if len(d) == 3 else d[3]
IndexError: list index out of range

I still hit the same error. I added the speaker at the end of the line, yet I get the same error.
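For anyone debugging this: the quoted expression can only raise `IndexError` when `d` has fewer than 3 elements, since `d[3]` is evaluated whenever `len(d) != 3`. The sketch below is not the repository's loader. It assumes a filelist whose fields are separated by `|` in the order audio path, text, speaker, optional emotion; the delimiter, the field order, and the helper names `find_bad_lines` and `parse_filelist_line` are all assumptions made for illustration. It flags lines that would trigger the error, including blank lines, which split into a single empty field.

```python
def find_bad_lines(lines, sep="|"):
    """Return 1-based numbers of lines with fewer than 3 fields.

    Assumes (hypothetically) the layout: audiopath|text|speaker[|emotion].
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.strip().split(sep)
        # A blank line splits into [''] (length 1), so it is flagged too.
        if len(fields) < 3:
            bad.append(number)
    return bad


def parse_filelist_line(line, sep="|"):
    """Parse one line, failing with a readable message instead of IndexError."""
    fields = line.strip().split(sep)
    if len(fields) < 3:
        raise ValueError(
            f"expected at least 3 fields separated by {sep!r}, "
            f"got {len(fields)}: {line!r}"
        )
    # Same rule as the quoted expression: default emotion when only 3 fields.
    emotion = "other" if len(fields) == 3 else fields[3]
    return {
        "audiopath": fields[0],
        "text": fields[1],
        "speaker": fields[2],
        "emotion": emotion,
    }


if __name__ == "__main__":
    sample = [
        "a.wav|hello there|speaker1",
        "b.wav|missing speaker",
        "",
        "c.wav|hi|speaker1|happy",
    ]
    print(find_bad_lines(sample))
```

Running a check like this over the training and validation filelists should show whether a trailing blank line, a wrong delimiter, or a missing field is what the loader is tripping on.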

@rafaelvalle Why do you have so many different formats? Tacotron 2 has one file format, Flowtron has a different one, and now this one is at a different level.

Why can't you make it simpler for everyone, like Tacotron 2?

Please help me with the error.