Hiroshiba / become-yukarin

Convert your voice to favorite voice
https://hiroshiba.github.io/blog/became-yuduki-yukari-with-deep-learning-power/
MIT License
576 stars 88 forks

Question regarding training of the model #68

SoftwareGuy closed 4 years ago

SoftwareGuy commented 4 years ago

Hello,

Just a quick question regarding training. Does the audio that you supply need to match exactly (in length, words, etc.)?

For example, I speak English, but I have Japanese voice clips that have the character's tone and expression (they are extracted voice files from a web game). However, since I cannot speak Japanese, I was wondering if I can do something like this:

input/file_001.wav: (Hello in English)
target/file_001.wav: (Clip 1 of character voice from web game)

input/file_002.wav: (Another phrase in English)
target/file_002.wav: (Another clip of character voice from web game)

...and so on.

I guess another way of asking the question is: does the input data need to match the target data, or can I feed the converter different clips of my voice and the character's voice and let the training work it all out?

Cheers!

Hiroshiba commented 4 years ago

"does the input data need to match the target data"

Yes. ;-)

Training will fail if the input and target clips do not match. Your voice may be converted to the character's voice, but the spoken words will be incomprehensible.

Some sounds ("phonemes") exist in English but not in Japanese. If you want to convert English speech, you will need to train on English data. However, even if I train on Japanese voice data and then convert English voice data, I think the result would not be too bad.
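
Editor's note: since the input and target clips have to match, a quick sanity check before training can catch obviously mismatched pairs. The sketch below is not part of become-yukarin. It assumes the input/ and target/ layout from the question, pairs WAV files by filename, and flags pairs whose durations differ a lot. The function name `check_parallel_pairs` and the `max_ratio` threshold are hypothetical, chosen only for illustration. Similar durations do not prove that both clips contain the same words, so this can only rule out clearly bad pairs.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_parallel_pairs(input_dir, target_dir, max_ratio=2.0):
    """Pair WAV files by filename and flag suspicious pairs.

    Hypothetical helper, not part of become-yukarin.
    Returns (pairs, problems): pairs is a list of (input_path, target_path)
    tuples, problems is a list of human-readable warnings.
    """
    input_files = {p.name: p for p in Path(input_dir).glob("*.wav")}
    target_files = {p.name: p for p in Path(target_dir).glob("*.wav")}
    pairs, problems = [], []

    # A file present on only one side cannot form a training pair.
    for name in sorted(input_files.keys() ^ target_files.keys()):
        missing_side = "target" if name in input_files else "input"
        problems.append(f"{name}: missing from {missing_side} directory")

    # A large duration gap suggests the two clips are different utterances.
    for name in sorted(input_files.keys() & target_files.keys()):
        input_len = wav_duration_seconds(input_files[name])
        target_len = wav_duration_seconds(target_files[name])
        shorter = max(min(input_len, target_len), 1e-9)
        ratio = max(input_len, target_len) / shorter
        if ratio > max_ratio:
            problems.append(
                f"{name}: durations differ by {ratio:.1f}x "
                f"({input_len:.2f}s vs {target_len:.2f}s)"
            )
        else:
            pairs.append((input_files[name], target_files[name]))

    return pairs, problems
```

With the layout from the question, this would be called as `check_parallel_pairs("input", "target")`. An English "Hello" paired with an unrelated Japanese clip could still pass the check, so listening to a few pairs by hand remains necessary.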