Question regarding training of the model

Hiroshiba / become-yukarin

Convert your voice to favorite voice

MIT License

576 stars, 88 forks

Hello,

Just a quick question regarding training. Do the input and target audio clips need to match each other directly (same length, same words, etc.)?

For example, I speak English, but the voice clips I have for the target character are in Japanese (they are voice files extracted from a web game, and they capture the character's tone and expression). Since I cannot speak Japanese, I was wondering if I can do something like this:

input/file_001.wav: (Hello in English)
target/file_001.wav: (Clip 1 of character voice from web game)

input/file_002.wav: (Another phrase in English)
target/file_002.wav: (Another clip of character voice from web game)

...and so on.
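
To make the layout concrete, here is a rough sketch of how the clips above could be paired by filename and checked for length differences. It uses only the Python standard library and is not taken from become-yukarin; the directory names `input` and `target` and the helper names `wav_duration`, `pair_clips`, and `duration_gaps` are my own assumptions for illustration.

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pair_clips(input_dir, target_dir):
    """Pair WAV files that share a filename in both directories (hypothetical layout)."""
    input_dir, target_dir = Path(input_dir), Path(target_dir)
    pairs = []
    for input_path in sorted(input_dir.glob("*.wav")):
        target_path = target_dir / input_path.name
        if target_path.exists():
            pairs.append((input_path, target_path))
    return pairs


def duration_gaps(input_dir, target_dir):
    """Return (filename, absolute duration difference in seconds) for each pair."""
    return [
        (inp.name, abs(wav_duration(inp) - wav_duration(tgt)))
        for inp, tgt in pair_clips(input_dir, target_dir)
    ]
```

Calling `duration_gaps("input", "target")` would list how far apart each pair is in length. It only describes the data; it does not say whether the training can cope with mismatched pairs.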

I guess another way of asking the question is: does the input data need to match the target data, or can I feed the converter different clips of my voice and the character's voice and let the training work it all out?

Cheers!

Question regarding training of the model #68