Hi,
I am trying to clone my voice (ITALIAN) starting from a recording of myself, but I have some problems:
How long must the audio files used for training be? Is a 2-minute recording OK?
I successfully trained my voice, but when I generate audio it speaks incomprehensible words with an "English accent" rather than an Italian one, even though I have the it_tokenizer in the config. What could be the problem?
What is an "optimal" training configuration? In the video guide you train for 20 epochs; is that enough? (I also used 20.)
Thanks