Closed Ca-ressemble-a-du-fake closed 1 year ago
I could fix this by making the following change in the formatter:
speaker_name = cols[1]
so that the formatter outputs the name of the speaker (stored in the second column of my CSV). Now the training is working, so I believe this recipe must be run against a multi-speaker dataset. Single-speaker datasets may not be supported.
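For readers who want to see the change in context, here is a minimal sketch of a formatter that reads the speaker from the second column. The function name format_with_speaker, the file_id|speaker|text column layout, and the wavs/ sub-folder are assumptions made for illustration, not the recipe's actual code. Only the speaker_name = cols[1] idea comes from the comment above.

```python
def format_with_speaker(lines, root_path="dataset"):
    """Hypothetical formatter: one dict per metadata line, speaker taken from column 2.

    Assumes each line looks like `file_id|speaker|text` (an assumed layout).
    """
    items = []
    for line in lines:
        cols = line.rstrip("\n").split("|")
        if len(cols) < 3:
            continue  # skip empty or malformed lines
        items.append(
            {
                "audio_file": f"{root_path}/wavs/{cols[0]}.wav",
                "speaker_name": cols[1],  # the change described above
                "text": cols[2],
                "root_path": root_path,
            }
        )
    return items


# Two made-up metadata lines, for demonstration only.
sample_lines = [
    "clip_0001|speaker_a|Bonjour tout le monde.\n",
    "clip_0002|speaker_b|Comment allez-vous ?\n",
]
items = format_with_speaker(sample_lines)
print(sorted({item["speaker_name"] for item in items}))
```

With more than one distinct speaker name in the output, the dataset looks multi-speaker to the recipe, which matches the workaround described above.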
Maybe it can work with 22 kHz audio, but I did not test it, as I only have 16 kHz multi-speaker datasets and a single-speaker one in 22 kHz.
I tried to fine-tune the YourTTS model on my own small dataset and hit the same error as in this issue. My dataset contains 256 audio clips in LJSpeech format. Thanks for the solutions suggested above, but since I would rather not change the source code, I studied the problem a little.
Let me get straight to the point: I think the cause of this error is not multi-speaker versus single-speaker data. The issue occurs when the dataset is relatively small. I trained from scratch using only the LJSpeech-1.1 dataset and the error did not occur, so the single-speaker format is not the problem.
Then I made a subset of LJSpeech with only the first 1024 samples and trained from scratch again. The error was reproduced in this case.
From the Python log, we can see the error occurs during the evaluation stage of the training.
By default, the evaluation split proportion is 0.01. In this experiment, the evaluation set would contain 1024 * 0.01 ≈ 10 samples, which is fewer than the default batch size of 32.
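The arithmetic above can be checked with a short sketch. The helper eval_set_size is hypothetical, and it assumes that the split truncates to an integer and that a value greater than 1 is treated as an absolute sample count. Neither assumption has been verified against the library source here.

```python
def eval_set_size(num_samples, eval_split_size):
    """Hypothetical helper: estimated number of evaluation samples.

    Assumption: values > 1 are absolute counts, values <= 1 are proportions
    of the dataset, truncated to an integer.
    """
    if eval_split_size > 1:
        return int(eval_split_size)
    return int(num_samples * eval_split_size)


BATCH_SIZE = 32  # default batch size mentioned above

n_eval_subset = eval_set_size(1024, 0.01)  # the 1024-sample LJSpeech subset
n_eval_small = eval_set_size(256, 0.01)    # the 256-clip dataset
n_eval_fixed = eval_set_size(1024, 32)     # explicit absolute size

for name, n in [("subset", n_eval_subset), ("small", n_eval_small), ("fixed", n_eval_fixed)]:
    print(name, n, "OK" if n >= BATCH_SIZE else "smaller than one batch")
```

Under these assumptions, both small datasets yield an evaluation set smaller than one batch, while an explicit size of 32 fills exactly one.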
Explicitly setting eval_split_size=32 solves the problem.
Furthermore, be aware that the problem also occurs when samples are discarded by MAX_AUDIO_LEN_IN_SECONDS and the evaluation set ends up smaller than the batch size.
To conclude, this bug occurs when the actual size of the evaluation set is less than one batch. The training-evaluation split proportion, discarded samples, and inconsistent hyperparameters (such as a mismatch between BATCH_SIZE and eval_split_max_size) may all cause the problem.
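A quick pre-flight check along these lines can catch the situation before training starts. This is only a sketch: check_eval_batches is a hypothetical helper, the clip durations are made up, and it assumes the length filter is applied to the evaluation samples after the split.

```python
def check_eval_batches(eval_durations_s, max_audio_len_s, batch_size):
    """Hypothetical check: count the evaluation clips that survive the
    length filter and report whether they fill at least one batch."""
    kept = [d for d in eval_durations_s if d <= max_audio_len_s]
    return len(kept), len(kept) >= batch_size


# 32 evaluation clips, two of them longer than the 10-second limit (made-up data).
durations = [4.0] * 30 + [12.5, 15.0]
kept, fills_batch = check_eval_batches(durations, max_audio_len_s=10, batch_size=32)
print(kept, fills_batch)
```

Here an evaluation set that nominally holds 32 clips drops to 30 after filtering, so it no longer fills a batch of 32. Raising eval_split_size or lowering the batch size leaves some margin.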
Describe the bug
Hi,
When running the YourTTS recipe with my own LJSpeech-format dataset, I get the following error during the first evaluation:
I updated the trainer to the latest version following the instructions on GitHub, but the issue still occurs.
Also note that training a VITS model against the same dataset (with the same maximum audio length, 10 seconds or 10 x 22050 samples) works. So it fails only when running the YourTTS recipe. I will try with debug mode on and see if it shows anything interesting.
Here is the adapted recipe:
To Reproduce
Create a dataset in LJSpeech format (22.05 kHz audio) in French. Adapt the dataset config and the sample rate in the provided recipe, then launch it.
Expected behavior
The training should go on.
Logs
No response
Environment
Additional context
No response