LordSyd closed this issue 3 months ago.
The idea behind the OOD list is validation: the model isn't trained on this text, it is only used to test the checkpoints during training. So if you are training in a language other than English, it certainly makes sense to adjust OOD_list.txt accordingly, in this case so that it only contains German text.
I wouldn't split any training data off for the OOD_list.txt. There is no paired audio to go with it; it is text data only. That means it can come from anywhere, and you don't need to sacrifice any of your valuable audio.
I'm going to close this, as adding your code for splitting text off from the audio dataset isn't something I'd like to implement, since that text data can be pulled from anywhere else. Feel free to keep commenting if you have any more questions.
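For example, pulling the OOD text from any unpaired German corpus could look roughly like the sketch below. This is only a sketch: the phonemizer call and the `phonemes|1` line format are assumptions based on the stock Data/OOD_texts.txt, so check them against the copy in your repo before using it.

```python
# Sketch: build a German OOD_texts.txt from any unpaired German text corpus.
# Assumptions (not confirmed in this thread): the `phonemizer` package with an
# espeak backend is installed, and the OOD file uses "phonemes|dummy_id" lines
# like the stock Data/OOD_texts.txt.
from phonemizer import phonemize

def build_ood_file(corpus_path, out_path="Data/OOD_texts.txt", max_lines=2000):
    with open(corpus_path, encoding="utf-8") as f:
        # keep reasonably long, non-empty lines only
        lines = [l.strip() for l in f if len(l.strip()) > 20][:max_lines]

    # phonemize the whole batch in one call (German via espeak)
    phonemes = phonemize(lines, language="de", backend="espeak",
                         strip=True, preserve_punctuation=True,
                         with_stress=True)

    with open(out_path, "w", encoding="utf-8") as out:
        for p in phonemes:
            out.write(f"{p}|1\n")  # "1" is a placeholder speaker column

build_ood_file("german_corpus.txt")
```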
I had the problem that the finetuned model wasn't able to generate coherent German sentences until I made my own OOD_texts file containing only German text. I did this by modifying the phonemize script to also split the training data 80/20 (80% training, 20% OOD) and then used that OOD file during training. After that, the quality was vastly improved even after only 10 epochs: the generated sentences were at least coherent German. At the moment I am training a run for 200 epochs to see whether the longer training improves things further. I am just unsure whether the OOD data should come from a different speaker, or whether files from the same speaker that were left out of training are sufficient.
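For reference, the modification boils down to something like the following sketch. It assumes the stock `wav_path|phonemes|speaker` line format of train_list.txt and the `phonemes|dummy_id` OOD format; adjust the column index and output names if your lists differ.

```python
# Sketch of the 80/20 split described above: hold out 20% of the phonemized
# training list and reuse only its transcript column as OOD text.
import random

def split_train_ood(list_path, train_out, ood_out, ood_ratio=0.2, seed=42):
    with open(list_path, encoding="utf-8") as f:
        lines = [l.rstrip("\n") for l in f if l.strip()]

    random.Random(seed).shuffle(lines)
    n_ood = int(len(lines) * ood_ratio)
    ood, train = lines[:n_ood], lines[n_ood:]

    with open(train_out, "w", encoding="utf-8") as f:
        f.write("\n".join(train) + "\n")

    with open(ood_out, "w", encoding="utf-8") as f:
        for line in ood:
            phonemes = line.split("|")[1]  # drop the wav path, keep the text
            f.write(f"{phonemes}|1\n")     # placeholder speaker column

split_train_ood("Data/train_list.txt",
                "Data/train_list_split.txt",
                "Data/OOD_texts.txt")
```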
I mainly opened this issue to ask if you think it would make sense to add this to the README, and whether you want me to provide the code for the modified phonemize script that generates the OOD data.
In the meantime I will read up on OOD data, how it should be structured, and how much of it is sufficient for training.