Open dilerbatu opened 4 months ago
In cases like this, the cause is almost always that the synthetic training data is not similar enough to the target voices. The TTS model used to generate the training data (Piper) should produce a wide range of voices, but because it was trained on the LibriTTS dataset, it may have relatively low representation of some accents (including Indian speakers).
It is difficult to fix this issue without adding more training data that is more similar to the target speakers you expect in deployment. If you have real audio samples, or another TTS model that can more effectively produce other languages/accents, you can add these to the training data and you should see improved performance.
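As a rough illustration of adding real recordings to the training data, here is a minimal sketch. It assumes the training pipeline reads positive clips as 16 kHz, mono, 16-bit WAV files from a single directory; the directory names, the `real_` prefix, and the helper names `is_compatible` and `merge_clips` are hypothetical and are not part of any library. Clips in a different format are skipped rather than converted, so you would need to resample those separately.

```python
import shutil
import wave
from pathlib import Path
from typing import List, Tuple

# Assumed target format (an assumption, adjust to what your pipeline expects).
TARGET_RATE = 16000       # samples per second
TARGET_CHANNELS = 1       # mono
TARGET_SAMPLE_WIDTH = 2   # bytes per sample (16-bit PCM)


def is_compatible(path: Path) -> bool:
    """Return True if the file is a readable WAV in the assumed target format."""
    try:
        with wave.open(str(path), "rb") as wav:
            return (
                wav.getframerate() == TARGET_RATE
                and wav.getnchannels() == TARGET_CHANNELS
                and wav.getsampwidth() == TARGET_SAMPLE_WIDTH
            )
    except (wave.Error, EOFError):
        # Not a valid WAV file, or the file is empty or truncated.
        return False


def merge_clips(
    extra_dir: Path, positive_dir: Path, prefix: str = "real_"
) -> Tuple[List[Path], List[Path]]:
    """Copy compatible clips from extra_dir into positive_dir.

    Copied files get a prefix so they do not overwrite synthetic clips
    that happen to share a filename. Returns (copied, skipped).
    """
    positive_dir.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    skipped: List[Path] = []
    for src in sorted(extra_dir.glob("*.wav")):
        if is_compatible(src):
            dst = positive_dir / f"{prefix}{src.name}"
            shutil.copy2(src, dst)
            copied.append(dst)
        else:
            skipped.append(src)
    return copied, skipped


if __name__ == "__main__":
    # Hypothetical paths: recordings of the underrepresented speakers,
    # and the directory holding the existing synthetic positive clips.
    copied, skipped = merge_clips(Path("real_recordings"), Path("positive_train"))
    print(f"copied {len(copied)} clips, skipped {len(skipped)} incompatible files")
```

After merging, you would retrain on the combined set and check whether the scores for the previously missed voices improve.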
Thanks for the answer!
Hey everyone, I have a model that gets 0.90 accuracy and 0.81 recall, which is quite good in my opinion. It also does not fail in the field. The issue with this model is that it gives very low probabilities for certain voices. My keyword is "Hey Py Za". The voices it does not recognize are male and Indian speakers. Any advice?
I used 50k training samples, 700k steps, and a negative weight of 3000.
Thanks.