dscripka / openWakeWord

An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.
Apache License 2.0
748 stars 71 forks

Very good recall and accuracy but does not recognize specific voices #190

Open dilerbatu opened 4 months ago

dilerbatu commented 4 months ago

Hey everyone, I have a model with 0.90 accuracy and 0.81 recall, which is quite good in my opinion, and it does not fail in the field. The issue is that it gives a very low probability for certain voices. My keyword is "Hey Py Za". The voices it fails to recognize are men and Indian speakers. Any advice?

I used 50k samples, 700k steps, and a negative weight of 3000.

Thanks.

dscripka commented 2 months ago

In cases like this, it is almost always because the synthetic training data is not similar enough to the target voices. The TTS model used to generate the training data (Piper) should produce a wide range of voices, but because it was trained on the LibriTTS dataset, it may have relatively low representation of some accents (including Indian speakers).
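One way to confirm that the gap follows voice type or accent is to score a handful of real positive clips and compare the detection rate per speaker group. The snippet below is only a rough sketch: `recall_by_group`, the group labels, and the 0.5 threshold are illustrative assumptions, and the scores are expected to come from however you already run the model on test clips.

```python
from collections import defaultdict


def recall_by_group(results, threshold=0.5):
    """Return the fraction of positive clips detected, per speaker group.

    `results` is an iterable of (group_label, score) pairs, where every clip
    is a true utterance of the wake phrase and `score` is the model's
    probability for that clip. The threshold of 0.5 is an assumption; use
    whatever threshold you deploy with.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, score in results:
        totals[group] += 1
        if score >= threshold:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    example = [("group_a", 0.92), ("group_a", 0.15), ("group_b", 0.71)]
    for group, recall in recall_by_group(example).items():
        print(f"{group}: recall={recall:.2f}")
```

If one group sits far below the others, that points to a coverage problem in the training data rather than a general training issue.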

It is difficult to fix this issue without adding more training data that is more similar to the target speakers you expect in deployment. If you have real audio samples, or another TTS model that can more effectively produce other languages/accents, you can add these to the training data and you should see improved performance.
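As a rough sketch of that step, the snippet below copies extra recordings (real audio or output from another TTS model) into the folder that holds the positive training clips, skipping files whose format does not match. The helper names and directory paths are hypothetical, and the 16 kHz, mono, 16-bit WAV requirement is an assumption about the training data format, so check it against your own setup.

```python
import shutil
import wave
from pathlib import Path

TARGET_RATE = 16000  # assumption: training clips are 16 kHz, mono, 16-bit PCM


def is_compatible(path):
    """Return True if the file is a readable WAV in the assumed format."""
    try:
        with wave.open(str(path), "rb") as wf:
            return (
                wf.getframerate() == TARGET_RATE
                and wf.getnchannels() == 1
                and wf.getsampwidth() == 2
            )
    except (wave.Error, EOFError):
        return False


def add_clips(extra_dir, positive_dir, prefix="extra"):
    """Copy compatible WAV files from extra_dir into positive_dir.

    Files are prefixed so they do not overwrite the synthetic clips.
    Returns (copied, skipped) lists of paths.
    """
    extra_dir = Path(extra_dir)
    positive_dir = Path(positive_dir)
    positive_dir.mkdir(parents=True, exist_ok=True)
    copied, skipped = [], []
    for src in sorted(extra_dir.glob("*.wav")):
        if is_compatible(src):
            dst = positive_dir / f"{prefix}_{src.name}"
            shutil.copy2(src, dst)
            copied.append(dst)
        else:
            skipped.append(src)
    return copied, skipped


if __name__ == "__main__":
    # Hypothetical paths; replace with your own directories.
    copied, skipped = add_clips("real_recordings", "my_model/positive_train")
    print(f"copied {len(copied)} clips, skipped {len(skipped)} (resample these first)")
```

Skipped files are reported rather than converted, so they can be resampled with a tool of your choice before being added.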

dilerbatu commented 2 months ago

Thanks for the answer!