MycroftAI / mycroft-precise

A lightweight, simple-to-use, RNN wake word listener
Apache License 2.0
842 stars 227 forks

High rate of false positive in certain cases #204

Closed EuphoriaCelestial closed 3 years ago

EuphoriaCelestial commented 3 years ago

Hi, I have trained quite a lot of models with different wake words and plenty of recordings. They perform pretty well but have a high rate of false positives in some special cases. For example, my wake word is "Hi big head baby"; I can say "Hi" + some random word + "baby" and it is recognized as the wake word. I have plenty of not-wake-word data and used the incremental training method too. That reduced the problem a little, but there are still too many false triggers. Please suggest a solution for this, thank you.

el-tocino commented 3 years ago

How many samples of wake word are you training with? How many different speakers? What parameters are you using when you train?

aadityarock2000 commented 3 years ago

I am also facing a similar issue. I am using about 2000 wake-word samples and 13,000 non-wake-word samples. Even when the accuracy reaches 1.0, the model activates for any sound. I just followed the documentation to train the wake word. I don't know what has to be done now.

el-tocino commented 3 years ago

What params did you use training it? What's the quality of your wake word samples like?

aadityarock2000 commented 3 years ago

@el-tocino I used the default parameters. My samples are from the Google Cloud Text-to-Speech and AWS Polly services. I just changed the pitch and stretched some of them. As my model detects any speech as a potential wake word, I felt that the non-wake-word data has some issues. I used some of the voice sample data from here: https://www.kaggle.com/imsparsh/accentdb-core-extended, and changed the sampling rate to 16 kHz.

I later added white noise, which suppressed some of the mis-activations, but any sound I make is still considered a wake word. Is there any guide on how I should change the params, or should I be focusing somewhere else?
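
For readers who want to reproduce the white-noise step mentioned above, here is a minimal sketch of mixing Gaussian white noise into a clip at a chosen signal-to-noise ratio. It is not the poster's actual script: the function names `add_white_noise` and `measured_snr_db`, the 10 dB target, and the synthetic 440 Hz test tone are all assumptions for illustration, and samples are treated as plain floats rather than being read from a WAV file.

```python
import math
import random


def add_white_noise(samples, snr_db, seed=None):
    """Return a new list with Gaussian white noise mixed in at roughly snr_db.

    `samples` is a sequence of floats (e.g. audio scaled to [-1.0, 1.0]).
    The helper name and signature are illustrative, not part of Precise.
    """
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def measured_snr_db(clean, noisy):
    """Compute the SNR actually achieved, for a quick sanity check."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# One second of a synthetic 440 Hz tone at 16 kHz stands in for a real clip.
SAMPLE_RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
        for i in range(SAMPLE_RATE)]
noisy = add_white_noise(tone, snr_db=10, seed=0)
print(round(measured_snr_db(tone, noisy), 1))
```

Varying the SNR across copies of a clip gives more variety than a single fixed noise level, but it does not add new speakers or new recording conditions.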

el-tocino commented 3 years ago

I'd focus on sourcing live samples of your wake word from humans and using those.

aadityarock2000 commented 3 years ago

@el-tocino I am trying to automate wake-word generation for any given word, so it is not possible to source the data from humans. There seem to be no errors in the test condition, but false positives still occur all the time. I can't understand what the real problem is.

Also, is there any guide to tuning the hyperparameters like the buffer time, hop time, window time, etc? It's the only option left in my case.

JarbasAl commented 3 years ago

You are training your model on data that does not correspond to what the model sees in production. I have done similar experiments: synthetic speech can be used to augment human recordings, but it's not a good enough replacement. You are essentially making your model learn a different task, and it is overfitting to your dataset. You might have a lot of samples, but they are essentially all the same sample repeated over and over again.

EuphoriaCelestial commented 3 years ago

> How many samples of wake word are you training with? How many different speakers? What parameters are you using when you train?

I am using about 800 wake-word samples and 4000 not-wake-word samples; all parameters are left at their defaults. I am not sure how many different speakers there are because the dataset is from my friend, but I guess he recorded 20-30 people.

el-tocino commented 3 years ago

You might want to listen to a good sampling of those to make sure they're what you expect.

EuphoriaCelestial commented 3 years ago

> You might want to listen to a good sampling of those to make sure they're what you expect.

Yes, I did. I listened to some random recordings and they are good.

el-tocino commented 3 years ago

No idea, then. I'd have to see/hear the dataset to tell you much else.

EuphoriaCelestial commented 3 years ago

> @el-tocino I am trying to automate wake-word generation for any given word, so it is not possible to source the data from humans. There seem to be no errors in the test condition, but false positives still occur all the time. I can't understand what the real problem is.
>
> Also, is there any guide to tuning the hyperparameters like the buffer time, hop time, window time, etc? It's the only option left in my case.
>
> @el-tocino I used the default parameters. My samples are from the Google Cloud Text-to-Speech and AWS Polly services. I just changed the pitch and stretched some of them. As my model detects any speech as a potential wake word, I felt that the non-wake-word data has some issues. I used some of the voice sample data from here: https://www.kaggle.com/imsparsh/accentdb-core-extended, and changed the sampling rate to 16 kHz. I later added white noise, which suppressed some of the mis-activations, but any sound I make is still considered a wake word. Is there any guide on how I should change the params, or should I be focusing somewhere else?

I don't think you can use generated speech to train this model, since it is extremely likely to overfit even when using real data. After training dozens of Mycroft models, I found that this model is also sensitive to many things: volume of the data, accent, background noise, ...

el-tocino commented 3 years ago

See previous reply.

EuphoriaCelestial commented 3 years ago

> No idea, then. I'd have to see/hear the dataset to tell you much else.

Never mind, I have fixed it by recording many, many phrases that sound similar to my wake word and putting them into the not-wake-word folder. Now I am facing another problem with background noise. Thanks for your support. Closing.
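
As an illustration of the fix described above, the following sketch builds a list of "near-miss" phrases that could be recorded by people and added as not-wake-word samples. It only generates text prompts. The helper `near_miss_phrases` and the filler words are hypothetical, and the substitution scheme (swap or drop any contiguous run of words) is just one simple way to cover cases such as "Hi" + random word + "baby"; the thread does not say how the actual phrases were chosen.

```python
def near_miss_phrases(wake_phrase, fillers):
    """List phrases that keep part of the wake phrase but are not the wake phrase.

    Every contiguous run of words (except the whole phrase) is either
    replaced by one filler word or dropped entirely.
    """
    words = wake_phrase.split()
    n = len(words)
    results = set()
    for start in range(n):
        for end in range(start + 1, n + 1):
            if end - start == n:
                continue  # replacing every word keeps nothing of the wake phrase
            for filler in list(fillers) + [None]:
                middle = [filler] if filler else []
                results.add(" ".join(words[:start] + middle + words[end:]))
    # A filler equal to the word it replaces would recreate the wake phrase.
    results.discard(wake_phrase)
    return sorted(results)


# Hypothetical filler words; in practice pick common words in your language.
prompts = near_miss_phrases("Hi big head baby", ["there", "little"])
for prompt in prompts:
    print(prompt)
```

Each printed prompt is meant to be read aloud by several speakers and saved as a not-wake-word recording, so the model sees partial matches as negatives during training.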