Closed: EuphoriaCelestial closed this issue 3 years ago
How many samples of wake word are you training with? How many different speakers? What parameters are you using when you train?
I am also facing similar issues. I am using about 2,000 wake-word samples and 13,000 non-wake-word samples. Even when the accuracy reaches 1.0, the model activates on any sound. I just followed the documentation for training the wake word, and I don't know what to do now.
What params did you use training it? What's the quality of your wake word samples like?
@el-tocino I used the default parameters. My samples are from the Google Cloud Text-to-Speech and AWS Polly services; I just changed the pitch and time-stretched some of them. Since my model detects any speech as a potential wake word, I suspected the non-wake-word data had issues. I used some of the voice sample data from here: https://www.kaggle.com/imsparsh/accentdb-core-extended, and changed the sampling rate to 16 kHz.
I later added white noise, which suppressed some of the mis-activations, but any sound I make is still considered a wake word. Is there any guide on how I should change the params, or should I be focusing somewhere else?
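The white-noise augmentation mentioned above can be done in a few lines of NumPy. This is a generic sketch of mixing noise at a chosen signal-to-noise ratio, not code from Precise; the function name is my own:

```python
import numpy as np

def add_white_noise(audio: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Mix Gaussian white noise into a mono signal at the given SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(audio ** 2)
    # Scale the noise so that 10*log10(signal_power / noise_power) == snr_db.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

# Example: a one-second 440 Hz tone at 16 kHz, noised at 10 dB SNR.
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = add_white_noise(clean, snr_db=10.0, rng=np.random.default_rng(0))
```

Varying `snr_db` per sample (e.g. randomly between 5 and 20 dB) tends to generalize better than a single fixed noise level.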
I'd focus on sourcing live samples of your wake word from humans and using those.
@el-tocino I was trying to automate wake-word generation for any given word, so sourcing the data from humans is not possible. There seem to be no errors under test conditions, but false positives still occur all the time. I can't understand what the real problem is.
Also, is there any guide to tuning hyperparameters like the buffer time, hop time, window time, etc.? It's the only option left in my case.
You are training your model on data that does not correspond to what it sees in production. I have run similar experiments: synthetic speech can be used to augment human recordings, but it is not a good enough replacement. You are essentially making your model learn a different task, and it is overfitting on your dataset. You might have a lot of samples, but they are essentially all the same sample repeated over and over again.
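One quick sanity check on the "same sample repeated over and over" concern is to hash the raw files and count identical contents. This only catches exact byte-level duplicates (pitch-shifted or stretched copies will hash differently), and the function name is illustrative, not from any wake-word tool:

```python
import hashlib
from collections import Counter
from pathlib import Path

def duplicate_report(dataset_dir: Path) -> Counter:
    """Count how many .wav files in a directory share identical byte content.

    Returns a Counter mapping content digest -> number of files with
    that digest; any value > 1 indicates exact duplicates.
    """
    hashes = Counter()
    for wav in dataset_dir.glob("*.wav"):
        digest = hashlib.sha256(wav.read_bytes()).hexdigest()
        hashes[digest] += 1
    return hashes
```

A dataset where most digests appear more than once is far less diverse than its file count suggests.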
How many samples of wake word are you training with? How many different speakers? What parameters are you using when you train?
I am using about 800 wake-word samples and 4,000 not-wake-word samples; all parameters were left at their defaults. I am not sure how many different speakers there are because the dataset is from my friend, but I guess he recorded 20-30 people.
You might want to listen to a good sampling of those to make sure they're what you expect.
Yes, I did. I listened to some random recordings and they were good.
No idea, then. I'd have to see/hear the dataset to tell you much else.
I don't think you can generate speech to train this model, since it is extremely likely to overfit even when using real data. After training dozens of Mycroft models, I found the model is also vulnerable to many things: the volume of the data, accents, background noise, ...
See previous reply.
Never mind, I have fixed it by recording many phrases that sound similar to my wake word and putting them into the not-wake-word folder. Now I am facing another problem with background noise. Thanks for your support. Closing.
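The fix above boils down to hard-negative mining: collect confusable recordings and stage them into the dataset's negative folder. A minimal sketch, assuming the Precise-style layout where negatives live under `not-wake-word/` (the function name and `hard-neg-` prefix are my own):

```python
import shutil
from pathlib import Path

def stage_hard_negatives(recordings_dir: Path, dataset_dir: Path) -> int:
    """Copy confusable recordings into the dataset's not-wake-word folder.

    Prefixes copies with 'hard-neg-' so they are easy to audit or
    remove later. Returns the number of files copied.
    """
    dest = dataset_dir / "not-wake-word"
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for wav in sorted(recordings_dir.glob("*.wav")):
        shutil.copy2(wav, dest / f"hard-neg-{wav.name}")
        copied += 1
    return copied
```

Retraining after each batch of hard negatives, rather than all at once, makes it easier to see which confusable phrases actually reduce false triggers.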
Hi, I have trained quite a lot of models with different words and plenty of recordings. They performed pretty well but have a high rate of false positives in some special cases. For example, my wake word is "Hi big head baby", and if I say "Hi" + some random word + "baby", it is recognized as the wake word. I have plenty of not-wake-word data and used the incremental training method too; it reduced the false triggers a little, but there are still too many. Please suggest a solution, thank you.
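One way to attack the "Hi" + random word + "baby" false triggers described above is to script hard-negative phrases that keep the first and last words of the wake word but swap the middle, then record those phrases for the not-wake-word set. A minimal sketch (function name and filler words are my own, not from any wake-word tool):

```python
def partial_match_negatives(wake_word: str, fillers: list) -> list:
    """Build confusable phrases that keep the first and last words of the
    wake word but replace everything in between, mimicking the
    partial-match false triggers."""
    words = wake_word.split()
    first, last = words[0], words[-1]
    return [f"{first} {filler} {last}" for filler in fillers]

# For "Hi big head baby" this yields phrases like "Hi banana baby".
negatives = partial_match_negatives("Hi big head baby", ["banana", "seven"])
```

Recording a few dozen of these per speaker and adding them as negatives targets exactly the decision boundary the model is getting wrong.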