MycroftAI / mycroft-precise

A lightweight, simple-to-use, RNN wake word listener
Apache License 2.0
838 stars 227 forks

Help improve accuracy in noisy environments #113

Open thanhnghic4 opened 4 years ago

thanhnghic4 commented 4 years ago

I'm trying to train my model to work in a noisy environment and I have 2 problems:

  1. I know that the system will cut the beginning of my training audio and keep only the last 1.5 seconds. My average duration is about 1 second. So should I add more silence to make each clip 1.5 seconds, or should I adjust the param from 1.5 to 1 second?
  2. I want my model to work in a noisy environment. Should I mix my dataset with background noise and train on it?
MatthewScholefield commented 4 years ago
  1. Try training both ways: once with the default 1.5 seconds (it will automatically add silence to the beginning), and then try training with the param set to 1.0 or 0.9 seconds.
  2. Yes, you can try this using precise-add-noise, as it should improve the robustness of the model.
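
To make the first point concrete, here is a minimal sketch of what "keep only the last 1.5 seconds and add silence to the beginning" means for a clip held as a NumPy array. It is not taken from the Precise source; the function name `fit_to_length` and the 16 kHz sample rate are assumptions made for illustration.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate in samples per second


def fit_to_length(samples: np.ndarray, seconds: float = 1.5,
                  sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Keep the last `seconds` of audio, left-padding with silence if short."""
    target = int(round(seconds * sample_rate))
    if len(samples) >= target:
        # Long clip: drop the beginning, keep the tail.
        return samples[len(samples) - target:]
    # Short clip: put zeros (silence) in front so the word stays at the end.
    padding = np.zeros(target - len(samples), dtype=samples.dtype)
    return np.concatenate([padding, samples])
```

With this sketch, a 1-second clip becomes 0.5 seconds of silence followed by the original audio, while calling it with `seconds=1.0` would leave the same clip unchanged.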
thanhnghic4 commented 4 years ago

@MatthewScholefield thanks a lot

thanhnghic4 commented 4 years ago

@MatthewScholefield I have new questions about false positives. Can you help me with this? What should I put in the not-wake-word data? I read some recommendations that I should use any data with a ratio of at least 3 not-wake-word : 1 wake-word, but I can never reach that ratio. With this ratio, the accuracy is very low and false positives still happen. This is my not-wake-word data: https://github.com/MycroftAI/Precise-Community-Data, the Google commands data, and the common data from DeepSpeech. I also tried mixing them together to create noisy not-wake-word data. What duration should the not-wake-word data be? Should it be long, or just 1.5 seconds like the wake-word data?

Since I cannot collect much data, I have only about 150 samples. I then tried to generate more data from Google text-to-speech and some resources from the internet, which gave about 400 samples. Then I augmented it by adjusting the volume and mixing it with noise data, and now I have about 60,000 samples. I did the same thing with the not-wake-word data, mixing it together to create more.
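
The "mix with noise data" step described above can be sketched as mixing a noise clip into a speech clip at a chosen signal-to-noise ratio. This is a generic illustration, not how precise-add-noise works internally; the function name `mix_at_snr` and its parameters are hypothetical.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `speech` so the mix has the given SNR in dB."""
    speech = speech.astype(np.float64)
    noise = noise.astype(np.float64)
    if len(speech) == 0 or len(noise) == 0:
        return speech  # nothing to mix
    # Loop or trim the noise so it covers the whole speech clip.
    repeats = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, repeats)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        return speech  # no reference level to scale against
    # Pick the gain so that speech_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


def to_int16(samples: np.ndarray) -> np.ndarray:
    """Clip to the 16-bit range before saving, to avoid wrap-around."""
    return np.clip(np.round(samples), -32768, 32767).astype(np.int16)
```

Generating the same clip at several SNR values (for example 20, 10, and 5 dB) gives variation in noise level. Copies that differ only in overall volume add less variety than copies that differ in the kind of noise.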

el-tocino commented 4 years ago

150 samples for the wake word is reasonable, especially if there's some variation in the speaker or environmental noise with them. For the not-wake-word bits you can use pretty much anything else, but it's probably more beneficial to use antagonistic words and noises where possible. To this end, you can enable saving of wake words on mycroft to capture all the positives it thinks it matches. Sort those into wake/not-wake, of course. If you see a pattern, then record some not-wake-words that match it, e.g., "gasoline" would trigger my custom wake word, so I recorded a bunch of samples of "gasoline" and added them to the not-wake-words. My air conditioner would false trigger it, so I recorded that and added it to the not-wake-words. Combined with the google commands, the noise data set, and some other words I recorded, I now have excellent results on my model. More of my notes about making a model here. I think I have about 20 not-wake-words per wake-word at this point, but the sheer volume or ratio isn't the important thing, it's finding data that improves your model's recognition.
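
As a quick sanity check on the counts mentioned above, a small script can report how many clips sit in each class. This is a sketch that assumes the data is laid out in `wake-word/` and `not-wake-word/` folders under one root; the function names are hypothetical, and the resulting ratio is only a rough indicator, not a target.

```python
from pathlib import Path


def count_wavs(folder: Path) -> int:
    """Count .wav files under `folder`, including subfolders."""
    if not folder.is_dir():
        return 0
    return sum(1 for _ in folder.rglob("*.wav"))


def dataset_summary(root: str) -> dict:
    """Return clip counts and the not-wake : wake ratio for a data folder."""
    base = Path(root)
    wake = count_wavs(base / "wake-word")          # assumed folder name
    not_wake = count_wavs(base / "not-wake-word")  # assumed folder name
    ratio = not_wake / wake if wake else None
    return {"wake": wake, "not_wake": not_wake, "ratio": ratio}
```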

thanhnghic4 commented 4 years ago

@el-tocino thank you, I read your notes. You train for lots of epochs to get over 0.999. Is that necessary? My model often gets to 0.98 in the first 1000 epochs. I will keep training until 5000 or 1000, but I have never tried training as deeply as that. I can't see any difference if we train more.

el-tocino commented 4 years ago

On a previous version of precise it was somewhat different. With the 0.30 and above versions, it's usually 500-1000 now.

thanhnghic4 commented 4 years ago

@el-tocino can you give me a link to your model? I just want to test how good it is. Based on your experiments, how many samples are enough for a good model?

el-tocino commented 4 years ago

Check in here: https://github.com/MycroftAI/Precise-Community-Data

thanhnghic4 commented 4 years ago

@el-tocino can you help me with this question: I found that there is a tool, "precise-collect", to help collect data, and when I use it the output audio is adjusted like this: (image)

The wav signal is in the range (-300, 300), but my raw audio is in the range (-10000, 10000). So do I need to normalize my training audio into the range (-300, 300)? I checked the data at https://github.com/MycroftAI/Precise-Community-Data and found that it is in many different ranges. And if all audio is normalized into this range, does that mean that augmenting my data by increasing and decreasing the volume changes nothing?
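
One way to compare sample ranges like these across files is to read the peak amplitude directly from each WAV file. The sketch below assumes 16-bit PCM audio; the function name `peak_amplitude` is hypothetical.

```python
import wave

import numpy as np


def peak_amplitude(path: str) -> int:
    """Return the largest absolute sample value in a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = np.frombuffer(frames, dtype="<i2")
    if samples.size == 0:
        return 0
    # Widen to int32 first so abs(-32768) does not overflow int16.
    return int(np.abs(samples.astype(np.int32)).max())
```

Running this over a recording from precise-collect and over the raw audio shows whether the two really differ in level or whether a plot is just scaled differently.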

I use precise-listen model -d folder to save my audio. I copied the clips that were detected correctly into the test folder and ran the test, but it always gives a false result. Thanks for your help!

el-tocino commented 4 years ago

Not sure what your numbers are referring to? I didn't normalize my training data. Once I had my wake word designated, I started recording all activations with it. That resulted in a good set of "noisy" data for both wake and not-wake-words. Get at least 20 wake words to start with, and you should have at least as many not-wake-words to train against.

thanhnghic4 commented 4 years ago

@el-tocino I checked, and yes, that was my mistake. The source code does not normalize anything. I want to create a model without real data, so all my data is generated with various text-to-speech methods (Google API and the internet). I tried to augment my data to make my model work at a distance, but it only works within about 20 cm of the microphone. Do you have any experience with this?

el-tocino commented 4 years ago

I think you'd end up with a mediocre model using just generated data. As supplemental data that's been done before. Use a good set of your own samples to start with for best results (in my experience).

As for distance, that's a function of microphone type, and settings. Mic arrays do a better job at longer range (respeaker mic array, google voice hat, etc), but distance is a big enemy of sound quality.

thanhnghic4 commented 4 years ago

@el-tocino after 3 weeks and hundreds of training runs with generated data, I have to accept that you're right. I collected new data and trained with it, and everything is very promising now. Thank you so much!

el-tocino commented 4 years ago

Good to hear!