thanhnghic4 opened this issue 4 years ago
Using `precise-add-noise` should improve the robustness of the model. @MatthewScholefield, thanks a lot!
@MatthewScholefield I have new questions about false positives. Can you help me with this? What should I put in the not-wake-word data? I read some recommendations that the ratio should be at least 3 not-wake-word : 1 wake-word, but I can never get that ratio to work: with it, the accuracy is very low and false positives still happen. This is my data for not-wake-word: https://github.com/MycroftAI/Precise-Community-Data, the Google Speech Commands data, and the common data from DeepSpeech. I also tried mixing these together to create noisy not-wake-word data. What duration should the not-wake-word clips be? Should they be long, or just 1.5 seconds like the wake-word clips?
Since I cannot collect much data, I only have about 150 real samples. I then generated more data with Google text-to-speech and other sources from the internet, for about 400 samples in total. Then I augmented them by adjusting the volume and mixing in noise data, and now I have about 60,000 samples. I did the same thing with the not-wake-word data, mixing it together to create more samples.
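The volume-adjust-and-mix-with-noise augmentation described above can be sketched roughly like this with NumPy. This is an illustrative sketch, not the code either poster used: the function name `augment`, the gain values, and the SNR levels are all my own choices, and real use would read/write WAV files (e.g. with Python's `wave` module) rather than synthetic arrays.

```python
import numpy as np

def augment(wake: np.ndarray, noise: np.ndarray,
            gain: float = 1.0, snr_db: float = 10.0) -> np.ndarray:
    """Scale a wake-word clip by `gain` and mix in noise at roughly `snr_db`."""
    sample = wake.astype(np.float32) * gain
    # Tile or trim the noise so it matches the clip length
    reps = int(np.ceil(len(sample) / len(noise)))
    tiled = np.tile(noise.astype(np.float32), reps)[:len(sample)]
    # Scale the noise so the signal-to-noise ratio is about snr_db
    sig_pow = np.mean(sample ** 2)
    noise_pow = np.mean(tiled ** 2) + 1e-10
    factor = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    mixed = sample + factor * tiled
    # Clip back into the int16 range before writing out as 16-bit WAV
    return np.clip(mixed, -32768, 32767).astype(np.int16)

# Synthetic arrays standing in for real recordings (1 s wake word at 16 kHz)
rng = np.random.default_rng(0)
wake = (rng.standard_normal(16000) * 3000).astype(np.int16)
noise = (rng.standard_normal(8000) * 1000).astype(np.int16)
clips = [augment(wake, noise, gain, snr)
         for gain in (0.5, 1.0, 1.5)      # volume variants
         for snr in (5, 10, 20)]          # noise-level variants
```

Each combination of gain and SNR yields a distinct training clip, which is how a few hundred recordings can be multiplied into tens of thousands of samples.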
150 samples for the wake word is reasonable, especially if there's some variation in speaker or environmental noise among them. For the not-wake-word bits you can use pretty much anything else, but it's probably more beneficial to use antagonistic words and noises where possible. To this end, you can enable saving of wake words on Mycroft to capture all the positives it thinks it matched. Sort those into wake/not-wake, of course. If you see a pattern, record some not-wake-words that match it; e.g., "gasoline" would trigger my custom wake word, so I recorded a bunch of samples of "gasoline" and added them to the not-wake-words. My air conditioner would falsely trigger it, so I recorded that and added it to the not-wake-words too. Combined with the Google commands, the noise data set, and some other words I recorded, I now have excellent results with my model. More of my notes about making a model here. I think I have about 20 not-wake-words per wake-word at this point, but the sheer volume or ratio isn't the important thing; it's finding data that improves your model's recognition.
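For reference, the sorted clips end up in the folder layout that `precise-train` reads, per the mycroft-precise README; the `hey-computer` name here is just a placeholder.

```shell
# Folder layout precise-train expects (directory names are illustrative)
mkdir -p hey-computer/wake-word
mkdir -p hey-computer/not-wake-word
mkdir -p hey-computer/test/wake-word
mkdir -p hey-computer/test/not-wake-word
# Sorted activations go into wake-word/ or not-wake-word/; e.g. recorded
# "gasoline" false triggers would belong in not-wake-word/
```

Held-out clips placed under `test/` are what `precise-test` evaluates against.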
@el-tocino thank you, I read your notes. You train for many epochs, until accuracy goes over 0.999. Is that necessary? My model often reaches 0.98 in the first 1000 epochs; I usually keep training until 1000 or 5000 epochs, but I have never trained as deeply as that. I can't see any difference when training more.
On a previous version of Precise it was somewhat different. With the 0.30-and-above versions, it's usually 500-1000 epochs now.
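The epoch count discussed above maps to the `-e` flag of `precise-train`, as shown in the mycroft-precise README; the model and folder names below are placeholders.

```shell
# Train for 1000 epochs on the data under hey-computer/ (names illustrative)
precise-train -e 1000 hey-computer.net hey-computer/
# Then evaluate against the held-out clips in hey-computer/test/
precise-test hey-computer.net hey-computer/
```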
@el-tocino can you give me a link to your model? I just want to test how good it is. Based on your experience, how many samples are enough for a good model?
Check in here: https://github.com/MycroftAI/Precise-Community-Data
@el-tocino can you help me with this question: I found that there is a tool, "precise-collect", to help collect data, and when I use it, the output audio is adjusted like this:
with the wav signal in the range (-300, 300), but my raw audio is in the range (-10000, 10000). So do I need to normalize my training audio into this (-300, 300) range? I checked the data at https://github.com/MycroftAI/Precise-Community-Data and found that it spans many different ranges. And if all the audio were normalized into one range, then the volume augmentation I did (increasing and decreasing the volume) would change nothing, right?
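For clarity, the peak normalization being asked about would look something like the sketch below (the function name and target value are my own; the reply that follows reports good results without any normalization at all, so this is illustration rather than a recommendation).

```python
import numpy as np

def peak_normalize(samples: np.ndarray, target_peak: int = 300) -> np.ndarray:
    """Rescale int16 audio so its largest absolute sample equals target_peak."""
    peak = np.max(np.abs(samples.astype(np.float32)))
    if peak == 0:
        return samples  # silent clip: nothing to scale
    return (samples.astype(np.float32) * (target_peak / peak)).astype(np.int16)

raw = np.array([-10000, 5000, 0, 10000], dtype=np.int16)
print(peak_normalize(raw))  # -> [-300  150    0  300]
```

Note the side effect the question anticipates: after peak normalization, volume-based augmentation is undone, since every clip ends up at the same peak level regardless of the gain applied earlier.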
I use the command precise-listen model -d folder to save my audio. For clips with a true result, I copy them into the test folder and run the test, but it always gives a false result?? Thanks for your help!
Not sure what your numbers are referring to? I didn't normalize my training data. Once I had my wake word designated, I started recording all activations with it. That resulted in a good set of "noisy" data for both wake and not-wake words. Get at least 20 wake-word samples to start with, and you should have at least as many not-wake-word samples to train against.
@el-tocino I checked, and yes, that was my mistake. The source code does not normalize anything. I want to create a model without real data; all my data is generated with various text-to-speech methods (Google's API and other internet sources). I tried augmenting my data to make the model work at a greater distance, but it only works within about 20 cm of the microphone. Do you have any experience with this?
I think you'd end up with a mediocre model using just generated data; it has been used before, but as supplemental data. Use a good set of your own samples to start with for best results (in my experience).
As for distance, that's a function of microphone type and settings. Mic arrays do a better job at longer range (ReSpeaker mic array, Google Voice HAT, etc.), but distance is a big enemy of sound quality.
@el-tocino after 3 weeks and hundreds of training runs with generated data, I have to accept that you're right. I collected new data and trained with it, and everything is very promising now. Thank you so much!
Good to hear!
I'm trying to train my model to work in noisy environments and I have two problems: