Ant-Brain / EfficientWord-Net

OneShot Learning-based hotword detection.
https://ant-brain.github.io/EfficientWord-Net/
Apache License 2.0

Is it possible to add noise or reverb to the sample audio to increase recall? #38

Open Leeviber opened 1 year ago

Leeviber commented 1 year ago

Hi, the model performs really well when the voice is clean. However, when the background is noisy or there is room reverb, the recall rate is really low. Is it possible to add some background noise or reverb to the keyword audio samples to increase the detection rate in complex scenes? Will it affect the model's recognition success rate? Was such data augmentation done during training?

TheSeriousProgrammer commented 1 year ago

We have added some augmentations during training, but reverb was not included.

During high noise situations, the hotword detector may face issues because it is trained to look for all vocal patterns and match them with the user's provided samples.

One possible solution is to treat the base model as a foundation model and fine-tune it on around 5-10 user-provided samples for a specific word. However, this may cripple the model's ability to identify new words out of the blue.

As suggested, you can add reference samples of the word you want to detect that include noise and variations in accent.

After this, you can increase the accuracy threshold.
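
As a rough illustration of the suggestion above, here is a minimal sketch of how noisy and reverberant copies of a clean keyword recording could be produced before registering them as extra reference samples. It uses only NumPy and does not call any EfficientWord-Net API. The function names, the target SNR, and the synthetic impulse response (exponentially decaying noise) are assumptions made for this example; a measured room impulse response and real background recordings would likely be more realistic.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix `noise` into `clean` at the requested signal-to-noise ratio (dB)."""
    noise = np.resize(noise, clean.shape)  # repeat or trim noise to match length
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def add_reverb(clean, sample_rate, decay_s=0.3, seed=0):
    """Convolve with a synthetic impulse response (assumption: decaying noise)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(decay_s * sample_rate)) / sample_rate
    # Amplitude falls by about 60 dB over `decay_s` seconds.
    impulse = rng.standard_normal(t.size) * np.exp(-6.9 * t / decay_s)
    impulse[0] = 1.0  # direct (unreflected) path
    wet = np.convolve(clean, impulse)[: clean.size]
    # Rescale so the peak level matches the clean recording.
    return wet / (np.max(np.abs(wet)) + 1e-12) * np.max(np.abs(clean))

# Stand-in for a 1-second keyword recording at 16 kHz (hypothetical data).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 220 * t)
background = np.random.default_rng(1).standard_normal(sample_rate // 2)

noisy = add_noise(clean, background, snr_db=10)
reverberant = add_reverb(clean, sample_rate)
```

Each augmented array could then be written to a WAV file and used alongside the clean recordings when generating the reference for the hotword. Whether this improves recall without raising false accepts would need to be checked on real recordings.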

TheSeriousProgrammer commented 1 year ago

I am currently looking at a CLIP-like architecture to boost the performance of the system.

damian-666 commented 1 year ago

I use a cardioid directional mic and get good performance in general with a Shure SV 1000 or the legendary SM58. These are heavy and will give you strain in 4 hours, but on a stand, if you are doing hands-free work such as commanding your computer, they are the best. I have a small room, fans running, etc. I use a preamp; it's about $200 total. There might be a cheaper karaoke mic, but look for a dynamic mic, not a condenser (though some might work OK), and make sure its pickup pattern is very directional. Even mic arrays on laptops don't work well for this; a singer's mic is the best, IMO.

TheSeriousProgrammer commented 1 year ago

Like @damian-666 pointed out, voice assistants employ directional mics to combat the same problem. The idea is that the noise heard by all the mics would be uniform, but the volume of the voice heard by each mic won't be, so they use some simple math to achieve a great level of noise reduction. This is difficult to do with a single mic. It would help if you could share a video recording of the issue.
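
The "simple math" mentioned above can be illustrated with an idealized two-mic sketch: if both mics pick up identical noise but the voice at different levels, subtracting one signal from the other cancels the noise and leaves a scaled copy of the voice. The gains (1.0 and 0.4), the signals, and the function name are assumptions for this example. Real devices must also handle time delays between mics and noise that is only partly shared, so this is not how any particular product works.

```python
import numpy as np

def cancel_common_noise(mic_near, mic_far):
    """Subtract two mic signals so that noise common to both cancels."""
    return mic_near - mic_far

rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate

voice = np.sin(2 * np.pi * 220 * t)   # stand-in for speech
noise = rng.standard_normal(t.size)   # assumed identical at both mics

# The voice is louder at the near mic; the noise level is the same at both.
mic_near = 1.0 * voice + noise
mic_far = 0.4 * voice + noise

# The noise terms cancel, leaving (1.0 - 0.4) * voice.
residual = cancel_common_noise(mic_near, mic_far)
```

With a single mic there is no second signal to subtract, which is why this kind of cancellation is not available in that setup.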