ARM-software / ML-KWS-for-MCU

Keyword spotting on Arm Cortex-M Microcontrollers
Apache License 2.0

Training the model with GFCCs #107

Open · saichand07 opened this issue 5 years ago

saichand07 commented 5 years ago

Hello everyone, has anyone here used Gammatone filter banks instead of LFBEs or MFCCs to train their model? What is the SNR (signal-to-noise ratio) of these models? Thank you!!

tpeet commented 5 years ago

I tested GFCCs both in training and on a microcontroller. They worked better than MFCCs, but my current implementation also requires more resources. I don't quite understand what you mean by the SNR of a model.

I'm currently working on improving the prototype version of the GFCC extraction and am testing it on my custom data set. It's quite raw and dirty right now, but maybe you will find something useful: https://github.com/tpeet/ML-KWS-for-MCU.
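Since the thread does not show what a GFCC front end involves, here is a minimal NumPy sketch of one commonly described recipe: 4th-order gammatone filters with center frequencies spaced on the ERB-rate scale, per-frame band energies, cube-root compression, then a DCT. It is not the code in the fork linked above, and all names and parameters (32 bands, 50 Hz lower edge, 40 ms frames with a 20 ms hop, 64 ms filter truncation, 13 coefficients) are assumptions chosen for illustration. It also hints at why such a front end can cost more than a mel front end: the signal is filtered once per band, whereas LFBEs/MFCCs only apply weights to one FFT per frame.

```python
import numpy as np


def hz_to_erb_rate(f):
    # Number of ERBs below frequency f (Glasberg & Moore ERB-rate scale)
    return 21.4 * np.log10(1.0 + 0.00437 * f)


def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_space(low_hz, high_hz, num_bands):
    """Center frequencies equally spaced on the ERB-rate scale."""
    return erb_rate_to_hz(np.linspace(hz_to_erb_rate(low_hz), hz_to_erb_rate(high_hz), num_bands))


def gammatone_fir(fc, fs, order=4, duration=0.064):
    """Truncated gammatone impulse response, scaled to unit gain at fc."""
    t = np.arange(int(duration * fs)) / fs
    bandwidth = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)  # 1.019 * ERB(fc), in Hz
    g = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t) * np.cos(2 * np.pi * fc * t)
    gain_at_fc = np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))
    return g / gain_at_fc


def gammatone_energies(signal, fs, num_bands=32, low_hz=50.0, frame_len=0.04, hop=0.02):
    """Mean power per frame in each gammatone band, shape (num_frames, num_bands).

    Assumes len(signal) >= frame_len * fs.
    """
    signal = np.asarray(signal, dtype=np.float64)
    filters = [gammatone_fir(fc, fs) for fc in erb_space(low_hz, 0.9 * fs / 2, num_bands)]
    frame, step = int(frame_len * fs), int(hop * fs)
    num_frames = 1 + (len(signal) - frame) // step
    idx = step * np.arange(num_frames)[:, None] + np.arange(frame)[None, :]
    nfft = 1 << int(np.ceil(np.log2(len(signal) + len(filters[0]) - 1)))
    spectrum = np.fft.rfft(signal, nfft)
    energies = np.empty((num_frames, num_bands))
    for k, h in enumerate(filters):
        # Causal FIR filtering via FFT: linear convolution truncated to the input length
        y = np.fft.irfft(spectrum * np.fft.rfft(h, nfft), nfft)[: len(signal)]
        energies[:, k] = np.mean(y[idx] ** 2, axis=1)
    return energies


def gfcc(signal, fs, num_bands=32, num_ceps=13):
    """Cube-root compressed gammatone energies followed by an orthonormal DCT-II."""
    compressed = np.cbrt(gammatone_energies(signal, fs, num_bands=num_bands))
    n = np.arange(num_bands)
    k = np.arange(num_ceps)[:, None]
    dct = np.sqrt(2.0 / num_bands) * np.cos(np.pi / num_bands * (n + 0.5) * k)
    dct[0] /= np.sqrt(2.0)  # orthonormal scaling of the first (DC) row
    return compressed @ dct.T  # (num_frames, num_ceps)
```

With these defaults, `gfcc(x, 16000)` on one second of 16 kHz audio returns a (49, 13) array. An embedded port would probably use a cheaper filter structure (for example recursive gammatone approximations) rather than long FIR convolutions.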

Things I've learned:

saichand07 commented 5 years ago

@tpeet Thank you very much, that's really helpful. I trained my models with LFBEs, which give better results than MFCCs in Python. I haven't deployed them on a microcontroller yet.
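The LFBE/MFCC comparison is easier to reason about once you note that MFCCs are a DCT of the log filterbank energies. The NumPy sketch below computes both from the same framed audio; the function names are hypothetical and the parameters (Hann window, 40 triangular mel filters starting at 20 Hz, 10 cepstral coefficients) are assumptions, not necessarily what this repository's training scripts use. Keeping all 40 DCT coefficients is an invertible, orthonormal transform of the LFBEs; truncating to the first 10 discards the remaining spectral detail.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, nfft, fs, low_hz=20.0):
    """Triangular filters equally spaced on the mel scale, shape (num_filters, nfft // 2 + 1)."""
    edges = mel_to_hz(np.linspace(hz_to_mel(low_hz), hz_to_mel(fs / 2), num_filters + 2))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    fb = np.zeros((num_filters, len(freqs)))
    for i in range(num_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_matrix(num_in, num_out):
    """Orthonormal DCT-II basis, shape (num_out, num_in)."""
    n = np.arange(num_in)
    k = np.arange(num_out)[:, None]
    d = np.sqrt(2.0 / num_in) * np.cos(np.pi / num_in * (n + 0.5) * k)
    d[0] /= np.sqrt(2.0)
    return d


def lfbe_and_mfcc(frames, fs, num_filters=40, num_ceps=10):
    """frames: (num_frames, frame_len) audio samples. Returns (LFBEs, MFCCs)."""
    nfft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames * np.hanning(nfft), axis=1)) ** 2
    lfbe = np.log(power @ mel_filterbank(num_filters, nfft, fs).T + 1e-6)
    mfcc = lfbe @ dct_matrix(num_filters, num_ceps).T  # MFCC = truncated DCT of the LFBEs
    return lfbe, mfcc
```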

saichand07 commented 5 years ago

@tpeet Have you measured the Word Error Rate (WER) and the false detection rate of your models on the board?

tpeet commented 5 years ago

@saichand07, I haven't tested on a KWS task; I used my own bird sound dataset. It's very hard to measure WER, as it depends on your embedded system, the surrounding noise, how far you are from the microphone, etc.
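For keyword spotting, on-device results are commonly reported as a false rejection rate (missed keywords) together with false alarms per hour of audio, rather than as WER. A minimal sketch of scoring one recording that way, assuming you have the ground-truth keyword times and the times at which the board fired; the `score_detections` name, the 0.5 s matching tolerance and the greedy one-to-one matching are all assumptions for illustration.

```python
def score_detections(truth_times, detection_times, audio_hours, tolerance_s=0.5):
    """Return (false_rejection_rate, false_alarms_per_hour).

    Each detection is matched to at most one unmatched ground-truth keyword
    within `tolerance_s` seconds; unmatched detections count as false alarms.
    """
    truth = sorted(truth_times)
    matched = [False] * len(truth)
    false_alarms = 0
    for d in sorted(detection_times):
        for i, t in enumerate(truth):
            if not matched[i] and abs(d - t) <= tolerance_s:
                matched[i] = True
                break
        else:
            # No keyword occurrence close enough: the board fired spuriously
            false_alarms += 1
    misses = matched.count(False)
    false_rejection_rate = misses / len(truth) if truth else 0.0
    return false_rejection_rate, false_alarms / audio_hours
```

For example, `score_detections([1.0, 5.0, 9.0], [1.2, 5.4, 7.0, 12.0], audio_hours=0.5)` matches two detections, misses the keyword at 9.0 s and counts two false alarms, giving a false rejection rate of 1/3 and 4 false alarms per hour.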

But maybe my research will give you some idea of how these factors can affect accuracy. I played audio clips through speakers and recorded them with the embedded system, which produced more realistic validation and test sets. When testing on the original audio recordings, test accuracy was over 90%; when testing on the recordings from the embedded system, accuracy dropped to 70% for MFCCs. I therefore also recorded background noise the same way and added it to the training samples. This improved MFCC accuracy from 70% to 75%. With GFCCs I got around 79%, even without the added background noise.
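Adding device-recorded background noise to training clips can be controlled by a target SNR, which is also one concrete sense in which SNR relates to a KWS model: the noise level of the data it is trained and tested on. A minimal NumPy sketch, assuming float arrays at the same sample rate; the `mix_at_snr` name, looping short noise recordings and picking a random offset are illustrative choices, not necessarily how tpeet or the repository's training scripts add noise.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Add a random segment of `noise` to `clean`, scaled to the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Loop the noise recording if it is shorter than the clip
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    # Pick a random noise segment with the same length as the clip
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(segment ** 2)
    if noise_power == 0.0:
        return clean.copy()
    # SNR_dB = 10*log10(P_clean / (scale^2 * P_noise)), solved for scale
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * segment
```

Drawing `snr_db` at random per clip (for example from a range that covers the noise levels measured on the device) is one way to expose the model to several conditions.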

saichand07 commented 5 years ago

@tpeet Thank you very much. Yes, I have read some literature on GFCCs showing that they perform very well in the presence of background noise compared to MFCCs (https://pdfs.semanticscholar.org/6b5d/1d2767fbd0ce9670bf334b3fd73a4cbb3a33.pdf).