ARM-software / ML-zoo


KWS for tensorflow lite micro #60

Open ctwillson opened 5 days ago

ctwillson commented 5 days ago

The feature extraction uses MFCC; however, it appears that TFLM (TensorFlow Lite for Microcontrollers) does not support the MFCC (Mel-Frequency Cepstral Coefficients) operator. How can I use this model on TFLM?

Burton2000 commented 2 days ago

You would have to do the feature extraction within your pre-processing code. You can see examples of how to do this here: https://github.com/ARM-software/ML-examples/tree/506c941bebdeb55aedc1b8cc53f27c482cf67ec8/tflu-kws-cortex-m/kws_cortex_m/Source/MFCC or here: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ml-embedded-evaluation-kit/+/refs/heads/main/source/application/api/use_case/kws/src/KwsProcessing.cc
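
For a sense of what that pre-processing involves, below is a minimal, self-contained sketch of an MFCC pipeline (window, power spectrum, mel filterbank, log, DCT). It is not the code from the linked examples: the frame length, bin counts, and DCT scaling are placeholder assumptions that must match whatever the model was trained with, and the naive O(N²) DFT used here for clarity should be replaced with a real FFT (e.g. CMSIS-DSP) on a microcontroller.

```cpp
// Sketch of MFCC extraction for one audio frame. All constants are
// assumptions -- match them to your model's training configuration.
#include <cmath>
#include <vector>

constexpr float kPi         = 3.14159265358979f;
constexpr float kSampleRate = 16000.0f;
constexpr int   kFrameLen   = 640;  // 40 ms @ 16 kHz (assumption)
constexpr int   kNumMelBins = 40;   // assumption
constexpr int   kNumCoeffs  = 10;   // MFCCs kept per frame (assumption)

static float HzToMel(float hz) { return 1127.0f * std::log1p(hz / 700.0f); }

// Compute kNumCoeffs MFCCs from one frame of PCM samples scaled to [-1, 1].
std::vector<float> ComputeMfccFrame(const float* samples) {
  // 1. Hann window.
  std::vector<float> windowed(kFrameLen);
  for (int n = 0; n < kFrameLen; ++n) {
    float w = 0.5f - 0.5f * std::cos(2.0f * kPi * n / (kFrameLen - 1));
    windowed[n] = samples[n] * w;
  }

  // 2. Power spectrum via a naive DFT (O(N^2) -- sketch only, use an FFT).
  const int kNumBins = kFrameLen / 2 + 1;
  std::vector<float> power(kNumBins);
  for (int k = 0; k < kNumBins; ++k) {
    float re = 0.0f, im = 0.0f;
    for (int n = 0; n < kFrameLen; ++n) {
      float phase = 2.0f * kPi * k * n / kFrameLen;
      re += windowed[n] * std::cos(phase);
      im -= windowed[n] * std::sin(phase);
    }
    power[k] = re * re + im * im;
  }

  // 3. Triangular mel filterbank -> log mel energies.
  const float mel_lo = HzToMel(20.0f), mel_hi = HzToMel(kSampleRate / 2.0f);
  std::vector<float> log_mel(kNumMelBins);
  for (int m = 0; m < kNumMelBins; ++m) {
    float left   = mel_lo + (mel_hi - mel_lo) * m       / (kNumMelBins + 1);
    float center = mel_lo + (mel_hi - mel_lo) * (m + 1) / (kNumMelBins + 1);
    float right  = mel_lo + (mel_hi - mel_lo) * (m + 2) / (kNumMelBins + 1);
    float energy = 0.0f;
    for (int k = 0; k < kNumBins; ++k) {
      float mel = HzToMel(k * kSampleRate / kFrameLen);
      if (mel > left && mel < right) {
        float weight = (mel <= center) ? (mel - left) / (center - left)
                                       : (right - mel) / (right - center);
        energy += weight * power[k];
      }
    }
    log_mel[m] = std::log(energy + 1e-6f);
  }

  // 4. DCT-II of the log mel energies -> MFCCs. Scaling conventions vary
  //    between libraries; this uses a common sqrt(2/M) normalization.
  std::vector<float> mfcc(kNumCoeffs);
  for (int c = 0; c < kNumCoeffs; ++c) {
    float sum = 0.0f;
    for (int m = 0; m < kNumMelBins; ++m) {
      sum += log_mel[m] * std::cos(kPi * c * (m + 0.5f) / kNumMelBins);
    }
    mfcc[c] = sum * std::sqrt(2.0f / kNumMelBins);
  }
  return mfcc;
}
```

Frames like this are typically computed on a hop (e.g. every 20 ms over 40 ms windows) and stacked into the 2-D feature input the KWS model expects.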

ctwillson commented 1 day ago

> You would have to do the feature extraction within your pre-processing code. You can see examples of how to do this here: https://github.com/ARM-software/ML-examples/tree/506c941bebdeb55aedc1b8cc53f27c482cf67ec8/tflu-kws-cortex-m/kws_cortex_m/Source/MFCC or here: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ml-embedded-evaluation-kit/+/refs/heads/main/source/application/api/use_case/kws/src/KwsProcessing.cc

Thanks for your reply. BTW, streaming processing is very important for keyword spotting; does ML-Zoo support it? As you know, latency is critical on microcontrollers.

Burton2000 commented 1 day ago

The ML-Zoo repository only provides ML models for people to use; it isn't focused on showing complete end-to-end embedded applications.

The links I provided above show how to use these KWS models in a streaming audio use case, so they should be helpful for you.
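
To sketch how the extracted features then reach the model on-device, here is a rough outline of a TFLM invocation. This is not code from the linked examples: `kws_model_data`, the registered op list, and the arena size are placeholders to adapt to the specific model, and the `MicroInterpreter` constructor has changed across TFLM releases, so check it against the version you build with.

```cpp
// Hypothetical TFLM inference sketch for a KWS model over a filled MFCC
// window. kws_model_data, the op list, and kArenaSize are placeholders.
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char kws_model_data[];  // model flatbuffer in flash

constexpr int kArenaSize = 32 * 1024;  // assumption: tune to your model
static uint8_t tensor_arena[kArenaSize];

// Run one inference on a window of MFCC features and return the index of
// the highest-scoring keyword class, or -1 on error.
int RunKws(const float* features, int num_features) {
  static const tflite::Model* model = tflite::GetModel(kws_model_data);

  // Register only the ops this model needs (inspect the .tflite to be sure).
  static tflite::MicroMutableOpResolver<4> resolver;
  static bool ops_added = false;
  if (!ops_added) {
    resolver.AddConv2D();
    resolver.AddDepthwiseConv2D();
    resolver.AddFullyConnected();
    resolver.AddSoftmax();
    ops_added = true;
  }

  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kArenaSize);
  static bool allocated = (interpreter.AllocateTensors() == kTfLiteOk);
  if (!allocated) return -1;

  // Copy the pre-computed features into the input tensor and run the model.
  TfLiteTensor* input = interpreter.input(0);
  for (int i = 0; i < num_features; ++i) input->data.f[i] = features[i];
  if (interpreter.Invoke() != kTfLiteOk) return -1;

  // Arg-max over the output scores.
  TfLiteTensor* output = interpreter.output(0);
  const int num_classes = output->dims->data[output->dims->size - 1];
  int best = 0;
  for (int i = 1; i < num_classes; ++i) {
    if (output->data.f[i] > output->data.f[best]) best = i;
  }
  return best;
}
```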

ctwillson commented 14 hours ago

> The ML-Zoo repository only provides ML models for people to use; it isn't focused on showing complete end-to-end embedded applications.
>
> The links I provided above show how to use these KWS models in a streaming audio use case, so they should be helpful for you.

On the model side of an embedded application, we actually don't need to send a 2 s (or 1 s) audio frame for every prediction, because the audio stream is sequential; we only need about 10 ms of new audio each time and can then combine it with what came before. Sorry for my poor English, but you can read this paper: https://arxiv.org/abs/2005.06720. I have no idea how to do that. Do you have any suggestions?
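
A minimal sketch of the incremental scheme described above, under assumed shapes (49 frames × 10 MFCCs for a ~1 s window; both are placeholders): each new audio hop yields one MFCC frame, which replaces the oldest frame in a ring buffer, and the model is then handed the assembled window. Only the newest slice of audio is ever re-processed at the feature level.

```cpp
// Ring buffer of MFCC frames for streaming KWS. Shapes are assumptions --
// match kNumFrames/kNumCoeffs to the model's input tensor.
#include <cstring>
#include <vector>

constexpr int kNumCoeffs = 10;  // MFCCs per frame (assumption)
constexpr int kNumFrames = 49;  // frames in the model's window (assumption)

class FeatureRingBuffer {
 public:
  FeatureRingBuffer()
      : frames_(kNumFrames, std::vector<float>(kNumCoeffs, 0.0f)) {}

  // Overwrite the oldest frame with the newest MFCC frame.
  void Push(const std::vector<float>& mfcc) {
    frames_[head_] = mfcc;
    head_ = (head_ + 1) % kNumFrames;
  }

  // Copy frames oldest-to-newest into the model's flat input buffer.
  void CopyTo(float* input) const {
    for (int i = 0; i < kNumFrames; ++i) {
      const auto& f = frames_[(head_ + i) % kNumFrames];
      std::memcpy(input + i * kNumCoeffs, f.data(),
                  kNumCoeffs * sizeof(float));
    }
  }

 private:
  std::vector<std::vector<float>> frames_;
  int head_ = 0;  // index of the oldest frame
};
```

Note this only streams the feature extraction; the network still re-processes the whole window on every invocation. The linked paper goes further by keeping internal activations as state between steps, which requires building and exporting the model in a streaming-aware form rather than using a fixed-window checkpoint as-is.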