https://github.com/google-research/google-research/tree/master/kws_streaming
In their models they have embedded the MFCC computation, so you just feed the 16 kHz audio stream in chunks and the streaming KWS works.
There is an FFT in the Python ops here, and I wondered whether, with the TFLite models, a look at what they did above could improve performance. This is all beyond me, but after some testing the embedded MFCC seems approximately 2x faster than an external routine with Librosa.
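For what it's worth, here is a rough sketch of what I mean by feeding the stream in chunks. The 20 ms chunk length and the `infer_chunk` callback are assumptions made for illustration and are not taken from the kws_streaming code. In practice `infer_chunk` would wrap whatever call runs the streaming model on one chunk of raw samples.

```python
from typing import Callable, Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Hz, the 16k stream mentioned above
CHUNK_MS = 20         # assumed chunk length; the real value depends on the model


def iter_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks; a trailing partial chunk is dropped."""
    for start in range(0, len(samples) - chunk_size + 1, chunk_size):
        yield samples[start:start + chunk_size]


def run_streaming(
    samples: Sequence[float],
    infer_chunk: Callable[[Sequence[float]], object],
    chunk_size: int,
) -> List[object]:
    """Pass each chunk to `infer_chunk`, a hypothetical stand-in for the model call."""
    return [infer_chunk(chunk) for chunk in iter_chunks(samples, chunk_size)]


chunk_size = SAMPLE_RATE * CHUNK_MS // 1000  # samples per chunk
one_second = [0.0] * SAMPLE_RATE             # one second of silence as dummy input

# Placeholder "model": it only reports how many samples it received.
outputs = run_streaming(one_second, len, chunk_size)
```

Since the MFCC is inside the model, the loop only ever handles raw samples, so there is no separate feature extraction step on the Python side.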
Dunno if the above is any use to you, @breizhn.