Closed · sjlynch closed this issue 1 year ago
Thanks for the feedback, @sjlynch, I'm glad the library is working well for you!
And you are correct: internally, the library maintains a buffer of audio data so that streaming predictions can be made with low latency. This does mean that there are usually multiple predictions for a single utterance of the wake word/phrase. In practice, there are three ways to handle this:
1) Implement some logic in your server code to suppress activations that occur too soon after a previous activation. That gives the buffer enough time to process fresh audio and clear itself.
2) Manually feed in a few seconds of silence to flush out the buffer.
3) Manually set the past few seconds of the feature buffer (`oww.preprocessor.feature_buffer`) to zero. This method can sometimes lead to odd prediction behavior, though, so it isn't recommended.
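The first option above can be sketched as a small cooldown filter on the server side. This is an illustrative sketch, not part of the openWakeWord API: the `Debouncer` class, the shape of the score dict, and the 0.5 threshold are all assumptions you would adapt to your own server code.

```python
import time

class Debouncer:
    """Suppress wake-word activations that arrive too soon after a
    previous activation, giving the model's internal buffer time to
    clear itself (option 1 above)."""

    def __init__(self, cooldown_s=2.0, threshold=0.5):
        self.cooldown_s = cooldown_s          # refractory window in seconds
        self.threshold = threshold            # minimum score to count as an activation
        self._last_fired = {}                 # wake word name -> time of last activation

    def filter(self, scores, now=None):
        """Given a dict of {wake_word_name: score}, return only the names
        that cross the threshold AND are outside their cooldown window."""
        now = time.monotonic() if now is None else now
        fired = []
        for name, score in scores.items():
            if score < self.threshold:
                continue
            last = self._last_fired.get(name)
            if last is not None and now - last < self.cooldown_s:
                continue                      # too soon after the last hit: suppress
            self._last_fired[name] = now
            fired.append(name)
        return fired
```

You would call `filter()` on each prediction your server gets back from the model; repeated high scores from the same buffered utterance then collapse into a single activation. Options 2 and 3 are alternatives at the model level: feed a few seconds of zero-valued audio frames into the prediction call, or zero out `oww.preprocessor.feature_buffer` directly (with the caveat noted above).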
Original question from sjlynch: I'm using this library (by far the best I've found, nice work!) for wake word detection: a browser sends audio to a server, which feeds it into your model. As soon as the model returns a reasonable prediction, the server returns a response and stops processing the audio. When this happens, the next time I send audio to the model, it immediately returns a high value for the previously detected wake word, so it appears the model stores a buffer of the audio data. Is there a way to clear that buffer out? Or should I feed it 2 or 3 seconds of silence for the same effect?