dscripka / openWakeWord

An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.
Apache License 2.0

Clearing the prediction buffer? #37

Closed sjlynch closed 1 year ago

sjlynch commented 1 year ago

I'm using this library (by far the best I've found, nice work!) for wake word detection: a browser sends audio to a server, which feeds it into your model. As soon as the model returns a reasonable prediction, the server returns a response and stops processing the audio. The next time I send audio to the model, it immediately returns a high value for the previously detected wake word, so it appears that the model keeps a buffer of the audio data. Is there a way to clear that buffer? Or should I feed it 2 or 3 seconds of silence for the same effect?

dscripka commented 1 year ago

Thanks for the feedback, @sjlynch, I'm glad the library is working well for you!

You are correct: internally, the library maintains a buffer of audio data so that streaming predictions can be made with low latency. This does mean that there are usually multiple predictions for a single utterance of the wake word/phrase, and that the buffer still holds the wake word if you stop feeding audio right after an activation. In practice, there are three ways to handle this:

1) Implement some logic in your server code to suppress activations that occur too soon after a previous activation. That gives the buffer time to process enough new audio to clear itself.

2) Manually feed in a few seconds of silence to flush out the buffer.

3) Manually set the values for the past few seconds of the feature buffer (oww.preprocessor.feature_buffer) to zero. This method can sometimes lead to odd prediction behavior, though, so it isn't recommended.
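
As a rough illustration of options 1 and 2, here is a minimal sketch that uses only the Python standard library. The names ActivationCooldown and silence_frames, the 0.5 threshold, the 2-second cooldown, and the 1280-sample frame size are all assumptions made for the example and are not part of openWakeWord. It also assumes 16 kHz, 16-bit PCM audio.

```python
import time


class ActivationCooldown:
    """Option 1: ignore activations that follow a previous one too closely."""

    def __init__(self, threshold=0.5, cooldown_s=2.0, clock=time.monotonic):
        self.threshold = threshold    # hypothetical score cutoff, tune per model
        self.cooldown_s = cooldown_s  # hypothetical quiet period after a detection
        self.clock = clock            # injectable so the logic can be tested
        self._last = None             # time of the last accepted activation

    def accept(self, score):
        """Return True only for a score that should trigger a response."""
        if score < self.threshold:
            return False
        now = self.clock()
        if self._last is not None and now - self._last < self.cooldown_s:
            return False  # still inside the quiet period: suppress
        self._last = now
        return True


def silence_frames(seconds=3.0, sample_rate=16000, frame_samples=1280):
    """Option 2: yield frames of 16-bit PCM silence (all-zero bytes)."""
    total_samples = int(seconds * sample_rate)
    n_frames = -(-total_samples // frame_samples)  # ceiling division
    for _ in range(n_frames):
        yield bytes(2 * frame_samples)  # 2 bytes per int16 sample


# Usage sketch (not run here; assumes numpy as np and a loaded model named oww):
# for frame in silence_frames():
#     oww.predict(np.frombuffer(frame, dtype=np.int16))
```

The silent frames go through the same prediction call as real audio, so nothing inside the library has to be touched. The cooldown check is independent of the model and can wrap whatever score your server reads from the prediction output.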

sjlynch commented 1 year ago

Option 2 worked perfectly, thanks!