MerlinPCarson / WakeWord-Detection

Training and evaluation scripts for wake word detection DNN models.
Apache License 2.0

Model context #4

Open MerlinPCarson opened 3 years ago

MerlinPCarson commented 3 years ago

It seems the Spokestack framework expects an RNN as the model and passes only a single frame to it at a time. In the paper, the WaveNet model expects a time context of 182 frames (1.83s). Will we be creating an alternate version of this code to support longer time contexts for model input?

ghost commented 3 years ago

Just adding, same issue for the CRNN model; it is trained in the paper with a context of 1.5 seconds, 151 frames.

bayestehtashk commented 3 years ago

You process frame by frame, but the result is not very reliable at the beginning. You also buffer the data, and once the buffer reaches that threshold, the output has higher confidence.
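A minimal sketch of that buffering idea, assuming feature frames arrive one at a time as plain lists of floats. `RollingContext` and its zero-padded warm-up are hypothetical illustrations, not code from this repository or from Spokestack:

```python
from collections import deque


class RollingContext:
    """Rolling window over the most recent feature frames (hypothetical helper)."""

    def __init__(self, context_frames, frame_width):
        self.context_frames = context_frames
        # Pre-fill with zero frames so a full-size window exists from the
        # first call; early windows are mostly padding, hence less reliable.
        self.frames = deque(
            ([0.0] * frame_width for _ in range(context_frames)),
            maxlen=context_frames,
        )
        self.seen = 0  # number of real frames pushed so far

    def push(self, frame):
        """Add one frame and return (window, warmed_up)."""
        self.frames.append(list(frame))  # the oldest frame drops out
        self.seen += 1
        warmed_up = self.seen >= self.context_frames
        return list(self.frames), warmed_up


# Usage: a 3-frame context over 2-dimensional frames.
ctx = RollingContext(context_frames=3, frame_width=2)
window, warmed_up = ctx.push([1.0, 1.0])
```

The `warmed_up` flag lets the caller ignore or down-weight predictions made while the window still contains padding.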

ghost commented 3 years ago

As an aside, for the CRNN architecture, I made some slight modifications to the WakewordTrigger class so that it first gathers the correct context window before performing inference. It seems to be working just fine. The modified file is utils/CRNN_files/tflite.py. I just pushed a version that lets you pass in the type of model you are working with (right now just default and CRNN), and it will work for that model type. It should be a similar process for the WaveNet model (if you haven't already done this, @mpc6).
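A rough sketch of that gating approach, for illustration only; it is not the code in utils/CRNN_files/tflite.py. `ContextGate`, `CONTEXT_FRAMES`, and the `infer` callback are hypothetical names, the CRNN length of 151 frames is taken from this thread, and a length of 1 for the default model is an assumption based on the single-frame behavior described above:

```python
from collections import deque

# Context length in frames per model type (assumed mapping, see above).
CONTEXT_FRAMES = {"default": 1, "CRNN": 151}


class ContextGate:
    """Collects frames and runs inference only once the window is full."""

    def __init__(self, model_type="default", infer=None):
        if model_type not in CONTEXT_FRAMES:
            raise ValueError(f"unknown model type: {model_type!r}")
        self.needed = CONTEXT_FRAMES[model_type]
        self.window = deque(maxlen=self.needed)
        # `infer` stands in for the real model call (e.g. a TFLite invoke).
        self.infer = infer

    def process(self, frame):
        """Buffer one frame; return a prediction, or None while filling."""
        self.window.append(frame)
        if len(self.window) < self.needed:
            return None
        return self.infer(list(self.window))


# Usage with a dummy "model" that just reports the window length.
gate = ContextGate("CRNN", infer=len)
outputs = [gate.process([0.0]) for _ in range(151)]
```

Supporting the WaveNet model in this sketch would presumably amount to adding an entry for its 182-frame context to the mapping.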