Wake Word Model Evaluation

ghost commented 3 years ago

I'm working on the script to evaluate our wakeword models, and my current approach for calculating FRR is:

Create a long stream of audio made up of test samples containing "hey snips", separated by 1 second of silence.
Set up SpeechPipeline as it would be for microphone input, but select the long "hey snips" wav as the input stream instead.
Count how many times the wakeword is detected, compare that to the number of times it is present in the signal, and divide the difference by the total duration of the wav to get false rejections/hour.
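
A minimal sketch of the three steps above, assuming the clips are already decoded into lists of float samples at one common sample rate. `build_stream` and `frr_stats` are hypothetical helper names for illustration, not functions from the repo, and the 16 kHz default is an assumption.

```python
def build_stream(clips, sample_rate=16000, gap_seconds=1.0):
    """Concatenate clips, separated by gap_seconds of silence (zeros)."""
    gap = [0.0] * int(sample_rate * gap_seconds)
    stream = []
    for i, clip in enumerate(clips):
        if i > 0:
            stream.extend(gap)  # silence goes between clips only
        stream.extend(clip)
    return stream


def frr_stats(num_detected, num_present, stream_len, sample_rate=16000):
    """Return (fraction of wakewords missed, misses per hour of audio)."""
    misses = num_present - num_detected
    hours = stream_len / sample_rate / 3600.0
    return misses / num_present, misses / hours
```

Reporting the missed fraction alongside the per-hour figure may be useful, since the per-hour number depends on how much silence is inserted between clips.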

This seems sound, and we can then adjust the posterior threshold to find the setting that gives our desired FRR (or sweep over thresholds for evaluation). So far, though, the model is not detecting any wakewords through this pipeline. It definitely does when I speak into the microphone, so I'm wondering whether this is the best way to go about testing.
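
For the threshold sweep, here is a rough sketch. It assumes we already have the peak posterior for each positive clip; `sweep_frr` is a hypothetical name, and a clip counts as rejected when its peak is below the threshold.

```python
def sweep_frr(peak_posteriors, thresholds):
    """Map each threshold to the fraction of positive clips rejected at it."""
    total = len(peak_posteriors)
    return {
        t: sum(1 for p in peak_posteriors if p < t) / total
        for t in thresholds
    }
```

For example, `sweep_frr([0.9, 0.6, 0.3, 0.95], [0.5, 0.7])` gives one FRR value per threshold, which can then be plotted against the false alarm figures.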

Do any of you have thoughts or references I could check out to guide the process?

bayestehtashk commented 3 years ago

Why not use the hey snips negative samples?

ghost commented 3 years ago

Right now, I'm using the positive samples to get FRR, and negative samples for FAR, both from the hey-snips dataset. This question was just for FRR, but do you mean we should be passing both negative and positive samples for FRR?

bayestehtashk commented 3 years ago

For FAH, concatenation should be fine, but for the positive samples you should be careful: you might get multiple positive results around one occurrence of "hey snips".
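
For the FAH side, the arithmetic is just detections on the negative stream divided by its length in hours. A small sketch, with `false_alarms_per_hour` as a hypothetical helper and 16 kHz as an assumed sample rate:

```python
def false_alarms_per_hour(num_false_alarms, num_samples, sample_rate=16000):
    """False alarms divided by the duration of the negative stream in hours."""
    hours = num_samples / sample_rate / 3600.0
    return num_false_alarms / hours
```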

ghost commented 3 years ago

I did notice this! I'm accounting for it by checking whether the previous posterior was above the threshold and, if so, counting the whole run as a single positive result. I made sure to add three seconds between each sample, which should be enough for this procedure to work. I tested on a small set, and it seemed to work well. Please let me know if you have any suggestions for handling this better.
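
Roughly, the counting looks like the sketch below. It assumes one posterior per frame in a plain list; `count_detections` is an illustrative name rather than the actual function in the script.

```python
def count_detections(posteriors, threshold):
    """Count upward threshold crossings, so a run of frames counts once."""
    count = 0
    prev_above = False
    for p in posteriors:
        above = p > threshold
        if above and not prev_above:
            count += 1  # new detection only when the previous frame was below
        prev_above = above
    return count
```

One caveat with this version: if the posterior dips below the threshold for a frame in the middle of an utterance, the same occurrence is counted twice.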

bayestehtashk commented 3 years ago

One simple solution is to skip concatenation and pass each segment through separately, one by one.
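
A sketch of that per-segment evaluation, assuming a hypothetical `score_fn(clip)` that runs the model on one clip and returns its per-frame posteriors (this is a placeholder, not an existing function in the repo):

```python
def frr_per_clip(clips, score_fn, threshold):
    """Fraction of positive clips where no frame exceeds the threshold."""
    misses = 0
    for clip in clips:
        posteriors = score_fn(clip)
        # A clip is a hit if any frame crosses the threshold, so multiple
        # triggers around one utterance cannot be double counted.
        if not any(p > threshold for p in posteriors):
            misses += 1
    return misses / len(clips)
```

If the pipeline keeps state between frames, it would presumably need to be reset before each clip so that one segment does not influence the next.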

MerlinPCarson / WakeWord-Detection

Wake Word Model Evaluation #9