Picovoice / wake-word-benchmark

wake word engine benchmark framework
https://picovoice.ai/
Apache License 2.0

Set ApplyFrontend to true as recommended in the snowboy README #1

Closed yodakohl closed 6 years ago

yodakohl commented 6 years ago

According to the Snowboy README, ApplyFrontend should be set to true when resources/alexa/alexa-avs-sample-app/alexa.umdl is used:

resources/alexa/alexa-avs-sample-app/alexa.umdl: Universal model for the hotword "Alexa" optimized for Alexa AVS sample app. Set SetSensitivity to 0.6, and set ApplyFrontend to true. This is so far the best "Alexa" model we released publicly, when ApplyFrontend is set to true.
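As a rough illustration of the README's recommendation, the two settings could be applied to an already-constructed detector as sketched below. The helper name `configure_alexa_avs_model` is hypothetical, and passing the sensitivity as an encoded string is an assumption based on how Snowboy's Python demo calls `SetSensitivity`; only the `SetSensitivity` and `ApplyFrontend` names come from the README text quoted above.

```python
def configure_alexa_avs_model(detector, sensitivity=0.6, apply_frontend=True):
    """Apply the settings the README recommends for alexa.umdl.

    `detector` is expected to expose SetSensitivity() and ApplyFrontend(),
    as Snowboy's detector object does. This helper itself is hypothetical.
    """
    # Assumption: the sensitivity is passed as an encoded string,
    # mirroring Snowboy's Python demo code.
    detector.SetSensitivity(str(sensitivity).encode())
    # Enable or disable the audio frontend (preprocessing) stage.
    detector.ApplyFrontend(apply_frontend)
    return detector
```

Keeping the flag as a parameter makes it easy to benchmark the same model with the frontend both on and off.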

kenarsa commented 6 years ago

Thanks, @yodakohl. I appreciate your contribution. We should run the benchmark again with this config. Thankfully, we are about to start a new run and can incorporate this into it. Let's merge when the result is out, as I want the result and the code to be in sync.

kenarsa commented 6 years ago

Quick question: did you have a chance to run Snowboy with ApplyFrontend(True) on an RPi Zero or RPi 3? My understanding is that this flag enables a few audio preprocessing routines before wake-word detection, and I am wondering what the effect is on runtime metrics such as CPU and memory consumption. Thanks in advance.

yodakohl commented 6 years ago

I did some quick tests, but it's a bit difficult to measure since Snowboy does voice activity detection. I only tested on the Pi Zero W since I currently have no Pi 3 running.

Frontend False Idle

%Cpu(s):  5.0 us,  2.0 sy,  0.0 ni, 92.6 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :   378936 total,    38228 free,    39856 used,   300852 buff/cache
KiB Swap:   102396 total,   102140 free,      256 used.   277904 avail Mem 

PID   USER  PR  NI  VIRT   RES    SHR   S  %CPU  %MEM  TIME+    COMMAND
8061  pi    20  0   35992  12984  7640  S  4.9   3.4   0:06.05  python
8063  pi    20  0   5608   2500   2232  S  2.6   0.7   0:00.98  arecord

Frontend False Voice

%Cpu(s): 44.2 us,  1.3 sy,  0.0 ni, 54.2 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :   378936 total,    38724 free,    39360 used,   300852 buff/cache
KiB Swap:   102396 total,   102140 free,      256 used.   278400 avail Mem 

PID   USER  PR  NI  VIRT   RES    SHR   S  %CPU  %MEM  TIME+    COMMAND
8061  pi    20  0   35992  13036  7640  S  41.9  3.4   0:09.30  python
8063  pi    20  0   5608   2500   2232  S  1.9   0.7   0:02.15  arecord

Frontend True Idle

%Cpu(s): 14.6 us,  1.0 sy,  0.0 ni, 84.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   378936 total,    38724 free,    39360 used,   300852 buff/cache
KiB Swap:   102396 total,   102140 free,      256 used.   278400 avail Mem 

PID   USER  PR  NI  VIRT   RES    SHR   S  %CPU  %MEM  TIME+    COMMAND
8121  pi    20  0   35852  12732  7504  S  12.1  3.4   0:02.70  python
8123  pi    20  0   5608   2552   2284  S  2.6   0.7   0:00.43  arecord

Frontend True Voice

%Cpu(s): 70.4 us,  2.0 sy,  0.0 ni, 27.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   378936 total,    38304 free,    39780 used,   300852 buff/cache
KiB Swap:   102396 total,   102140 free,      256 used.   277980 avail Mem 

PID   USER  PR  NI  VIRT   RES    SHR   S  %CPU  %MEM  TIME+    COMMAND
8131  pi    20  0   35824  13064  7840  S  66.8  3.4   0:08.63  python
8134  pi    20  0   5608   2564   2296  S  2.0   0.7   0:00.52  arecord

It seems like memory is unchanged, but CPU usage of the python process rose from about 5% to 12% when idle and from about 42% to 67% during voice activity. I used the arecord example in Python:

python demo_arecord.py resources/alexa/alexa-avs-sample-app/alexa.umdl
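For anyone who wants to compare the snapshots above numerically, here is a minimal sketch that pulls RES and %CPU out of a `top` process row and computes the CPU ratio between two runs. The function names are made up for illustration, and the parser assumes the default `top` column order shown above with plain integer KiB values.

```python
def parse_top_process_line(line):
    """Extract command, RES (KiB) and %CPU from one process row of top output."""
    fields = line.split()
    # Assumed column order:
    # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    return {
        "command": fields[11],
        "res_kib": int(fields[5]),
        "cpu_percent": float(fields[8]),
    }


def cpu_ratio(before_line, after_line):
    """Return how many times higher %CPU is in after_line than in before_line."""
    before = parse_top_process_line(before_line)
    after = parse_top_process_line(after_line)
    return after["cpu_percent"] / before["cpu_percent"]


# python process rows copied from the snapshots above
idle_off = "8061 pi 20 0 35992 12984 7640 S 4.9 3.4 0:06.05 python"
idle_on = "8121 pi 20 0 35852 12732 7504 S 12.1 3.4 0:02.70 python"
voice_off = "8061 pi 20 0 35992 13036 7640 S 41.9 3.4 0:09.30 python"
voice_on = "8131 pi 20 0 35824 13064 7840 S 66.8 3.4 0:08.63 python"

print(round(cpu_ratio(idle_off, idle_on), 2))    # 2.47
print(round(cpu_ratio(voice_off, voice_on), 2))  # 1.59
```

These are single snapshots, so the ratios are only indicative.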

The change I made:


+++ b/examples/Python/snowboydecoder_arecord.py
@@ -75,6 +75,7 @@ class HotwordDetector(object):
             resource_filename=resource.encode(), model_str=model_str.encode())
         self.detector.SetAudioGain(audio_gain)
         self.num_hotwords = self.detector.NumHotwords()
+        self.detector.ApplyFrontend(True)

kenarsa commented 6 years ago

Thanks a lot. This is really helpful. This is consistent with what we are measuring. I am going to add a script to allow for measuring this systematically in a reproducible fashion. I probably will use real-time factor instead of CPU usage. Thoughts?
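To make the real-time factor idea concrete, here is a minimal sketch of how it could be measured: CPU seconds spent processing divided by seconds of audio processed. The function name, the `process_frame` callback, and the 16 kHz default are assumptions for illustration and are not taken from the benchmark code.

```python
import time


def real_time_factor(process_frame, frames, frame_length, sample_rate=16000):
    """Return CPU seconds spent per second of audio processed.

    process_frame: callable invoked once per frame (e.g. an engine's
                   per-frame detection call); hypothetical here.
    frames:        sequence of audio frames.
    frame_length:  number of samples in each frame.
    """
    audio_seconds = len(frames) * frame_length / sample_rate
    if audio_seconds <= 0:
        raise ValueError("no audio to process")

    # process_time() counts CPU time of this process only, so time spent
    # sleeping or waiting on I/O does not inflate the result.
    start = time.process_time()
    for frame in frames:
        process_frame(frame)
    cpu_seconds = time.process_time() - start

    return cpu_seconds / audio_seconds
```

A value below 1.0 means the engine keeps up with real-time audio on that device. Unlike a `top` snapshot, the result does not depend on when the sample happens to be taken.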

kenarsa commented 6 years ago

After looking into Snowboy's documentation, I decided that, in order to have an apples-to-apples comparison, we shouldn't set this flag, for a couple of reasons: (1) the flag essentially enables a few audio preprocessing algorithms (e.g. automatic gain control and noise suppression) that are considered separate modules in the audio processing chain and not part of wake-word engine functionality; (2) this flag is only recommended for (some) universal models and should not be set for personal models. Thank you for looking into it.