VAD in example doesn't seem to work

When running full_example.py, the speech recognition itself works fine, but the VAD iterator completely fails to detect voice activity, distinguishing only between "sound" and "silence".

My understanding is that audio_iterator should yield a block of audio data when the input contains voice, and None otherwise. If so, this doesn't work on my system: the iterator yields audio blocks whenever the microphone records any sound at all. I have tested this by snapping my fingers, scratching the desk, and even just leaving a ceiling fan running in the background; all of these cause the iterator to produce blocks. Only near-total silence produces None.
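To make the observation above easier to reproduce, here is a minimal sketch of how the iterator's output could be tallied. It assumes only what is described above (an iterable that produces either a block of bytes or None); `summarize_blocks` and the `sample` list are hypothetical names and stand-in data, not part of the example script.

```python
from typing import Iterable, Optional


def summarize_blocks(blocks: Iterable[Optional[bytes]]) -> dict:
    """Count audio blocks versus None results from a VAD-style iterator.

    A "phrase end" is counted each time a None directly follows a block.
    """
    voiced = 0
    silent = 0
    phrase_ends = 0
    previous_was_voiced = False
    for block in blocks:
        if block is None:
            silent += 1
            if previous_was_voiced:
                phrase_ends += 1
            previous_was_voiced = False
        else:
            voiced += 1
            previous_was_voiced = True
    return {"voiced": voiced, "silent": silent, "phrase_ends": phrase_ends}


# Stand-in data (hypothetical); in practice this would be the live iterator output.
sample = [b"\x00\x01", b"\x00\x02", None, None, b"\x00\x03", None]
print(summarize_blocks(sample))
```

If the "voiced" count keeps climbing while only a fan is running, that matches the behavior I am seeing.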

As a result, the end of a phrase isn't detected unless the room is very, very quiet. I have made multiple test recordings with the same microphone setup and found them to be clear and free of additional noise. Yet as soon as there is any input above a certain threshold, even if it is obviously non-human in origin, it is classified as voice. A modern VAD should be able to do much better.
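For comparison, this is roughly what a pure energy threshold does. It is only an illustration of the behavior described above, not a claim about how the example's VAD is implemented; `rms`, `energy_gate`, `tone`, and the threshold of 500 are all hypothetical, and the frames are assumed to be 16-bit little-endian mono PCM.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square level of a 16-bit little-endian PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def energy_gate(frame: bytes, threshold: float = 500.0) -> bool:
    """Return True when the frame is loud enough, regardless of what made the sound."""
    return rms(frame) >= threshold


def tone(amplitude: int, freq_hz: float = 1000.0, rate: int = 16000, n: int = 480) -> bytes:
    """Synthesize a sine tone (clearly not speech) as a 30 ms frame at 16 kHz."""
    values = (int(amplitude * math.sin(2 * math.pi * freq_hz * i / rate)) for i in range(n))
    return struct.pack("<%dh" % n, *values)


loud_tone = tone(5000)  # non-speech, but well above the threshold
silence = tone(0)       # all-zero samples

print(energy_gate(loud_tone), energy_gate(silence))
```

A gate like this passes any sufficiently loud sound, speech or not, which is the same pattern I observe with finger snaps and the fan.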

Is this actually working for you? What could be the reason for the VAD to fail so completely?

daanzu / kaldi-active-grammar