encs-humanoid / speech-and-hearing

speech and hearing systems for the IEEE ENCS Humanoid Robot Project

Improve voice activity detection #2

Open danielmcd opened 8 years ago

danielmcd commented 8 years ago

Integrate an improved algorithm for voice activity detection into listen_node.py.

The current voice activity detection relies solely on the sound intensity measured during an audio capture window. A more sophisticated algorithm could make detection more robust.

This issue proposes to introduce the Moattar and Homayounpour algorithm [1], which its authors describe as a simple but efficient real-time voice activity detection algorithm. A Python implementation is available on GitHub [2], but it operates on a file rather than an audio stream. The task is to adapt that implementation to the audio processing model of listen_node.py.

[1] http://www.eurasip.org/Proceedings/Eusipco/Eusipco2009/contents/papers/1569192958.pdf
[2] https://github.com/shriphani/Listener
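For orientation, here is a rough sketch of the per-frame decision described in [1]: each frame's short-term energy, dominant frequency, and spectral flatness are compared against minima tracked over an initial silence window, and the frame is labeled speech when more than one feature exceeds its threshold. This is a simplified approximation, not the referenced implementation or listen_node.py code: the `StreamVAD` name is hypothetical, the feature definitions are simplified, and the paper's post-smoothing step is omitted.

```python
import numpy as np

class StreamVAD:
    """Per-frame voice activity decision, loosely following Moattar &
    Homayounpour. Feed frames one at a time, as an audio-stream node
    would; the first `calib_frames` frames are assumed to be silence."""

    def __init__(self, sample_rate=16000, calib_frames=30,
                 e_thresh=40.0, f_thresh=185.0, sf_thresh=5.0):
        self.sample_rate = sample_rate
        self.calib_frames = calib_frames      # leading frames assumed silent
        self.e_thresh = e_thresh              # primary thresholds from the paper
        self.f_thresh = f_thresh
        self.sf_thresh = sf_thresh
        self.min_e = self.min_f = self.min_sf = None
        self.n_frames = 0
        self.n_silent = 0

    def _features(self, frame):
        spectrum = np.abs(np.fft.rfft(frame)) + 1e-10   # epsilon avoids log(0)
        energy = float(np.sum(np.asarray(frame, dtype=np.float64) ** 2))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / self.sample_rate)
        dom_freq = float(freqs[int(np.argmax(spectrum))])
        # deviation from spectral flatness: geometric vs. arithmetic mean, in dB
        sfm = -10.0 * np.log10(np.exp(np.mean(np.log(spectrum))) / np.mean(spectrum))
        return energy, dom_freq, sfm

    def is_speech(self, frame):
        energy, dom_freq, sfm = self._features(frame)
        self.n_frames += 1
        if self.n_frames <= self.calib_frames:  # calibration: track silence minima
            self.min_e = energy if self.min_e is None else min(self.min_e, energy)
            self.min_f = dom_freq if self.min_f is None else min(self.min_f, dom_freq)
            self.min_sf = sfm if self.min_sf is None else min(self.min_sf, sfm)
            return False
        counter = 0
        if energy - self.min_e >= self.e_thresh * np.log10(self.min_e):
            counter += 1
        if dom_freq - self.min_f >= self.f_thresh:
            counter += 1
        if sfm - self.min_sf >= self.sf_thresh:
            counter += 1
        if counter > 1:                         # speech needs two features to fire
            return True
        # silent frame: adapt the energy floor, as the paper does
        self.n_silent += 1
        self.min_e = (self.n_silent * self.min_e + energy) / (self.n_silent + 1)
        return False
```

The stateful class shape is the main point: listen_node.py can construct one detector and call `is_speech()` on each captured frame, instead of the file-at-once processing in [2].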

danielmcd commented 8 years ago

It's not integrated yet, but I made some progress on implementing the VAD algorithm mentioned above. At the moment I just have a test program that uses the algorithm to print an indication of voiced and silent frames. More work is needed to validate that the algorithm behaves correctly and to find parameters that hold up in a variety of audio environments. I'm pretty sure it will work better than an intensity threshold alone, but that needs to be verified too.
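One thing that may help with validating the raw voiced/silent output: [1] smooths its per-frame decisions by ignoring very short speech and silence runs before the labels are used. A small sketch of such a smoothing pass (the run-length minimums follow the paper's suggested frame counts; the function names and everything else here are assumptions, not code from the test program):

```python
def _runs(labels):
    """Collapse a boolean label sequence into [label, run_length] pairs."""
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    return runs

def _flatten(runs):
    return [lab for lab, n in runs for _ in range(n)]

def smooth_labels(labels, min_speech=5, min_silence=10):
    """Fill short silence gaps between speech runs, then drop speech
    bursts shorter than min_speech frames."""
    runs = _runs(labels)
    # interior silence gaps shorter than min_silence become speech
    filled = [[True, n] if (not lab and n < min_silence and 0 < i < len(runs) - 1)
              else [lab, n] for i, (lab, n) in enumerate(runs)]
    runs = _runs(_flatten(filled))            # re-merge adjacent speech runs
    # isolated speech bursts shorter than min_speech become silence
    dropped = [[False, n] if (lab and n < min_speech) else [lab, n]
               for lab, n in runs]
    return _flatten(dropped)
```

Comparing smoothed labels against a hand-marked recording should make it easier to judge the frame-level accuracy than eyeballing the raw printout.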