[feature] Ability to set Voice Activity Detection threshold?

haydenjameslee commented 7 years ago

I'm not familiar with how the provided WebRTC VAD works, but I'm assuming there's some kind of volume threshold which, when reached, causes the method Dissonance_WebRtcVad_Process (from AudioPluginDissonance.DLL) to return 1. Would it be possible to raise this threshold so that the mic volume has to be louder before that method returns 1?

martindevans commented 7 years ago

The WebRTC VAD is actually a very complex system that evaluates a load of different heuristics to decide when it should detect speech (this is why it was such a huge improvement over our own VAD, which only considered 2 heuristics). The only tweak available at the WebRTC level is the Aggressiveness enum, and that isn't really meant for this kind of tweaking anyway. Dissonance doesn't touch it, so it's at the default value, kVadNormal; i.e. the VAD is already at its lowest setting. WebRTC doesn't expose any other configuration options for the VAD because it's adaptive: it automatically adjusts to varying mic levels and background noise over time.

What problem are you trying to solve?

haydenjameslee commented 7 years ago

Ah, OK. I was afraid of something like that.

The problem is that during voice calls there's often background noise, so audio is streamed when nobody is actually talking, i.e. the VAD is returning false positives.

martindevans commented 7 years ago

If the problem is background noise being transmitted, you could try tweaking the noise suppression level. If the problem is simply the VAD activating and transmitting silence, that can't be fixed; it's a deliberate trade-off made by WebRTC:

- It's OK to transmit silence / mild noise
- It's absolutely unacceptable to not transmit voice
- Voice detectors are not perfect

With these things in mind, you can see that it's always going to transmit some non-voice sounds: if there's any chance a sound is voice, the VAD has to classify it as voice. FWIW this is backed up by our own experience with Dissonance. Early on we had our own VAD built in, which I tweaked to be pretty accurate (it was 90%+ accurate at classifying voice as voice). This turned out to be totally unusable in real life; I had to apply a load of hacks to make it detect voice more reliably, even at the cost of transmitting non-voice more often. That's what eventually drove us to the WebRTC VAD, since it was designed with those trade-offs in mind from the start.
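To make the trade-off concrete, here is a hypothetical C# sketch of one common recall-boosting technique: a "hangover" that keeps transmitting for a few frames after the last frame classified as speech. The class name `HangoverGate` and the frame counts are illustrative assumptions. This is not Dissonance's former VAD or WebRTC's implementation, and the thread doesn't say which hacks were actually used.

```csharp
using System;

// Hypothetical illustration only: NOT Dissonance's or WebRTC's VAD.
// Wraps a raw per-frame speech decision with a "hangover": once speech is
// detected, transmission continues for a fixed number of extra frames.
// This sends more non-voice audio in exchange for fewer clipped words.
public sealed class HangoverGate
{
    private readonly int _hangoverFrames;
    private int _framesRemaining;

    public HangoverGate(int hangoverFrames)
    {
        if (hangoverFrames < 0)
            throw new ArgumentOutOfRangeException(nameof(hangoverFrames));
        _hangoverFrames = hangoverFrames;
    }

    // rawIsSpeech: the underlying detector's decision for the current frame.
    // Returns true if this frame should be transmitted.
    public bool Process(bool rawIsSpeech)
    {
        if (rawIsSpeech)
        {
            // Speech detected: transmit and restart the hangover countdown.
            _framesRemaining = _hangoverFrames;
            return true;
        }

        if (_framesRemaining > 0)
        {
            // No speech in this frame, but still inside the hangover window.
            _framesRemaining--;
            return true;
        }

        return false;
    }
}
```

With a hangover of 2 frames, the raw decisions speech, silence, silence, silence, speech, silence become transmit, transmit, transmit, skip, transmit, transmit: 5 frames sent for 2 frames actually classified as speech. Word endings are less likely to be cut off, but more non-voice audio goes over the network, which is the same direction of trade-off described above.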

If you want to try tweaking the noise suppressor, have a look in WebRtcPreprocessingPipeline.cs:

public WebRtcPreprocessingPipeline(WaveFormat inputFormat)
    : base(inputFormat, CalculateInputFrameSize(inputFormat.SampleRate), 480, 48000, 480, 48000)
{
    _preprocessor = new WebRtcPreprocessor(NoiseSuppressionLevels.High);
}

You can raise that one level to NoiseSuppressionLevels.VeryHigh (the maximum value).

martindevans commented 7 years ago

Since this issue has been quiet for a week I'm going to close it now. If you need more help feel free to continue posting here and I'll reopen it right away :)

By the way, I'm about to submit a minor change fixing a UI bug related to this issue. The noise suppression setting is exposed in the Voice Settings inspector (click the Voice Settings button on any DissonanceComms component), but it isn't applied properly. I've fixed that, so from the next version of Dissonance you won't need to modify the source code to change the level.

Placeholder-Software / Dissonance, issue #46