Placeholder-Software / Dissonance

Unity Voice Chat Asset

Feature request: LAN quality voice #15

Closed citron4000 closed 7 years ago

citron4000 commented 7 years ago

Hello,

We have a bit of a special setup here and we are testing and demoing our VR application with all players using the same Wifi network for now (several people collaborating in VR on the same LAN, in different rooms).

Since we don't have any bandwidth concerns (for now), we were wondering if you have any advice on how to tweak the Quality and Frame Size to really push it to the best quality and latency that we can. We are already using Frame Size = Small and Audio Quality = High but were wondering if we can go even higher than that.

Thanks!

martindevans commented 7 years ago

Before I go into detail on the different settings I just want to check: would you consider the voice quality at the top settings you're using very good and you're simply looking for awesome? Or is it actually not great? Dissonance should have audio quality amongst the very best VoIP apps you've ever used at these top end settings, and if it's not great then we probably have a bug somewhere in the audio pipeline.

There are three things I can think of you could tweak. Frame size/audio quality directly correspond to Opus settings and I don't recommend changing them (although you can ignore me and do it anyway, details of how to do so are below). You can also change jitter compensation delay which is a (hardcoded) Dissonance thing and changing that is totally fine.

Frame Size

This is all about latency: a smaller frame size means less latency. For a conversation to flow your total latency should be less than ~150ms. The Small setting equates to 20ms so it shouldn't really be a problem (Dissonance adds some additional latency in its pipeline, but there's nothing you can do about that).

The Xiph wiki has this to say:

A 20ms frame size works well for most applications. Smaller frame sizes may be used to achieve lower latency, but have lower quality at a given bitrate.

If you still want to tweak it have a look in Dissonance.Audio.Codecs.Opus.OpusEncoder. This file is responsible for turning the high level "Frame Size" setting into actual numbers. Starting at line 53:

public int GetFrameSize(FrameSize size)
{
    switch (size)
    {
        case Dissonance.FrameSize.Small:
            return _encoder.PermittedFrameSizes[3]; // 20ms
        case Dissonance.FrameSize.Medium:
            return _encoder.PermittedFrameSizes[4]; // 40ms
        case Dissonance.FrameSize.Large:
            return _encoder.PermittedFrameSizes[5]; // 60ms
        default:
            throw new ArgumentOutOfRangeException("size", size, null);
    }
}

Opus only allows certain frame sizes. These permitted values are available in the PermittedFrameSizes array and correspond to 2.5ms, 5ms, 10ms, 20ms, 40ms and 60ms. I really recommend you don't use less than the 10ms option: doing so will disable certain features of the encoder and likely severely impact quality!
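To make the index-to-duration mapping concrete, here is a minimal sketch of the arithmetic behind those six entries. It assumes the array holds per-channel sample counts at a 48 kHz sample rate, which this thread does not state; OpusFrameMath and SamplesPerFrame are hypothetical names, not part of Dissonance.

```csharp
using System;

// Hypothetical helper, not part of Dissonance.
public static class OpusFrameMath
{
    // Frame durations Opus permits, in milliseconds, in the same order
    // as the entries described above (indices 0 through 5).
    public static readonly double[] PermittedFrameDurationsMs = { 2.5, 5, 10, 20, 40, 60 };

    // Number of samples per channel in one frame of the given duration.
    // ASSUMPTION: the caller knows the encoder sample rate (e.g. 48000 Hz).
    public static int SamplesPerFrame(int sampleRateHz, double frameDurationMs)
    {
        return (int)Math.Round(sampleRateHz * frameDurationMs / 1000.0);
    }
}
```

Under that assumption, OpusFrameMath.SamplesPerFrame(48000, 20) gives 960 samples for the Small setting, and index 2 (10ms) would be 480 samples.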

Audio Quality

Audio quality directly determines the target codec bitrate. Bitrate is a more "pure" quality setting than frame size.

This is controlled in the same file (Dissonance.Audio.Codecs.Opus.OpusEncoder), at line 37:

private int GetTargetBitrate(AudioQuality quality)
{
    // https://wiki.xiph.org/Opus_Recommended_Settings#Recommended_Bitrates
    switch (quality)
    {
        case AudioQuality.Low:
            return 10000;
        case AudioQuality.Medium:
            return 17000;
        case AudioQuality.High:
            return 24000;
        default:
            throw new ArgumentOutOfRangeException("quality", quality, null);
    }
}

That helpful inline link leads to a table in the Xiph wiki with some recommended bitrates for different applications. The three bitrates here correspond to:

So these values really cover the full range of what Xiph recommend for plain mono voice (but tweaking them won't hurt anything except your bandwidth).
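To get a feel for what raising the bitrate costs on the network, the encoded payload per frame can be estimated from the target bitrate and the frame duration. This is only a rough sketch: it ignores packet headers and any variation in the encoder's actual output, and OpusBandwidthMath is a hypothetical helper, not part of Dissonance.

```csharp
// Hypothetical helper, not part of Dissonance.
public static class OpusBandwidthMath
{
    // Approximate encoded payload of one frame, in bytes, for a target bitrate.
    // bits per frame = bitrate * (ms / 1000); bytes = bits / 8.
    public static double PayloadBytesPerFrame(int bitrateBitsPerSecond, double frameDurationMs)
    {
        return bitrateBitsPerSecond * frameDurationMs / 8000.0;
    }

    // Number of frames (and so roughly packets) sent per second of speech.
    public static double FramesPerSecond(double frameDurationMs)
    {
        return 1000.0 / frameDurationMs;
    }
}
```

For the High setting at 20ms frames, PayloadBytesPerFrame(24000, 20) works out to 60 bytes per frame at 50 frames per second, which is tiny by LAN standards.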

Jitter Delay

So far I've discouraged tweaking the settings, because they're based on what Xiph recommend and those guys really know what they're talking about when it comes to audio (they designed the Opus codec). There's one extra thing you could tweak which is part of Dissonance.

In Dissonance.Audio.Playback.VoicePlayback Line 120 we have the code for starting a new speech session:

public void StartPlayback()
{
    // start a new session with the current encode settings, and a 100ms fixed buffer
    _sessions.StartSession(new FrameFormat {
        Codec = Codec.Opus,
        FrameSize = _inputFrameSize,
        WaveFormat = _inputFormat
    }, TimeSpan.FromMilliseconds(100));
}

This session does not start playing instantly, instead it buffers up packets for a certain time (hardcoded 100ms here) and then starts playback. This means the playback system can handle 100ms of network jitter before it starts using packet loss concealment (which seriously degrades audio quality). Since you're on a LAN your jitter is likely very low. Try setting that delay to your 99th percentile latency (or if you haven't/can't measure that I'll make an educated guess and say try 3x your average latency).

Edit: I should mention why I say 99th percentile here. The remaining 1% is essentially the chance that jitter will exceed your buffer and cause Packet Loss Concealment (PLC) to cut in. PLC essentially just makes audio up on the fly and it doesn't sound great - you want to avoid it as much as possible. With a 99th percentile buffering time you essentially have a 1% chance at any given time that jitter will get so bad that audio quality is compromised.

Reducing this will reduce the delay between speech and reply, which is a large factor in perception of voice audio quality.
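If you do collect latency measurements, picking the buffer delay could be sketched as below. It uses the nearest-rank definition of a percentile, which is one of several common definitions, plus the "3x average" fallback suggested above. JitterBufferMath is a hypothetical helper, not part of Dissonance, and how you gather the samples (in milliseconds) is left to you.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Dissonance.
public static class JitterBufferMath
{
    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of all samples are less than or equal to it.
    public static double Percentile(double[] samplesMs, double percentile)
    {
        if (samplesMs == null || samplesMs.Length == 0)
            throw new ArgumentException("Need at least one sample", "samplesMs");
        if (percentile <= 0 || percentile > 100)
            throw new ArgumentOutOfRangeException("percentile");

        double[] sorted = samplesMs.OrderBy(x => x).ToArray();
        // 1-based rank; always in [1, sorted.Length] given the checks above.
        int rank = (int)Math.Ceiling(percentile * sorted.Length / 100.0);
        return sorted[rank - 1];
    }

    // Fallback when no distribution is available: 3x the average latency.
    public static double FallbackDelayMs(double[] samplesMs)
    {
        return 3.0 * samplesMs.Average();
    }
}
```

The result would then replace the hardcoded value, for example TimeSpan.FromMilliseconds(JitterBufferMath.Percentile(samples, 99)) in place of TimeSpan.FromMilliseconds(100).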

citron4000 commented 7 years ago

As always, thanks a lot for your detailed response and amazing reactivity!

I need some time to test these out live with colleagues in the real setup (different rooms) and check the quality again with your Unet demo and then in our application. I'll report here once I have studied this more on my side.

martindevans commented 7 years ago

Hi Citron,

I'm going to close this since the discussion is long since dead. Of course I'm happy to open it again if you have more to discuss :)