Closed sayangel closed 6 years ago
I typed all of this up and then noticed you're using an older version of Dissonance - the code snippet you show was changed in Sept 2017. Those changes included some fixes to playback quality in the face of terrible network conditions which seem like they could be relevant to you. I'd suggest upgrading to the latest Dissonance and seeing if that fixes the problem.
In case that does not fix the problem (or you're just curious) here's my original reply:
Jitter buffering protects against changes in latency between one packet and the next (a.k.a. jitter). If we played out each packet as soon as it arrived, the next packet would have to arrive exactly on time; any delay at all and it would be too late, because we must supply audio to the speakers on schedule. A jitter buffer deliberately delays packets slightly (worsening latency) to ensure there's always a packet available when needed (improving quality). In Dissonance, packets are added to a buffer as soon as they arrive and a timer is started; after some delay (the `_jitterDelay` value) playback begins. Since playback and capture both operate at the same rate, the buffer should then keep approximately the same number of packets in it at all times, varying only due to jitter in packet delivery/playback time. We also have a mechanism in the playback system which slightly changes playback speed (within a few percent) to deliberately expand or shrink the buffer as needed, so if jitter gets worse as playback proceeds the buffer can slowly be expanded to add more delay and tolerate the extra jitter.
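As an illustration of the trade-off described above (this is a toy model, not Dissonance's actual code), here's a minimal Python simulation showing how holding the stream back by a fixed buffer delay turns late arrivals into silence-free playback:

```python
import random

def count_underruns(buffer_delay_s, jitter_std_s, n_packets=1000,
                    interval_s=0.02, mean_latency_s=0.05):
    """Count packets that miss their playback deadline.

    Packet i is sent at i*interval and arrives after a network latency
    drawn from a normal distribution. Playback of packet i is scheduled
    at mean_latency + buffer_delay + i*interval, i.e. the whole stream
    is held back by buffer_delay to absorb late arrivals.
    """
    random.seed(42)
    underruns = 0
    for i in range(n_packets):
        latency = max(0.0, random.gauss(mean_latency_s, jitter_std_s))
        arrival = i * interval_s + latency
        deadline = i * interval_s + mean_latency_s + buffer_delay_s
        if arrival > deadline:
            underruns += 1
    return underruns
```

With no buffer delay roughly half the packets arrive after their deadline; a delay of a few standard deviations of jitter brings underruns to essentially zero, at the cost of extra latency.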
What exactly does the jitterDelay variable represent?
On the line you highlighted:

- `jitterDelay` is the amount to delay the audio before playback.
- `_jitter` is a struct which measures the standard deviation in packet delivery time; this is a rough estimate of network latency jitter.
- `InitialBufferDelay` is the value to use if we don't yet have a good measure of the jitter. It's a fairly high value, so it should generally be safe to start there and then reduce the delay as we get a better measurement of network conditions.
- `_jitter.Confidence` is a value from zero to one indicating how many jitter measurements we have taken.

We're linearly interpolating from `InitialBufferDelay` to `_jitter.Jitter * 2.5f` based on confidence.

> We were seeing values reach about 1.0 and this is when audio quality would get completely butchered. Hard limiting the jitterDelay value to 0.1 (100ms), the audio stream would sound clear, but it would be delayed from the other person talking.
For the delay to reach 1 second, the jitter measurement must be reporting a 400ms standard deviation in packet delivery times (which is truly dreadful) - can you tell if this is a real network problem or if the jitter meter is completely broken?
Were you getting any other warnings when playback was bad? For example, if the delay gets very large the message `Encoded audio heap is getting very large (N items)` is printed (the threshold is set at 40+ items, or about 1.6 seconds of delay at the default settings). If you could send me some examples (martin@placeholder-software.co.uk) of bad audio that'd be handy. If you can get a log that'd be fantastic.
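A hedged sketch of such a threshold check (hypothetical function name, not Dissonance's actual code; the 40 ms per item is the per-frame duration implied by "40+ items, or about 1.6 seconds" above):

```python
def heap_warning(item_count, frame_duration_s=0.04, warn_threshold=40):
    # 40 items * 0.04s of audio per encoded frame = 1.6 seconds of delay,
    # matching the "about 1.6 seconds at default settings" figure above.
    if item_count >= warn_threshold:
        return "Encoded audio heap is getting very large (%d items)" % item_count
    return None
```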
I had a listen to the audio sample you sent me. It sounds exactly like I would expect very bad network conditions to sound - packets are being dropped/lost and packet loss concealment is making up something to fill the gap (that's why it sounds muddy). There are no audio glitches (pops/clicks) which would indicate a non-network related audio problem.
First thing to try is definitely to upgrade to the latest Dissonance version to see if those changes I mentioned mitigate the issue.
I updated to 6.0.2 from the asset store and I'm still seeing bad quality under stressed network conditions. For reference, I'm using clumsy (https://jagt.github.io/clumsy/) to simulate network conditions.
We use clumsy to test networking too. What settings are you using? I'll see if I can reproduce the problem.
For reference: [clumsy settings screenshot]
If you're still interested in this issue, check out this other one: https://github.com/Placeholder-Software/Dissonance/issues/87#issuecomment-378182718
It turns out the server has been relaying unreliable packets (i.e. voice) over a reliable connection. This makes Dissonance worse at handling lost packets, because subsequent packets are delayed while the network re-sends the lost one; when they eventually arrive they're all too late, so a whole bunch of packets are effectively lost. Try changing the parameter on line 427 of `BaseClient.cs` from `true` to `false`:

```csharp
writer.WriteRelay(_serverNegotiator.SessionId, destinations, packet, false);
```
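To illustrate the head-of-line blocking described above, here's a rough Python model with assumed numbers (20 ms packet interval, 100 ms jitter buffer, one retransmission per loss; none of these are taken from Dissonance itself):

```python
def packets_made_late(loss_index, rtt_s, interval_s=0.02, buffer_s=0.1, n=50):
    """Model a reliable, ordered channel: a lost packet is re-sent (arriving
    one round trip late) and every later packet is held back behind it."""
    resend_arrival = loss_index * interval_s + rtt_s
    late = 0
    for i in range(loss_index, n):
        send_time = i * interval_s
        arrival = max(send_time, resend_arrival)  # blocked behind the re-send
        if arrival - send_time > buffer_s:        # misses the jitter buffer
            late += 1
    return late

# One loss with a 250ms round trip makes 8 packets too late to play,
# instead of the single packet an unreliable channel would drop:
print(packets_made_late(10, 0.25))  # 8
```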
Woah - that makes a lot of sense as to why we'd see that behavior. Thanks for the update!
Hey Martin - it didn't make too much of a difference. At least nothing night and day.
-Angel
What kind of clumsy settings are you using?
Tom just merged a PR of mine which may help with this issue: I've enabled Forward Error Correction (FEC) with the Opus codec. When elevated packet loss rates are detected, the encoder embeds a low-quality copy of the previous frame into each packet. If the decoder comes to decode a frame and the current packet hasn't arrived yet, it extracts the low-quality version of the frame from the next packet (if available) instead; otherwise it falls back to Packet Loss Concealment (PLC).
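The decoder-side decision just described can be sketched like this (illustrative only, not the actual Opus API):

```python
def recover_frame(i, received):
    """Pick a recovery strategy for frame i, given the set of received
    packet indices. With FEC enabled, packet i+1 carries a low-quality
    copy of frame i."""
    if i in received:
        return "decode"  # normal path: decode the frame from its own packet
    if i + 1 in received:
        return "fec"     # recover the low-quality copy from the next packet
    return "plc"         # nothing usable: synthesize with loss concealment

print([recover_frame(i, {0, 1, 3}) for i in range(5)])
# ['decode', 'decode', 'fec', 'decode', 'plc']
```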
Testing with the built in packet loss simulation (Window > Dissonance > Diagnostics) I could understand speech up to about 30% packet loss rates which is absolutely incredible! This isn't a perfect test, since packets are rarely lost on a purely random basis, but it's a significant improvement on before.
This will be available in the next release of Dissonance :)
Dissonance 6.2.0 has just been submitted to the asset store; this includes the FEC fix I mentioned above. It should be available in a few days once the asset store team reviews it :)
Dissonance 6.2.0 is now available on the asset store so I'll close this issue now. Don't hesitate to re-open it if there's still a problem :)
We recently had a case where someone had really bad audio quality so we investigated why it was sounding so bad. After messing with some of the jitter parameters it seems like we can improve the audio quality at the expense of lag accumulation.
I'm wondering if there's a reference for the jitter algorithm that Dissonance is using, in particular this line of code in `SpeechSession.cs`. I understand the estimated delay starts at 100ms and then adjusts. What exactly does the jitterDelay variable represent? We were seeing values reach about 1.0, and this is when audio quality would get completely butchered. If we hard-limited the jitterDelay value to 0.1 (100ms), the audio stream would sound clear, but it would be delayed from the other person talking.
Any direction to understand the jitter compensation process would be appreciated. I'm happy to provide samples of the bad audio; just send me an email at angel@insitevr.com.