Voice audio issues with 2023.3 and new orchestrator

jackjansen commented 5 months ago

We're seeing all sorts of different issues with voice audio:

Too long delays (600ms for uncompressed audio over TCP)
- And this does not always seem to be reflected in the latency_ms in the stats: output...
Sometimes no audio at all (possibly due to orchestrator desync?)
Lots of error messages about missing audio (maybe correlated to the above?)

We need to investigate what is going on.

And let's use this issue to document whatever we find.

jackjansen commented 4 months ago

If it turns out that #144 has been fixed we should also check these points again. Crossing finger that they'll also be fixed.

jackjansen commented 4 months ago

Confirmed fixed.

jackjansen commented 4 months ago

Re-opened this. The second and third bullet point were indeed fixed, but the first one not yet.

ashutosh3308 commented 4 months ago

There were two things that we tested this and last week:

First issue: If all participants can hear each other. We tested it last week and found that all the participants could hear each other. So, this issue has been resolved.
Second Issue: The Reported Delay (via Stats) against the Actual Delay (by saying Hello).
- Ashu tested it with fiddlehead and his laptop. Two scenarios were tested with Direct TCP and Socket I/O, with simple_avatar, Without_HMD. A simple headphone was connected to fiddlehead and in-bulit microphone/speaker was used for the laptop.
- For Socket_IO and Direct_TCP, the difference between actual and reported latency was approx. 200 ms.
- The reported_latency was approx for Socket_IO and Direct_TCP was 150ms and the Actual_Latency was approx. 370ms. Please note that the latency for Direct_TCP was 15 ms less than the Socket_IO

ashutosh3308 commented 4 months ago

Jack and Ashu tested on MAC to verify if this additional latency could be a Windows_Issue.

On Mac, Direct TCP scenario was tested with simple_avatar, Without_HMDs, simple headphone and in-bulit microphone/speaker was used for the laptop.
The reported_latency was approx for Direct_TCP was 150ms and the Actual_Latency was approx. 360ms.

jackjansen commented 4 months ago

For reference: @jvdrhoof is also seeing some unexpected latencies, sometimes, with a completely different setup: point clouds over socketio or webrtc. Sometimes (but not always) he is getting latencies in the 500ms range, with all machines (both clients and the orchestrator) on local networks.

This may be a red herring, but it may mean that we are looking at a problem that has nothing to do with audio but is actually in a completely different place.

jackjansen commented 4 months ago

(and, completely beside the point: @ashutosh3308 please reserve backticks print("this is code") for code and verbatim things like numbers 3.0 or 2 ms and host names flauwte.local and such. Use italics with underscores for emphasis, or if you want something stand out. And use Strong emphasis with double stars only very sparingly.

jackjansen commented 4 months ago

On the branch, repeated flauwte to vrtiny measurement, TCP transport. Measured latency (clap plus echo) was 370ms. Except once, where it was about 600ms.

Reported latency on vrtiny was 120-180ms.

Vrtiny time was -17ms uncertainty 15ms. Flauwte time was -41ms uncertainty 47ms

jackjansen commented 4 months ago

Repeated the exact same experiment, with very similar outcomes.

jackjansen commented 4 months ago

Did another one today. Same setup. Delay was variable, and noticed that as measured latency went up reported latency also went up, and vice versa.

It almost seems as if the measured latency is approximately twice the reported latency. But this seems ridiculous....

jackjansen commented 4 months ago

Another observation: on the receiver side, if we compare latency_ms to voice receiver_queue_ms the difference is almost always 60ms.

On the sender side, in AsyncVoiceReader, we have record_latency_ms=28ms and output_queue_ms=30ms.

Assuming a millisecond or so for transmission all the reported numbers seem to match: record_latency+output_queue+transmission+receiver_queue == total_latency

jackjansen commented 4 months ago

Disabled the spatializer. Makes no difference: still about 200ms extra latency perceived versus measured.

jackjansen commented 4 months ago

But now it seems that the "DSP Buffer Size" we set in Project Settings -> Audio sticks. And it was set to "best performance" previously.

I have set it to "Best Latency". Difference between perceived and measured latency goes down to 140ms. dspBufferSize is now 256, and it was 960 previously. So 700 samples difference, so at 48000 sample frequency that is about 15ms.

So this does not explain the 60ms decrease in perceived latency. And it doesn't explain where the other 140ms go.

Need to test Mac-to-Mac, to ensure were not also looking at some Windows problem.

jackjansen commented 4 months ago

Experimented with audioFps=100, in other words: a single packet is now 10ms in stead of 20ms.

Perceived latency goes down to 180ms. Measured latency goes down to 60ms. So measured is a bit closer to perceived, but not all that much (still 120ms difference).

jackjansen commented 4 months ago

Go the other way, audioFps=10. This should greatly increase the latency, but that latency difference should show up in the reported latency.

Record latency=60, output queue=100-160, receiver queue=200, latency=300-350. So those numbers seem to match, somewhat.

Perceived latency is around 500. so we are still looking at a discrepancy of around 150ms.

jackjansen commented 4 months ago

Repeated at home, Mac->Windows (sap to beelzebub). Measured latency 800ms, reported latency around 500ms. So a difference of 300ms, much higher than yesterday.

I did notice that both machine had dspBufferSize=1024.

jackjansen commented 4 months ago

Set both machines to Project Settings -> Audio -> DSP Buffer Size -> Best latency. Make very little difference.

jackjansen commented 4 months ago

Set both machine to mono, no spatialised, no virtual effects, no audio suspension. Makes very little difference.

jackjansen commented 4 months ago

Next experiment: Mac to Mac (using beignet as the receiver). Reported latency almost rock-solid at 510ms, almost exactly the sum of the components. Measured latency 660ms.

jackjansen commented 4 months ago

Unfortunately all these experiments show is that various theories we had are all wrong:

It is not a Windows driver problem,
it is not a spatialiser problem

jackjansen commented 4 months ago

Tried with a built player (Mac->Mac). Again doesn't make a lot of difference: reported 350ms, measured 430ms.

So there is a lower discrepancy between reported and measured, 80ms, but it's not a difference to write home about.

jackjansen commented 4 months ago

Tried with a built player on Windows too (beelzebub).

Reported latency 135ms, measured latency 400ms. Discrepancy 165ms.

jackjansen commented 4 months ago

Tried a different way of measuring: don't do an audio-only capture but record a video. On the voice receiver machine select the VoicePipelineOther so we can see its VU-meter in the inspector.

Load the video into Logic, so we can examine the waveform and the video with the VU meter at the same time.

The VU meter is pretty much frame accurate...

So something in Unity knows when it is playing out the audio......

jackjansen commented 4 months ago

Added a "DIY VU meter", by computing the RMS in the buffer. Recorded it with a slo-mo camera. Sometimes it gives the similar readings to the Unity VU meter, sometimes not?

jackjansen commented 4 months ago

The DIY VU meter improved (sort-of) by using GetSpectrumData. Now it gives the same readings as the Unity VU meter.

jackjansen commented 4 months ago

But accidentally found something very interesting, after a session was running for a long time: even with both input queue sizes set to 2 packets the reported (and perceived latency go up and up and up......

Seems like packets re being held up somewhere in the network or in AsyncTCPReceiver?

jackjansen commented 4 months ago

That was a red herring. At least, it wasn't related to what we're looking for. The queue lengths were long, and the first queue was non-dropping. That may be something we want to address, but it's not related to the extra 100-200ms latency we perceive.

jackjansen commented 4 months ago

Decided to go back to Unity 2021, just to see whether maybe this problem has been present for a long time but we just didn't notice it.

First try was branch ancient/deplyment/pilot0, but this doesn't work anymore: it requires the scenario storage at the orchestrator, which isn't there anymore.

jackjansen commented 4 months ago

Tried with newly created branch ancient/deployment/unity2021, which is the situation of mid-february, just before the switch to the new orchestrator and Unity 2022.

This branch has exactly the same problem as the current situation: the perceived audio latency is about 200ms larger than the reported audio latency.

I'm now starting to think that we never actually measured audio latency, and alway just looked at the reported number.

jackjansen commented 4 months ago

@ireneviola also isn't sure we ever really measured this.

jackjansen commented 4 months ago

Ok. We know where the problem is (on the Unity playout side), and we know how to work around it (by adjusting the audio playout clock to be slightly ahead of where we actually want it).

But I'm not going to fix it now. All the groundwork is in place, and the structure of the voice pipeline and P_Player and P_Player_self are ready for it, but I'm going to merge now and fix later.

jackjansen commented 3 months ago

Fast-forwarded branch to current state of master (v1.0.2).

jackjansen commented 2 months ago

Merging this branch into master, so @ashutosh3308 can use it for his VQEG experiments. Not losing the issue yet because it isn't resolved fully.

cwi-dis / VR2Gather

Voice audio issues with 2023.3 and new orchestrator #144