QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/
529 stars 46 forks source link

Audio/video sync not as good as it should be? #1322

Open qubesuser opened 8 years ago

qubesuser commented 8 years ago

It seems that audio/video sync in Qubes is not quite perfect, and worse than native. Looking at A/V sync tests it seems the audio might be 50-100 ms delayed on my machine.

I have only tested this subjectively (it seems a proper test would require special hardware that I don't have at least right now) so it's entirely possible it's actually fine, although looking at the code there seems to be ground to expect that there is a problem.

EDIT: Latency introduced by Xen

I didn't realize this when I filed this issue, but the biggest issue is in fact that Xen is causing PulseAudio to have lots of latency compared to Xen-less Linux due to lack of suitable real time behavior, even with only dom0 running.

Latency/timestamps issue

One possible issue is that it seems that the latency of the host sound card (as apparently seen by running "pacmd list-sinks" in dom0) is not passed to the VM, and the VM does not pass timing information to dom0.

In contrast, PulseAudio sink module-esound-sink asks the server to provide a latency number and feeds it back to PulseAudio, while sinks like module-rtp-sink send timestamps over the network.

So it looks like the protocol should be changed to either send the latency from dom0 to VM (ideally every time it changes), have the VM send timing information and using it in dom0 to sync the audio or preferably both.

The idea is that if dom0 reports 200ms latency, a movie player can send audio with a +200ms timestamp, sleep 200ms, and then start showing video, and it should be synced with the audio (the media player and PulseAudio should handle this automatically, provided that timestamps and latency data are passed properly between the VM and dom0 PulseAudio instances).

I'm not sure what the exact impact of this is, although on my machine the internal sound card apparently has 125ms latency, which would be a significant issue if not handled properly.

Buffer size issue

It seems the low latency option selects a 4096 buffer size, but 44100 s16 2-channel audio like Qubes is using is 176400 B/s, so 4096 bytes are 1/43 of a second, which is lower than the 60 Hz of typical video output, but higher than 25-30 fps of typical movies.

So it might make sense to reduce to 2048 or even 1024, although ultimately the buffer size only matters for interactive workloads like games, while movies can just buffer a lot in advance (assuming the system is properly implemented).

I think this should not be a significant issue, but it's a trivial fix so it might be worth considering.

VSync not exposed issue

The GUI daemon protocol doesn't seem to tell VMs when vsyncs happen or allow to wait for host vsync, which means that the video will have a random 0-16 ms delay that the VM has no way of knowing, even if gui-daemon displays it instantly.

I think this should also not have very significant impact and it's probably not trivial to fix.

Anonymity issues

Sending latency and allowing to wait for vsync allows for inter-VM correlation, but there are already lots of other ways to do so.

It might still be a good idea to allow to disable these for anonymity-sensitive VMs.

marmarek commented 8 years ago

The "audio_low_latency" should be a solution here. Will be default in R3.1 (#1036). This forces dom0 pulseaudio to have much smaller buffer (4kb), which means much more predictable latency. Also pulseaudio module in VM (module-vchan-sink) should report latency estimated on this plus vchan buffer size(*). Theoretically having smaller buffer should mean higher CPU usage (smaller chunks, more wakeups etc), but in practice it isn't that bad.

Yes, we could make the buffer even smaller, but this would mean that it would be even more vulnerable to overruns if the VM not send the data on time. Also keep in mind that it isn't the only buffer - there is also vchan ring buffer in the path (2kb), which also may affect latency.

We have specifically simple protocol (no metadata at all, just raw samples) to reduce parsing in dom0 (and also handling possible corner cases). Introducing some negotiation here would complicate things.

(*) Looking at pactl list sinks this seems to be broken, since I've got "Latency: 0 usec, configured 46439 usec".

So the summary:

  1. Try "audio_low_latency" option. In most cases it is just enough (this seems to be confirmed by users).
  2. If not, latency reporting by VM module should be fixed. Do you want to look at it?
  3. If that's still not enough, we may eventually think of something else. But the priority is to keep the protocol as simple as possible, especially in VM->dom0 direction.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

qubesuser commented 8 years ago

I was using the audio_low_latency option and it seems subjectively not perfect (in the sense of "slightly off", not "seriously broken").

It would probably be required to do some objective tests, for example by sending both video and audio to the same HDMI cable, capturing via a capture card and measuring sync.

qubesuser commented 8 years ago

I wrote the dom0 part that writes the latency to QubesDB, working on the VM part.

marmarek commented 8 years ago

On Mon, Oct 12, 2015 at 12:42:27PM -0700, qubesuser wrote:

I wrote the dom0 part that writes the latency to QubesDB, working on the VM part.

Is that latency constant (or "almost" constant)? If so, it indeed may work.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

qubesuser commented 8 years ago

Mostly, although I'm no longer too sure that using qubesdb is a good idea, might be better to just use a vchan or use shared memory if it's easy.

marmarek commented 8 years ago

Looping in @adrelanos. What do you think in context of Whonix? This gives VM some information about dom0 sound card state. We're already limiting the value there (take a look at previous comments). Not sure how unique it would be. Maybe it should be rounded to for example 50ms?

Regarding implementation - if static value, QubesDB should be ok. If dynamic (but rarely changing), QubesDB still can be used (take a look at watches). But frequently changing value surely not good for QubesDB. And I guess not something we want to deal with at all...

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

qubesuser commented 8 years ago

There is, in fact, a bigger issue than all those listed in the first message: why is latency so high in dom0 in the first place?

It was 136ms minimum on my motherboard audio card when first checked. When suspending and unsuspending (with "pacmd suspend-sink 0 1; pacmd suspend-sink 0 0") it restarts at 16ms, which is the value it should be at, and then it gets gradually increased.

It seems pulseaudio increases latency on underruns, which means that Qubes is failing to schedule PulseAudio at least every 16ms, and this should be fixed.

I suspect the issue is simply that the Xen scheduler is not configured in a way that provides the necessary realtime behavior for dom0.

Maybe the right approach is to just make sure then PulseAudio latency is always < 16ms (60 Hz) for internal audio and forget all about exposing latency etc.

marmarek commented 8 years ago

Are you sure about the reason here? On my system I see configured latency: 186.00 ms; range is 186.00 .. 2000.00 ms

Which matches what I remember about implementing audio low latency mode. Also I don't think getting audio in chunks of 16ms, through the whole stack application->pulseaudio->vchan->pulseaudio->device, would be a good idea - for performance reasons. Even when possible. I think the way to go, is to keep latency around 100ms and properly report it to the application.

Or maybe redefine "audio low latency" mode so that the "normal" one would be around 100-200ms (properly reported), which should be just fine for playing music/videos. And have "low" latency mode around 50ms (for VoIP and stuff like that). But still, IMHO not lower.

Another issue is that indeed when VM cork all the streams, it isn't reported to dom0 and is seen as an underrun. And according to pacat logs, it shouldn't be that bad, device is automatically suspended after some timeout. I think it is irrelevant here.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

qubesuser commented 8 years ago

Yeah, the problem is that 186ms is absurdly high. It means that assuming a 60 Hz monitor, audio is played 11 video frames after being sent...

So if you play a video game and click to shoot a weapon, it will go "bang" 11 video frames too late.

This probably happens for input as well, which is even more disastrous, and means that it would be impossible to, say, play a competitive first person shooter on Qubes even if GPU passthrough actually worked.

Thing is, if you can play video smoothly at 60 Hz, there is no reason why audio should have more than 1/60 = 16 ms latency when using the motherboard or GPU HDMI output.

I tried to do this:

Now pacmd list-sinks reports around 50ms instead of 135ms which is still bad but better.

What I'd really like to do is tell Xen to apply realtime scheduling to dom0 (i.e. if a dom0 vcpu is not halted by the OS, schedule it always and immediately), but it looks like the Xen developers failed to implement this basic functionality.

There seems to be a patch at http://lists.xen.org/archives/html/xen-devel/2009-03/msg01352.html and it might make sense to consider applying it (assuming it actually improves the issue).

There's also the RT-Xen project, but that seems far more complicated.

EDIT: actually Xen 4.5 seems to have merged the RT-Xen scheduler "RTDS", although it seems that the Xen developers have again failed to support using multiple schedulers, so you can't use RTDS for the tasks with deadlines and the credit scheduler for normal ones...

There's also the "credit2" scheduler which is default in Xen 4.5 and according to them has better behavior for latency-sensitive workloads (but still no realtime...).

marmarek commented 8 years ago

If you want force pulseaudio to reduce configured latency, try playing with buffer attributes: https://github.com/QubesOS/qubes-gui-daemon/blob/master/pulse/pacat-simple-vchan.c#L79-L85

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

qubesuser commented 8 years ago

I think the latency problem is unrelated to pacat-simple-vchan.

The "configured latency: [...] range is X" X value seems to be the lowest latency that PulseAudio can provide on the system.

PulseAudio in ALSA "tsched" mode works essentially by scheduling a timer for a time proportional to the latency; if the timer fires too late causing an underrun, then PulseAudio increases the latency and timer, and the exposed latency is apparently never decreased unless you suspend and unsuspend the sink.

So I think the problem is quite simply that a realtime thread in dom0 is at least some times getting its high resolution timers firing up to around 150ms (in standard configuration) or 50ms late (with my current realtime tweaks).

Since PulseAudio wants to avoid sound glitches due to underruns, it then sets latency to an high value.

A possible band-aid is to fix PulseAudio to actually lower the latency when decreasing the watermark or disabling ALSA tsched and setting a low buffer size, but that is going to lead to underruns.

The proper fix is to just configure the system so that a realtime thread in dom0 gets its high resolution timers to always fire at most a few milliseconds late.

qubesuser commented 8 years ago

So, I tried running dom0 Linux without Xen (using chmod +x /etc/grub.d/10_linux and then grub-mkconfig), and this is the result with pacmd list-sinks:

        configured latency: 4.00 ms; range is *4.00* .. 1837.50 ms

(you need to play sound to get PulseAudio to adjust, which can be done by installing paplay in dom0 and run it in a loop with --latency=1 on some music in /usr/share/sounds)

Then I tried running with Xen and only dom0 running with all the vcpus pinned and the Xen sched-credit timeslice set to 1ms:

        configured latency: 36.00 ms; range is *36.00* .. 1837.50 ms

So just by running Xen, latency jumps from a perfectly fine 4ms (which allows perfect A/V sync even on a 120 Hz monitor) to an unacceptable 36ms (can't even sync at 30fps).

Then, actually running VMs seems to move latency to around 50ms with a tweaked scheduler configurations and 150ms without a tweaked scheduler.

So I think the most important thing is getting rid of this 30ms that Xen is causing for no reason.

The best thing to do right now would seem to be checking if the problem is still present with Xen 4.6 (which would also allow to test the RTDS scheduler for latency introduced by running VMs).

How hard is it to upgrade Xen to 4.6 with current Qubes? Has any work already been done on that?

adrelanos commented 8 years ago

Looping in @adrelanos. What do you think in context of Whonix? This gives VM some information about dom0 sound card state. We're already limiting the value there (take a look at previous comments). Not sure how unique it would be. Maybe it should be rounded to for example 50ms?

It requires local access and cannot be measured remotely such as by viewing a video in a browser?

qubesuser commented 8 years ago

I didn't realize it at first, but unfortunately it might be possible to measure it remotely depending on the exact API exposed, which is why getting rid of the latency in the first place actually seems much better than attempting to get the VM to compensate for it.

In particular, having X ms latency means you need to wait X ms to start playing video to have it synced, which can be measured with an API that can extract the image of the current frame, and possibly with API that gives the playback position if that position includes latency.

Even without any API it is always possible to measure it by testing the user's reaction time to an audio cue, but that's much harder to setup and requires the user to cooperate for a long time so that the adversary can get enough information to average out natural variation in the reaction time.

qubesuser commented 8 years ago

Anyway, I started porting Xen 4.6, so assuming it will actually boot it should be possible to tell if the latency is fixed, otherwise I guess the next step is asking on the Xen ML why Xen is causing 30ms latency with only dom0 running.

adrelanos commented 8 years ago

If it is remotely measurable, I guess it is something that the Tor Browser Bundle devs need to know. Could you check they know and report an issue as appropriate against The Tor Project's tracker please?

marmarek commented 8 years ago

Generally about the whole audio/video sync issue: We have a lot of higher priority work, so we will not invest a lot of time in this issue. If you want to work on it, feel free to do that. I would be happy to accept the patches. But we don't want to make major architecture changes just because of this issue. For example "just" upgrading Xen version is probably ok. But applying some custom patches for hypervisor, is not.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

qubesuser commented 8 years ago

I discovered that the dom0-only latency problem is due to having OpenGL apps with wait-for-sync enabled AND running under Xen (either 4.4 or 4.6), at least when using a Radeon GPU.

Without Xen, OpenGL apps with vsync don't cause latency, and latency is also OK when running with Xen but with no such OpenGL apps.

Obviously the kwin compositor qualifies as an OpenGL app with vsync enabled when configured to wait for vsync, and that's what was causing the problem.

Unbinding snd-hda-intel from the Radeon HDMI audio device makes no difference.

Why this happens is currently a mystery.

Should probably test a newer Linux kernel and maybe newer Xorg/libdrm/Mesa to see if the problem is still present.

marmarek commented 8 years ago

I don't see what kwin have to do with pulseaudio/snd-hda-intel. It is no different on Xfce with composition disabled.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

qubesuser commented 8 years ago

Well, experiment shows that this is happening with the radeon driver, i.e. disabling vsync or desktop effects lowers latency from 26ms+ to 4/8ms...

Waiting for vblank is a 16ms wait (at 60 Hz), so this mostly matches with the numbers.

The first idea to explain this seems to me that while something is waiting for vsync then either timers are somehow not fired, or there is some lock that is being held that PulseAudio also needs, but only when under Xen.

Another possible idea is that since the GPU sends IRQs only when vblank is enabled, it's the GPU IRQs that somehow screw things up under Xen.

None of these seem particularly plausible, so I just don't know...

I wonder if it happens with Intel GPU too.

Could try to do more low level testing using an hand-written tool using hrtimers and something that directly interacts with EGL or DRM.

qubesuser commented 8 years ago

BTW, here is how to test:

  1. Install paplay and pacmd in dom0 if they are not already installed with "qubes-dom0-update pulseaudio-utils"
  2. If testing without Xen, run "chmod +x /etc/grub.d/10_linux; grub2-mkconfig -o /boot/grub2/grub.cfg" to allow booting dom0 without Xen, reboot and don't choose Xen in grub (NOTE: PCI devices won't be hidden from dom0 this way)
  3. Preferably set "realtime-scheduling = yes" in /etc/pulse/daemon.conf
  4. Shut down all VMs with "qvm-shutdown --all --wait"
  5. Enable/disable desktop effects and vsync in "System Settings"
  6. Kill and cause a restart of pulseaudio with "killall pulseaudio"
  7. Run "while :; do paplay -v --latency=1 /usr/share/sounds/freedesktop/stereo/alarm-clock-elapsed.oga ; done" and let it run for a minute or so since PulseAudio only calibrates latency when it is playing audio
  8. Use "pacmd list-sinks|grep 'configured latency'" to see the latency
qubesuser commented 8 years ago

So the dom0-only latency does in fact seem to just be caused by using OpenGL in Xen (with the radeon driver), regardless of vblank. OpenGL without Xen is fine.

This might be the fault of the Linux radeon drivers, the Linux DRM layer or of Xen. It would be nice to test with a newer kernel and with an Intel GPU, not sure if I'll be able to get around to that soon.

andrewdavidwong commented 8 years ago

It would be nice to test with a newer kernel and with an Intel GPU, not sure if I'll be able to get around to that soon.

@qubesuser: Do you still plan to do this eventually? Not trying to rush you; just wondering if I should assign this feature to you for tracking purposes.

marmarek commented 8 years ago

It worth checking impact of #1549 - maybe now it is good enough?

adrelanos commented 8 years ago

Is this related to my hdmi audio desync issue in #1698?

marmarek commented 8 years ago

Is this related to my hdmi audio desync issue in #1698?

Probably not, but worth checking.

andrewdavidwong commented 5 years ago

It worth checking impact of #1549 - maybe now it is good enough?

It looks like there haven't been any further reports on this issue in three years, so it's likely now good enough. We can always reopen this if further problems are discovered.

mattmccutchen commented 3 years ago

With Fedora 32 under Qubes 4.0 on my Lenovo ThinkPad L430, and without changing the audio_low_latency setting (so I assume it is on by default), I am still getting about 125 ms of audio latency. Even 25 ms skew between audio and video is a problem for stepping to Dance Dance Revolution videos on YouTube. :smile:

I made a tiny change to module-vchan-sink that allows the latency offset in pavucontrol in an AppVM to work (with compatible applications, which appear to include YouTube playback in Firefox): QubesOS/qubes-gui-agent-linux#107. The user still has to specify the latency offset manually. I've also used this setting to compensate for audio/video skew in the original video; while I know it isn't intended for that, it works fine as long as there aren't two audio sources in the same AppVM at the same time that require different latency offsets.

justusranvier commented 3 years ago

I'm not sure if this should be another issue or not, but I was experiencing horrible audio latency of approximately 2000 ms which made qubes completely unusable for voip and extremely painful for watching videos.

Eventually I narrowed it down to the use of the "simultaneous output" audio device.

If I have a vm's audio sent to a normal output device everything works as expected, however if I send a vm's audio output to the "simultaneous output" virtual device then the sound will be delayed by almost exactly 2 seconds.