The scream-ivshmem-pulse receiver asks pulseaudio for a very high latency

CCF100 commented 5 years ago

Pulseaudio output

I'm trying to use FL Studio in a VM, but Scream's latency is absurdly high... The ALSA receiver doesn't have this issue, but I don't want to use it, as pulse is so much more convenient...

martinellimarco commented 5 years ago

Thank you for reporting this. I will try to reproduce your issue in a couple of days; I wish I could do it earlier, but I'm busy with work. In the meantime, can you share more information with me? How did you generate that log file? What settings are you using in your Scream setup?

martinellimarco commented 5 years ago

Hi @CCF100 can you test this experimental low latency version? https://gist.github.com/martinellimarco/a5567d007479e40b329ed9e19fe28a4e

You can tune the latency by changing the multiplier on line 221; the default is *2.

A chunk is always 20ms because that's what I defined in the Windows driver. We'll probably need to change that to get even lower latency.
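As a rough illustration of what a 20ms chunk means in bytes, here is a small Python sketch. It assumes the multiplier simply scales the requested buffer in whole chunks; that is a reading of the comment above, not something checked against the gist, and `chunk_bytes` and `requested_latency_ms` are hypothetical helper names.

```python
CHUNK_MS = 20  # chunk duration stated in the thread


def chunk_bytes(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bytes of PCM audio contained in one 20 ms chunk."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * CHUNK_MS // 1000


def requested_latency_ms(multiplier: int) -> int:
    """ASSUMPTION: requested latency = multiplier x chunk duration."""
    return multiplier * CHUNK_MS


print(chunk_bytes(44100, 16, 2))  # 3528
print(requested_latency_ms(2))    # 40
```

Under that assumption, the default multiplier of 2 would correspond to roughly 40ms, and the chunk duration itself sets the floor.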

CCF100 commented 5 years ago

Ok, @martinellimarco I'll try it!

The latency is much better now, but pulseaudio wants to "rewind" it and I had to restart the receiver once due to latency... I didn't bother to change the line you suggested, however...

aqxa1 commented 5 years ago

I tried the low latency version and it's a decent improvement, but it degrades quite easily. I suspect evdev passthrough is making it worse because it tends to cause underruns when switching between host and guest. In fact, if you use the Pulseaudio receiver and scream-ivshmem-pulse-transmitter in a Linux guest, you get both Pulseaudio instances repeatedly replaying with every host/guest keyboard switch, degrading the latency more and more on both host and guest.

martinellimarco commented 5 years ago

Hi @aqxa1, thank you for your feedback. The Linux transmitter is really just a hack I spent a few hours on and never looked at again, so I'm not sure it's the most reliable test. Have you tried changing the multiplier on line 221 of the gist? Increasing it should increase the latency and stabilize it. What do you mean by "repeatedly replaying"? Do you hear the same sample in a loop? Does this happen only with the Linux transmitter or with the Windows one too?

aqxa1 commented 5 years ago

@martinellimarco Honestly, it's probably not the fault of something in your code, but rather a design issue with pulseaudio. Basically, when pulseaudio detects an underrun, it adjusts the latency higher (referred to as "~~replaying~~ rewinding"). Since switching between host and guest with evdev causes an underrun, the pulseaudio in the guest "rewinds", as does the pulseaudio on the host.

Pulseaudio doesn't try to correct this either (i.e. try for a lower latency again), so it just keeps getting worse and worse. The non-timer-based scheduling (tsched=0) is supposed to work around this behaviour, but I'm not sure it works with null cards (or if it does, how to configure it).

Probably, an ALSA or JACK transmitter would work better for this use case since it uses static latency, and both are generally lower latency than pulseaudio, even in the best case.

EDIT: I might try JACK's network support which should presumably work well

EDIT 2: It's better with Windows, and yeah, raising the multiplier helps the latency settle, and it no longer jumps to 200ms. So you're right, it's probably just the Pulseaudio transmitter at fault.

martinellimarco commented 5 years ago

Thanks for testing this. You seem to know more than I do about pulseaudio internals. Do you have any reference I can look at that explains the things you were referring to? I'm interested in lowering the latency, but unfortunately I don't have much time at the moment for an in-depth analysis.

aqxa1 commented 5 years ago

It's honestly just stuff I've picked up here and there, I don't have a particularly deep understanding of it. Pulseaudio doesn't seem to be very well documented either, unfortunately.

The LatencyControl page might be helpful.

And a Glitch-free audio blogpost by the main developer of Pulseaudio, which talks about timer-based scheduling and how it differs from the traditional approach.

Also, I used "replaying" when I meant "rewinding" in an earlier comment. Here's a post about rewinding. It looks to be about the output/sink side of things, but the author suggests that it's probably the same as the behaviour with sources.

justinkb commented 4 years ago

What's the state of this issue? Switched from alsa to pulse recently, but my Windows guest that I migrated from scream alsa to scream pulse receiver is now unusable. Starts ok, but latency gets progressively worse as time goes on, which I gather from aqxa1's comments would be unavoidable?

martinellimarco commented 4 years ago

What do you mean when you say that latency gets progressively worse? Is it as if the audio track lags behind what you can see? If so, I've noticed this sometimes on my new system too, and I plan to work on it during Christmas.

On the system I used to develop this driver and receiver I didn't get this problem, and I could keep the audio playing for hours without any accumulated delay. On a new system I built it's different for some reason: if I play a movie, for example, it starts aligned, but at the end there is half a second of delay. If I stop any audio source and then start another, the delay is gone; same if I restart the receiver. Is this the same thing for you?

That said I'm not convinced this is the same latency problem described in this issue, we'll see.

alegru commented 4 years ago

Hi @martinellimarco, just wanted to chime in and share that I also noticed delay after a long time of continuous audio. I can help with testing when you eventually work on it. I tried to renice the receiver to the same value as pulseaudio, but that didn't seem to help. Apart from that, I've been using scream-ivshmem-pulse since April, and it serves me well :)

CCF100 commented 4 years ago

I've noticed FL Studio running in Wine has amazing latency, it feels even faster than running it on Windows... I'll generate some logs of pulseaudio the next time I use it...

JaneSmith commented 4 years ago

I'm also experiencing horrendous latency with scream-ivshmem-pulse, although I haven't tried the Alsa one. I'm not entirely sure what causes it, whether it's just bad out of the box, whether it's evdev passthrough toggles via Ctrl+Ctrl worsening it, or whether it gets worse when I alter my host's sound output device (e.g. switching back and forth between speakers and headphones). Whatever the case, it is really truly bad. I get a noticeable latency of several seconds which makes it completely unusable for gaming.

martinellimarco commented 4 years ago

Hi everyone, it's been a while. I haven't had much time to work on this project in the past months, but I've tested on more configurations and tried to replicate most of the issues some of you are experiencing. Unfortunately I have only 2 PCs capable of running a VM for this, and in most cases it just works fine.

Nonetheless, I've been able to track some patterns where things go wrong, and I've realized some of my initial assumptions are not that good.

Now, I have a few ideas to work with but in the meantime I'd like your help to collect even more samples.

Can you guys screen record a minute or two when you are experiencing issues? I'm pretty sure there are at least 2 different issues reported here as "high latency". I want to make sure there aren't more.

Thank you.

martinellimarco commented 4 years ago

@Pagten sent a pull request that allows setting the latency client-side on the networked version of the pulseaudio receiver.

I've applied the same patch to pulseaudio-ivshmem. It's similar to the low latency patch I posted in August, but it handles a few things better, and I strongly encourage you to try it.

On my system I can run with -t 7 (7ms target latency) without hiccups. At -t 6 I can hear occasional clicks and pops.

It would be interesting to see how low you can go and whether this solves the issue for some of you.

darkstego commented 4 years ago

I still have the issue of degrading latency over time. I am running the latest git version using IVSHMEM with -t 10. The latency is good at the start, but after a while it starts to deteriorate until the latency is over a second.

This is a windows 10 guest with a linux host.

martinellimarco commented 4 years ago

I finally had enough time to sit down and dig into the root of this second problem. I've modified the driver and the receiver adding high precision timers to both of them to calculate a delta between the time passing in windows and linux.

The result is shown in this image: time_drift

The y axis is in milliseconds; the x axis is the number of chunks, each one consisting of 20ms of audio.

What it shows is that the Linux and Windows timers drift apart by ~5ms over ~12 minutes.
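A minimal Python sketch of how such a drift figure could be derived, assuming each chunk is stamped once on the guest and once on the host with a monotonic clock. The function name and the sample numbers are illustrative only; this is not the instrumentation that was actually added to the driver and receiver.

```python
def drift_ms_per_hour(guest_ms: list[float], host_ms: list[float]) -> float:
    """Average rate at which the guest clock gains on the host clock.

    guest_ms[i] and host_ms[i] are the elapsed times (in ms) each side
    measured when chunk i was handled. A positive result means the guest
    clock runs fast relative to the host.
    """
    if len(guest_ms) != len(host_ms) or len(guest_ms) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    delta_first = guest_ms[0] - host_ms[0]
    delta_last = guest_ms[-1] - host_ms[-1]
    elapsed_host_ms = host_ms[-1] - host_ms[0]
    return (delta_last - delta_first) / elapsed_host_ms * 3_600_000


# Illustrative numbers matching the thread: ~5 ms of drift over ~12 minutes.
host = [0.0, 12 * 60 * 1000.0]
guest = [0.0, 12 * 60 * 1000.0 + 5.0]
print(round(drift_ms_per_hour(guest, host), 1))  # 25.0
```

Expressing the result per hour makes it easier to compare systems that were measured over different durations.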

Tomorrow I'll test on another system where the problem is more audible, I'm sure I'll find a greater difference.

Looking around it seems to me that QEMU time drift is a known problem with a few mitigations that you all can try.

Anyway now that we know the nature of the problem we can think of a solution.

Also I'm not quite sure if this can be related but for comparison this is my QEMU XML where the clock is defined.

  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>

Do any of you have HPET enabled by any chance?

darkstego commented 4 years ago

I have the exact same clock section on my system, and I do get audio drift when using IVSHMEM with the ALSA receiver, and also with IVSHMEM and the Pulse receiver (but the IVSHMEM-ALSA setup gives the lowest initial latency in my testing).

I did try setting "hpet" to "yes", and latency drift still occurred.

duncanthrax commented 4 years ago

I'm not a QEMU/KVM user, but I think you need to look at the "tickpolicy" option on the "timer" setting of the "clock" section. I guess "tickpolicy" should be set to "catchup".

Also, according to Stackoverflow, on the Windows side: bcdedit /set useplatformclock

martinellimarco commented 4 years ago

HPET should be set to "no". I've asked if anyone had it set to "yes" to see if that was worsening the problem.

I wonder at what rate the time drifts on other systems. For example, if you play a movie uninterrupted for one hour, what's the perceived latency at the end?

If you stop any audio source in the VM and start another does the latency "reset"?

martinellimarco commented 4 years ago

@duncanthrax I'll try with bcdedit /set useplatformclock and see if I get any difference.

darkstego commented 4 years ago

Drift on my system is .5 second per hour.

Latency does reset if I stop all audio sources (even those not using scream) and then start up audio again.

martinellimarco commented 4 years ago

OK, so your timer is drifting ~20 times faster than mine. No wonder it sounds horrible.

The audio goes back in sync when nothing is playing because at that moment Scream doesn't send anything and the receiver can consume the excess samples.

What should happen is that Windows produces a fixed number of samples per second, let's say 44100, and Linux consumes the same number per second.

In practice we see that Windows is producing just a bit more. In your case, half a second per hour is 22050 extra samples per hour, or an average of about 6.1 extra samples per second.

I've never experienced this, but I suppose the opposite is also possible, where Windows produces just a bit less; this could lead to audible clicks and pops.

I'll do more experiments in a few hours.
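The arithmetic in this comment can be checked with a few lines of Python. The only inputs are figures quoted in the thread (0.5s per hour, 44100 samples per second, ~5ms over ~12 minutes); the helper name is made up for the example.

```python
SAMPLE_RATE_HZ = 44_100


def extra_samples_per_second(drift_s_per_hour: float,
                             rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Surplus samples produced per second of playback for a given drift."""
    return drift_s_per_hour * rate_hz / 3600


# darkstego: 0.5 s of drift per hour.
print(0.5 * SAMPLE_RATE_HZ)           # 22050.0 extra samples per hour
print(extra_samples_per_second(0.5))  # 6.125

# martinellimarco: ~5 ms over ~12 minutes, scaled to one hour.
own_drift_s_per_hour = 0.005 * (60 / 12)
print(round(0.5 / own_drift_s_per_hour))  # 20
```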

darkstego commented 4 years ago

So I am guessing that qemu and network audio don't drift because they eventually hit a buffer overrun and samples are lost, but the host never falls too far behind.

With IVSHMEM the buffer is really big; even at 1MB, for 44.1 kHz 16-bit 2-channel audio that would be a 6 second buffer if my napkin math is correct.

martinellimarco commented 4 years ago

I've tested with the same clock settings but with bcdedit /set useplatformclock true. After a reboot it seems it's not drifting anymore! :)

time_drift_2

It's a rather short test but encouraging. I'll leave my pc on this night with something in loop to log a few hours of data.

@darkstego Your math is correct. The buffer is so big to account for the worst case scenario of 192KHz 32bit 8 channels. 1MB becomes ~160ms of buffer in that case.

Since in your system the problem is more noticeable can you try from an admin cmd to use bcdedit /set useplatformclock true and see if it does work for you too after a reboot?
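For reference, the buffer figures discussed above can be reproduced with a short Python sketch. It assumes "1MB" means 1 MiB (1024 * 1024 bytes), which is how libvirt interprets unit="M"; with a decimal megabyte the high-end case comes out closer to 163ms, so both readings land near the ~6s and ~160ms quoted in the thread.

```python
def buffer_seconds(buffer_bytes: int, rate_hz: int, bits: int, channels: int) -> float:
    """How many seconds of PCM audio fit in a buffer of the given size."""
    return buffer_bytes / (rate_hz * (bits // 8) * channels)


MIB = 1024 * 1024  # ASSUMPTION: "1MB" is read as 1 MiB

# 44.1 kHz, 16-bit, stereo: result in seconds.
print(round(buffer_seconds(MIB, 44_100, 16, 2), 2))       # 5.94
# 192 kHz, 32-bit, 8 channels: result in milliseconds.
print(round(buffer_seconds(MIB, 192_000, 32, 8) * 1000))  # 171
```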

darkstego commented 4 years ago

I did enable it, and 30 minutes in I noticed it still starts to drift.

But I am interested to know how this is supposed to work. useplatformclock enables HPET in Windows as I understand it. But hpet is disabled.

I will play around some more with the timers and see if I can find a setting that eliminates drift.

martinellimarco commented 4 years ago

In the documentation of bcdedit it says nothing about HPET.

useplatformclock [ yes | no ]
Forces the use of the platform clock as the system's performance counter.

I've seen a lot of posts that indicate it to be related to HPET but I've also found others that say that it does use TSC.

I've also found this reddit thread and I've tried the proposed settings (<feature policy='require' name='invtsc'/>)

The drift for me is still 0 after 2 hours but I must say that as described in that thread the system feels more responsive when playing a game. I get the same FPS but the input latency feels better. I don't know, maybe it's just that I'm paying attention to details.

darkstego commented 4 years ago

I already use invtsc. I have been reading a lot about timers and will try a bunch of configurations to see if one works. The problem is that each test of a configuration takes at least 30 minutes to see if a noticeable drift occurs. I wish there was a way to quickly check for drift that doesn't require actually waiting for it to occur.

martinellimarco commented 4 years ago

Yeah, I understand this very well, this is why it took me so long to find out the root of the problem. I was suspecting it but testing everything is a long process.

Anyway, so far 7 hours and no drift here.

darkstego commented 4 years ago

@martinellimarco What does the <hyperv> section of your libvirt xml look like? I wanted to check if there is anything there that affects the hypervclock timers.

So far a day of testing hasn't resulted in any configuration that actually solves the drift. Timer drift seems to be an issue with a lot of VM setups and many just use NTP to mask its effect on clocks. I am wondering if some hardware configurations handle the timers differently than others.

martinellimarco commented 4 years ago

This is the whole XML, for reference. I never bothered tuning it much, it's probably a mess :)

<domain type="kvm">
  <name>win10-q35</name>
  <uuid>1165888c-076b-41b0-96ff-b21a99abc31e</uuid>
  <description>Windows 10 VM - Q35 chipset</description>
  <memory unit="KiB">8388608</memory>
  <currentMemory unit="KiB">8388608</currentMemory>
  <vcpu placement="static">6</vcpu>
  <iothreads>4</iothreads>
  <os>
    <type arch="x86_64" machine="pc-q35-2.12">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10.fd</nvram>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="whatever"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
  </features>
  <cpu mode="host-model" check="partial">
    <topology sockets="1" cores="3" threads="2"/>
    <feature policy="require" name="vmx"/>
    <feature policy="require" name="invtsc"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" present="no" tickpolicy="catchup"/>
    <timer name="pit" present="no" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="kvmclock" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/win10.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <boot order="1"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0a" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <target dev="sdb" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
    </controller>
    <controller type="usb" index="0" model="ich9-ehci1">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0c" function="0x7"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci1">
      <master startport="0"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0c" function="0x0" multifunction="on"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci2">
      <master startport="2"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0c" function="0x1"/>
    </controller>
    <controller type="usb" index="0" model="ich9-uhci3">
      <master startport="4"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0c" function="0x2"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <interface type="direct" trustGuestRxFilters="yes">
      <mac address=""/>
      <source dev="eno1" mode="bridge"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0b" function="0x0"/>
    </interface>
    <interface type="network" trustGuestRxFilters="yes">
      <mac address=""/>
      <source network="private"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x08" function="0x0"/>
    </interface>
    <channel type="unix">
      <target type="virtio" name="org.qemu.guest_agent.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <graphics type="spice" autoport="yes">
      <listen type="address"/>
    </graphics>
    <video>
      <model type="qxl" ram="65536" vram="65536" vgamem="16384" heads="1" primary="yes"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0"/>
    </video>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0" multifunction="on"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x1"/>
    </hostdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x0"/>
    </memballoon>
    <shmem name="looking-glass">
      <model type="ivshmem-plain"/>
      <size unit="M">64</size>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x10" function="0x0"/>
    </shmem>
    <shmem name="scream-ivshmem">
      <model type="ivshmem-plain"/>
      <size unit="M">1</size>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x11" function="0x0"/>
    </shmem>
  </devices>
</domain>

My PC uses an i7-2600 on a DQ67OW motherboard, it was built in 2011. I'm sure a lot has changed in 9 years.

I think that trying to avoid the drift in the timer is not a practical solution for many. I'm probably being very lucky here.

Probably a more robust solution is to detect the delay and adjust the playback rate. I'll see what I can do about that.

alegru commented 4 years ago

One "trick" to get drift almost immediately is for me to lock my session using physlock while keeping audio playback. Unlocking ten minutes later, the audio is easily a second behind.

That aside, I tried changing some settings. First of all, I'm not using libvirt, but raw QEMU. I had set the setting to

-rtc base=localtime,clock=host,driftfix=none
-global kvm-pit.lost_tick_policy=discard

which gave me noticeable audio drift after some time. I had already set

-cpu host,+invtsc
-no-hpet

before though.

Now, I'm testing with

-rtc base=localtime,clock=host,driftfix=slew
-global kvm-pit.lost_tick_policy=delay

which should translate to your libvirt settings. And I also set the bcdedit thing. I can see in LatencyMon that the interrupt to process latency is a bit worse than before, due to QEMU injecting interrupts to fix the time drift. Audio drift seems to be better than before, from my initial testing (at least if I'm not locking the screen).

martinellimarco commented 4 years ago

I will try physlock, thanks! I was searching for a way to "amplify" the problem to study it better.

duncanthrax commented 4 years ago

Probably a more robust solution is to detect the delay and adjust the playback rate. I'll see what I can do about that.

Indeed the best solution. This could be relatively easy on the receiver by watching buffer size change over time, e.g. pulse has a mechanism for that. See https://freedesktop.org/software/pulseaudio/doxygen/streams.html, section "Buffer Attributes". If buffer increases, remove n samples per second from the stream. If it decreases (underrun), insert duplicate samples into the stream.
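A minimal Python sketch of the drop/duplicate idea described above, working on plain lists of frames. It does not use the PulseAudio API; `correct_drift`, its parameters, and the numbers in the example are all hypothetical, and the caller is assumed to obtain the current queue depth from wherever the receiver tracks it.

```python
def correct_drift(frames: list, buffered: int, target: int, max_adjust: int = 1) -> list:
    """Return `frames` with a few frames dropped or duplicated.

    frames     -- the PCM frames about to be written to the sink
    buffered   -- frames currently queued downstream (measured by the caller)
    target     -- desired queue depth in frames
    max_adjust -- upper bound on frames added or removed per call
    """
    if not frames or buffered == target:
        return list(frames)
    if buffered > target:
        # Queue is growing: the producer is fast, so drop trailing frames
        # (always keeping at least one frame of the chunk).
        drop = min(max_adjust, buffered - target, len(frames) - 1)
        return list(frames[: len(frames) - drop])
    # Queue is shrinking: repeat the last frame to pad the stream.
    pad = min(max_adjust, target - buffered)
    return list(frames) + [frames[-1]] * pad


chunk = [10, 11, 12, 13]
print(correct_drift(chunk, buffered=900, target=882))  # [10, 11, 12]
print(correct_drift(chunk, buffered=860, target=882))  # [10, 11, 12, 13, 13]
print(correct_drift(chunk, buffered=882, target=882))  # [10, 11, 12, 13]
```

Keeping `max_adjust` small spreads the correction over many chunks, which is probably less audible than removing a large block at once; a real receiver might prefer resampling instead.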

martinellimarco commented 4 years ago

physlock does work; even locking the session for a few seconds puts the audio out of sync by a second.

The timer doesn't drift in this case: the receiver still sends its output to pulseaudio, which buffers it, and when the session is unlocked we get the delay. But it's very useful for testing :)

I've also tested with alsa and it's the same thing.

I've also tested mplayer, VLC and YouTube on Chrome to see how they react to physlock.

The problem is the same, but all 3 recover in a fraction of a second, in different ways.

Mplayer rewinds the video to align it with the audio, VLC skips audio to align it with the video, and Chrome pauses the video until the audio is back in sync.

Obviously we can't do anything about the video, but we can do what VLC does.

I'm experimenting a bit with the receiver to see if I can fix this once and for all.

darkstego commented 4 years ago

I made some changes to the pulse receiver and was able to limit the drift by using the buffer maxlength attribute. I made a pull request with the changes. Hope it helps.

martinellimarco commented 4 years ago

Today I tested #90 and #91 together, in different combinations and I can't get the audio out of sync anymore. Tested without bcdedit useplatformclock.

Can someone else test them and report if everything is ok now?

In the meantime I'll work on alsa.

alegru commented 4 years ago

Great job everyone, I also removed the bcdedit change, and can't get audio out of sync anymore. I'm especially happy that physlock was useful for testing and won't cause drift any longer. I'm just wondering, is #90 actually needed now that scream resets the read index? Because setting it especially low causes more CPU load (around 14% for pulseaudio alone on 1ms). Setting target latency (#74) to 1ms has almost no effect on CPU load, but is an instant audible improvement. I'm using this video for comparisons. It might be confusing for end users what each latency setting does.

darkstego commented 4 years ago

@alegru Were you having increasing latency or was the latency the same regardless of how long the audio was playing?

It is interesting about CPU load, I will need to look into that. I wonder if #91 alone is enough to resolve the audio drift.

martinellimarco commented 4 years ago

I think #90 is still needed to limit the buffer in pulseaudio while #91 limits the one in scream.

physlock is useful for debugging because it allows us to fill the pulseaudio and scream buffers, but the way it does that is not the same as when the timers drift.

I'm using the same video for testing and also the one from twitch :)

I agree that too many flags will confuse the users, that's the reason I didn't add another one to set the delta in #91.

darkstego commented 4 years ago

It is possible to stick to one latency flag, because adjusting the max latency automatically sets target latency and the prebuffer. If there were no issues with adjusting max latency then you can just have 1 number and have pulse take care of the rest. But since the CPU load might be an issue, then it might be better to keep the buffer limit to only those who are getting audio drift.

I will try to look into the CPU load today and see if I can replicate it.

alegru commented 4 years ago

@darkstego latency seemed to be the same when I used physlock to get some drift. #91 corrected the read index, and I couldn't hear a difference between setting -l 200 and -l 1

But I didn't test it over long time to confirm with natural clock drift. I think it'd be reasonable to link the setting to the 60 ms we get from limiting to 3 chunks. That seems to make more sense than linking it to the target latency the user might want to set manually.
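To make the "3 chunks" limit concrete, here is a toy Python sketch of a reader that resynchronises with the writer once it has fallen too far behind. It only illustrates the idea of resetting the read index; the function, the ring size of 50 slots, and the exact comparison are assumptions, not the code from #91.

```python
CHUNK_MS = 20
MAX_LAG_CHUNKS = 3  # the 3-chunk / 60 ms limit mentioned above


def resync_read_index(read_idx: int, write_idx: int, ring_size: int,
                      max_lag: int = MAX_LAG_CHUNKS) -> int:
    """Return the read index to continue from.

    If the reader is more than `max_lag` chunks behind the writer, jump to
    the writer's position and discard the backlog; otherwise keep going
    from where we are.
    """
    lag = (write_idx - read_idx) % ring_size  # handles wrap-around
    return write_idx if lag > max_lag else read_idx


print(MAX_LAG_CHUNKS * CHUNK_MS)          # 60
print(resync_read_index(10, 12, 50))      # 10 (2 chunks behind, keep reading)
print(resync_read_index(10, 20, 50))      # 20 (10 chunks behind, skip ahead)
print(resync_read_index(47, 1, 50))       # 1  (4 chunks behind across the wrap)
```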

I decided to share my short benchmark, all with the audio sync test:

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 50 -l 200 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 1-2% cpu (default)

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 30 -l 1 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 13-14% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 50 -l 50 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 1-2% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 1 -l 1 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 13-14% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 200 -l 200 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 0-1% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 1 -l 200 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 1-2% cpu

pkill scream
(&> /dev/null scream -m /dev/shm/scream -o pulse -t 200 -l 1 &)
sudo renice -n -11 "$(pidof scream)" > /dev/null

-> pulse 13-14% cpu

=> audible difference between -t values
=> no audible difference between -l values
=> -t 1 -l 60 seems to be best tradeoff cpu load / drift?
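Grouping the seven runs above by their -l value makes the pattern easier to see. This Python sketch only re-tabulates the numbers already listed in the benchmark; nothing here was measured separately.

```python
from collections import defaultdict

# (target latency -t in ms, max latency -l in ms, PulseAudio CPU range in %)
runs = [
    (50, 200, (1, 2)),
    (30, 1, (13, 14)),
    (50, 50, (1, 2)),
    (1, 1, (13, 14)),
    (200, 200, (0, 1)),
    (1, 200, (1, 2)),
    (200, 1, (13, 14)),
]

# Collect every run under its -l value.
by_l = defaultdict(list)
for t_ms, l_ms, cpu in runs:
    by_l[l_ms].append((t_ms, cpu))

# Print the overall CPU range seen for each -l, along with the -t values used.
for l_ms in sorted(by_l):
    lows = [cpu[0] for _, cpu in by_l[l_ms]]
    highs = [cpu[1] for _, cpu in by_l[l_ms]]
    t_values = sorted(t for t, _ in by_l[l_ms])
    print(f"-l {l_ms:>3}: -t {t_values} -> {min(lows)}-{max(highs)}% CPU")
```

In these runs the CPU load follows -l and is essentially independent of -t: every run with -l 1 sits at 13-14%, while the runs with -l 50 or -l 200 stay at 2% or below.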

darkstego commented 4 years ago

If -l is smaller than -t then pulse sets -t to be -l.

The issue is that the target latency -t only really works if the sample rates of the source and the server are the same. In the case of clock drift this doesn't hold true.

In case of drift, your latency will start at -t but will increase until it reaches -l, and then stay at that latency. So in my case, for example, I only use the flag "-l 5", since the change in -t is implied, and because of the drift my buffer will always be at max capacity, so the latency is dictated by the max buffer size.
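The behaviour described above can be written down as a small Python model. It is a simplification that assumes a constant drift rate and a producer that runs fast; the function names and the example values (500ms of drift per hour, -t 10, -l 400) are illustrative, not measurements.

```python
def effective_target_ms(t_ms: float, l_ms: float) -> float:
    """Target latency after clamping: -t cannot exceed -l."""
    return min(t_ms, l_ms)


def latency_after(hours: float, t_ms: float, l_ms: float,
                  drift_ms_per_hour: float) -> float:
    """Modelled latency after `hours` of playback with a fast producer.

    Latency starts at the effective target and grows with the drift
    until it is capped by the maximum latency -l.
    """
    start = effective_target_ms(t_ms, l_ms)
    return min(start + drift_ms_per_hour * hours, l_ms)


print(effective_target_ms(50, 5))         # 5
print(latency_after(0.0, 10, 400, 500))   # 10.0
print(latency_after(0.5, 10, 400, 500))   # 260.0
print(latency_after(1.0, 10, 400, 500))   # 400
```

With a small -l such as 5 the cap is reached almost immediately, which matches the observation that the latency is then dictated by the maximum buffer size alone.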

alegru commented 4 years ago

@darkstego thanks for your explanation! Now I understand how it works a bit better. Well, CPU load seems to increase on values lower than 5, but since there's no need to set it so low I wouldn't worry about it too much. Setting it to 5 results in pulse using about 4% CPU on my machine, which is definitely acceptable.

martinellimarco commented 4 years ago

If anyone is interested, I've published here an experimental branch that uses ivshmem-doorbell instead of ivshmem-plain.

Ivshmem-doorbell is shared memory with interrupts between host and guests, and can be used to avoid polling.

I started playing with it a few months ago but never finished for lack of time; it has now reached a usable state.

The driver is compiled but not signed, so you'll need to enable test signing with bcdedit -set TESTSIGNING ON to install it.

You'll need a new receiver; it's based on the old standalone pulse receiver with some of the latency patches added. Read the instructions in there to set up the VM.

Is it worth it? I'm not sure. Yes, the CPU usage is 0% when there is no sound, but under load it's pretty much the same, at least for me.

Feedback is appreciated.

alegru commented 4 years ago

@martinellimarco tried it on raw QEMU, works great! The only thing I'd remove is the ,reconnect=1 option, I don't know why but it caused QEMU to hang. After I removed that everything worked as expected. Aside from zero CPU load when idle, it's also nice that there's no conflict with LookingGlass shared memory ids. And not needing to install the pci device for shared memory in Windows is a plus, too.

martinellimarco commented 4 years ago

Oops, sorry about the reconnect=1 problem, it's the only thing I didn't test :)

You still need the IVSHMEM driver in Windows and it can still conflict with LookingGlass.

alegru commented 4 years ago

Ah I figured it should be easier working with doorbell peerIDs, in ivshmem-plain you more or less only know about the size of the shared memory and have to detect your space. But I only took a brief look at the ivshmem specification. And you're right, the driver is still required of course. I tried deactivating the device in Windows and crashed :) That aside, I don't see more or less load when audio is playing, and LatencyMon seems the same, too.

martinellimarco commented 4 years ago

Yes, that's the same thing I see. I decided to give it a try because many have said on the Arch wiki and on the Looking Glass forum that the ivshmem-plain version of scream should be avoided, since it is inherently inferior due to polling.

I thought that maybe I was being stupid and missing some major performance benefit.

I agree that in theory interrupts are better than polling, and at idle they certainly are, but under load I can't hear any difference, and the performance difference is so negligible it's not worth the trouble of using the ivshmem-server.

Thank you for the feedback.

alegru commented 4 years ago

Would be interesting to hear if @canselcik sees audio latency improvements with the doorbell implementation.

duncanthrax / scream

The scream-ivshmem-pulse receiver asks pulseaudio for a very high latency #54