Real-world low-latency testing comparison - strange issue encountered with WASAPI Exclusive

dechamps / FlexASIO

A flexible universal ASIO driver that uses the PortAudio sound I/O library. Supports WASAPI (shared and exclusive), KS, DirectSound and MME.


Real-world low-latency testing comparison - strange issue encountered with WASAPI Exclusive #153

Open danryu opened 2 years ago

danryu commented 2 years ago

This is an attempt to get reliable low-latency performance from FlexASIO, using generic laptop hardware, which will probably be the use case for most people needing to use a universal ASIO driver.

Hopefully this will be useful as a general comparison for different configurations. In particular, there is a strange issue with WASAPI Exclusive mode which I would like to address/resolve! Please see the test results below.

The tests cover:

FlexASIO WASAPI Shared
FlexASIO WASAPI Exclusive
FlexASIO WDM-KS
ASIO4ALL
Focusrite ASIO (in this case using Focusrite Scarlett 2i4 interface)

across a few representative configurations. I also attach a zip file with the audio from each of the tests, to illustrate my descriptions. asiotest.zip

Test setup:

Using generic Realtek High Definition Audio device (in Lenovo Thinkpad L15)
Using inbuilt mic, and headphones for output plugged into 3.5mm output jack
Use Ableton Live for recording audio, metronome click at 120bpm
Test by holding headphones with metronome output to inbuilt mic during test
After each test, force-reset ASIO settings by unsetting and setting configuration in Ableton
Use Ableton to show rough real-world latency measurements in the sample editor window

TL;DR:

WASAPI Shared makes for high-quality recordings, but latency not quite low enough
WASAPI Exclusive has some issues - negative latency ?? And mild glitches
- QUESTION: Is there a fix for these issues?
WDM-KS - practically impossible to get usable end results

1) FlexASIO WASAPI Shared

backend = "Windows WASAPI"
bufferSizeSamples = 32

[input]
device = "Microphone Array (Realtek(R) Audio)"
suggestedLatencySeconds = 0.0

[output]
device = "Realtek HD Audio 2nd output (Realtek(R) Audio)"
suggestedLatencySeconds = 0.0

End result:

15ms latency
VERY good audio, clear and full sound
NO pops or scratches
almost usable for low latency recording ….

2) FlexASIO WASAPI Exclusive - 512 Buffer, suggested 0.0s

backend = "Windows WASAPI"
bufferSizeSamples = 512

[input]
device = "Microphone Array (Realtek(R) Audio)"
suggestedLatencySeconds = 0.0
wasapiExclusiveMode = true
wasapiAutoConvert = false

[output]
device = "Realtek HD Audio 2nd output (Realtek(R) Audio)"
suggestedLatencySeconds = 0.0
wasapiExclusiveMode = true
wasapiAutoConvert = false

End result:

SOMEHOW has negative latency ??? ie sample occurs roughly 34ms BEFORE the metronome click - NO IDEA how this is happening!
VERY quiet
noticeable scratchy pops and glitches in sound - not usable for recording or playback

3) FlexASIO WASAPI Exclusive - 1024 buffer, Suggested 0.01

backend = "Windows WASAPI"
bufferSizeSamples = 1024

[input]
device = "Microphone Array (Realtek(R) Audio)"
suggestedLatencySeconds = 0.01
wasapiExclusiveMode = true
wasapiAutoConvert = false

[output]
device = "Realtek HD Audio 2nd output (Realtek(R) Audio)"
suggestedLatencySeconds = 0.01
wasapiExclusiveMode = true
wasapiAutoConvert = false

End result:

essentially the same performance as Test 2, despite different settings

4) FlexASIO WDM-KS 512

backend = "Windows WDM-KS"
bufferSizeSamples = 512

[input]
device = "Microphone Array (Realtek HD Audio Mic input)"

[output]
device = "Headphones (Realtek HD Audio 2nd output)"

Note: FlexASIO will NOT initialize with suggestedLatencySeconds set

End result:

Latency is very high - approx 60ms
quiet
slightly glitchy /small pops, not usable for recording

5) FlexASIO WDM-KS 128 buffer

backend = "Windows WDM-KS"
bufferSizeSamples = 128

[input]
device = "Microphone Array (Realtek HD Audio Mic input)"

[output]
device = "Headphones (Realtek HD Audio 2nd output)"

End result:

Latency appears around 16ms
Audio is quiet / low-level and glitchy - unusable for recording

6) FlexASIO WDM-KS - buffer 64

backend = "Windows WDM-KS"
bufferSizeSamples = 64

[input]
device = "Microphone Array (Realtek HD Audio Mic input)"

[output]
device = "Headphones (Realtek HD Audio 2nd output)"

End result:

Latency appears around 10ms
quiet / low-level
very scratchy / with pops - not usable for recording

7) ASIO4ALL - 512

Settings:

Current version 2.15
Default settings
Buffer size 512

End result:

Latency appears around 1ms! ????
very quiet audio
no noticeable pops or scratches
usable for recording

8) Focusrite ASIO - 32

Interface: Scarlett 2i4 Mic: SM57

End result:

latency is around 1ms ???
clear, no pops or glitches
fine for recording
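For reference when reading the buffer sizes in the tests above, here is a minimal sketch of how long a single buffer lasts. The sample rates are assumptions (the tests do not state the rate in use), and one buffer is only one contribution to end-to-end latency, so these figures are not expected to match the measured values.

```python
def buffer_duration_ms(buffer_size_samples: int, sample_rate_hz: int) -> float:
    """Duration of one buffer in milliseconds."""
    return 1000.0 * buffer_size_samples / sample_rate_hz


# Buffer sizes used in the tests above; the sample rates are assumptions.
for rate in (44100, 48000):
    for size in (32, 64, 128, 512, 1024):
        duration = buffer_duration_ms(size, rate)
        print(f"{size:>5} samples @ {rate} Hz = {duration:6.2f} ms")
```
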

dechamps commented 2 years ago

SOMEHOW has negative latency ??? ie sample occurs roughly 34ms BEFORE the metronome click - NO IDEA how this is happening!

I don't know anything about Ableton, but I can hazard a guess. ASIO drivers can report their estimated latencies to the host application (either on request, or as a timestamp estimate on every buffer being passed back and forth). What I suspect is happening is that Ableton is using this data to automatically compensate for the latency announced by the driver. That is, it might be automatically shifting the displayed waveform to the time at which it thinks the audio signal was actually recorded. (A similar process might take place on the playback side, shifting the time at which Ableton will beat the metronome.)

If that is the case, then that would completely explain how you might observe seemingly negative latencies: the FlexASIO WASAPI backend might have an issue where it might be overestimating the latency in this case, leading to Ableton shifting the waveform too far to the left to compensate, and you end up with an audio event that seemingly occurred before the trigger. Causality was not violated - it's just the display that's misleading.

Still assuming that my hypothesis is correct, this would mean that in your experiments you were not measuring actual latency - you were instead measuring how well the driver-advertised latency matches the actual latency. Which, granted, is still somewhat useful, but presumably that's not what you were looking for...

This would also explain why your measurements appear perfect for ASIO4ALL and Focusrite: it could very well be that these drivers estimate their own latencies correctly, so it looks like they have near-zero latency in Ableton, but in reality that just means Ableton correctly realigned their waveforms to compensate for the actual latency.
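The hypothesis above can be sketched as simple arithmetic. This is only a model of the suspected behaviour, not Ableton's actual implementation, and every number in it is hypothetical: if the host shifts the recording left by whatever latency the driver reports, the offset it displays is the actual latency minus the reported one.

```python
def displayed_offset_ms(actual_ms: float, reported_ms: float) -> float:
    """Offset shown by a host that shifts the recording by the reported latency."""
    return actual_ms - reported_ms


# Hypothetical numbers, chosen only to illustrate the three cases.
print(displayed_offset_ms(20.0, 20.0))  # accurate report: 0.0
print(displayed_offset_ms(20.0, 54.0))  # overestimate: -34.0 ("negative latency")
print(displayed_offset_ms(20.0, 5.0))   # underestimate: 15.0
```

Under this model a displayed offset near zero says the driver's estimate is accurate, not that the real latency is small.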

Now, I'm not surprised that FlexASIO latency estimates might be off. The PortAudio code that computes them could probably use some love and it's probably wrong in many cases. It's a bit tricky to get these numbers right though, because in order to fix them we'd first need to measure what the actual latency looks like in a variety of scenarios, and that is made harder by the fact that we need to estimate the output latency independently from the input latency. A loopback experiment can be used to measure the total latency, but says nothing about the breakdown between output and input latency. It's a solvable problem of course, but it's not trivial.
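As a rough illustration of the loopback idea, the sketch below estimates the delay between a reference click and a recording of it by brute-force cross-correlation. The signals are synthetic stand-ins, and `estimate_lag` is a hypothetical helper rather than part of FlexASIO or PortAudio. As noted above, a result like this only gives the total round-trip delay, with no breakdown between output and input.

```python
def estimate_lag(reference, recorded, max_lag):
    """Return the lag (in samples) at which `recorded` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlation of the reference against the recording shifted by `lag`.
        score = sum(
            r * recorded[i + lag]
            for i, r in enumerate(reference)
            if i + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic stand-in for a loopback capture: a short click delayed by 7 samples.
click = [0.0, 1.0, -0.5, 0.25]
recorded = [0.0] * 7 + click + [0.0] * 20
lag = estimate_lag(click, recorded, max_lag=20)

SAMPLE_RATE_HZ = 48000  # assumption; use the rate of the actual stream
print(lag, "samples =", 1000.0 * lag / SAMPLE_RATE_HZ, "ms")
```
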

As for the glitches/pops/scratches, well, sadly that's not too surprising when trying to get latencies as low as possible, because at some point you hit the scheduling deadline limits of the backend. Now, I suspect that there as well the PortAudio backend code could likely be improved to allow for smaller glitch-free latencies, but again this requires someone to sit down and investigate this stuff in depth (which would also likely involve writing a custom test framework for experiments and measurements).

I would love it if someone sufficiently skilled and motivated could tackle these issues, do the relevant in-depth experiments/measurements/testing, investigate the relevant PortAudio backend code, and make the necessary fixes. Sadly I lack the time and motivation to do this kind of work: even though I wrote the whole thing, I actually only use FlexASIO in a very basic capacity that does not require low latencies nor reliable latency estimates.

danryu commented 2 years ago

Thanks for the swift response.

leading to Ableton shifting the waveform too far to the left to compensate, and you end up with an audio event that seemingly occurred before the trigger. Causality was not violated - it's just the display that's misleading.

This is only happening with WASAPI Exclusive mode - interesting that this over-compensation only happens in this mode.

Still assuming that my hypothesis is correct, this would mean that in your experiments you were not measuring actual latency - you were instead measuring how well the driver-advertised latency matches the actual latency. Which, granted, is still somewhat useful, but presumably that's not what you were looking for...

I'm not really that interested in the driver-advertised latency - it's misleading but not a show-stopper if the advertised figures are off. I'm much more interested in getting a hi-fidelity, low-latency real-world result - hence why I included the generated WAVs for comparison. While the playback click track is not included (but I can reproduce it), in the case of WASAPI Exclusive the recorded click audibly precedes the playback click. Note: Ableton does have a Driver Error Compensation feature, but I did not adjust this in any test, keeping it at the default zero (see the screenshots). What I'm looking for therefore is really getting as close as possible to 0ms latency without glitches, for full-duplex playback and recording operation. If the resultant real-world (audible) latency is over about 20ms then it's hard to justify its use in a low-latency context.

As for the glitches/pops/scratches, well, sadly that's not too surprising when trying to get latencies as low as possible, because at some point you hit the scheduling deadline limits of the backend.

Ok so you'd expect this to be the case for WASAPI Exclusive at any buffer size? Because I can't remove glitches even at a buffer size of 2048. I'm just curious if WASAPI Exclusive ever works non-glitchily for any device, ever, in any configuration - because this has eluded me on the multiple laptops I've tried it on.

Sadly I lack the time and motivation to do this kind of work: even though I wrote the whole thing, I actually only use FlexASIO in a very basic capacity that does not require low latencies nor reliable latency estimates.

This is indeed a shame, as most people coming to FlexASIO will likely be looking for a low-latency solution - after all ASIO is a purpose-built low-latency framework. (Out of curiosity - what is the use case for an ASIO driver that doesn't prioritize low latency?)

So is this basically a "Won't Fix" scenario? (Unless somebody magically does all the work required in PortAudio.)

Notwithstanding all your support to this point, is there a practical suggestion you could make to provide as close as possible to the requirement of hi-fidelity and low-latency "out-of-the-box"? (ie perhaps some magical WASAPI Exclusive mode tweak that for some reason you haven't yet revealed :) At the moment my best result is with WASAPI Shared with a minimum buffer size - it actually sounds great and the fact that it is non-locking will be a big benefit potentially for many users.

dechamps commented 2 years ago

This is only happening with WASAPI Exclusive mode - interesting that this over-compensation only happens in this mode.

The code that does the latency calculation in PortAudio is backend-specific and I also wouldn't be surprised if the calculation changes between shared and exclusive mode. There are many factors involved.

in the case of WASAPI Exclusive the recorded click audibly precedes the playback click.

I still suspect that, without a properly controlled experiment with known absolute time references (which would likely involve a specialized test framework, not Ableton), that doesn't really mean much. For example Ableton could also be shifting the wav files (again I don't know Ableton so not sure if that makes sense).

Even if the wav files contain the raw I/O from the ASIO driver, their relative timings might still not be representative, because of priming: for example, if the first recorded buffer is thrown away, then the entire recorded wav file is shifted to the left and it's impossible to draw any conclusions.
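A small synthetic example of the priming effect described above. The buffer size matches Test 2, but the recording itself is made up, and whether any buffer is actually discarded in the real setup is not established here.

```python
def click_position(samples, threshold=0.5):
    """Index of the first sample whose magnitude exceeds the threshold."""
    return next(i for i, s in enumerate(samples) if abs(s) > threshold)


BUFFER_SIZE = 512  # samples, as in Test 2; everything else here is synthetic

# A recording with a click 600 samples after the stream started (4 buffers total).
recording = [0.0] * 600 + [1.0] + [0.0] * 1447
# What the file would contain if the first recorded buffer were thrown away.
primed = recording[BUFFER_SIZE:]

print(click_position(recording))  # 600
print(click_position(primed))     # 88
```

The click appears one buffer earlier in the second case even though nothing about the actual audio path changed.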

I mean, what is the most likely explanation: that FlexASIO WASAPI Exclusive violates the laws of causality that permeate the entire universe, or that there is something wrong with your experimental protocol?

These things are subtle and full of traps. It's surprisingly hard to make actually valid latency measurements. Common intuition is often wrong, especially when using an ASIO Host Application that is not specialized test software designed to precisely answer the questions you are asking.

Ok so you'd expect this to be the case for WASAPI Exclusive at any buffer size? Because I can't remove glitches even at a buffer size of 2048. I'm just curious if WASAPI Exclusive ever works non-glitchily for any device, ever, in any configuration - because this has eluded me on the multiple laptops I've tried it on.

I can relate to your experience - I often observe that WASAPI Exclusive tends to be quite finicky as well. This is most likely because of problems in the PortAudio backend code. I do know that there are plenty of people who manage to make it work properly, but I don't know how they do it - presumably they are using different hardware or tweaking the various configuration knobs until it works.

Again, it would be really nice if this stuff could be fixed, but I doubt I'll be the one to do it.

Out of curiosity - what is the use case for an ASIO driver that doesn't prioritize low latency?

Room EQ Wizard only supports ASIO for multichannel, high bit-depth/sample rate, and/or bit-perfect operation. This is my personal use case and the reason I originally wrote FlexASIO. REW users don't care one bit about latency - they just need to get their audio through in the proper format.

More generally, universal ASIO drivers are useful when an application only supports ASIO and nothing else. Which is uncommon, but it does happen (especially in DAWs it seems). Some users might be forced to use ASIO even if they don't care about latency.

Another use case for FlexASIO that doesn't care about latency is when you want to use a bit-perfect backend such as WASAPI Exclusive or WDM-KS with an application that doesn't provide that natively (e.g. the application might only have direct support for DirectSound).

So is this basically a "Won't Fix" scenario? (Unless somebody magically does all the work required in PortAudio.)

Sadly yes. I mean, both FlexASIO and PortAudio are fully open source and accept outside contributions. I would be delighted if someone contributed patches to FlexASIO or PortAudio to fix these issues. However, historically, I've never seen anyone willing to contribute. To be fair, this kind of work requires a very specialized skillset and a non-trivial time investment; maybe that explains the lack of patches. (Also, speaking from experience, the PortAudio Windows code is not exactly easy to work with...)

Notwithstanding all your support to this point, is there a practical suggestion you could make to provide as close as possible to the requirement of hi-fidelity and low-latency "out-of-the-box"? (ie perhaps some magical WASAPI Exclusive mode tweak that for some reason you haven't yet revealed :)

All I know is in the FAQ already. Ironically, even though I wrote FlexASIO, I'm probably not the most knowledgeable person when it comes to optimizing FlexASIO configs for low latency. I suspect there are people out there who know more than I do simply because they have actual real-world experience optimizing FlexASIO latency for various setups.

danryu commented 2 years ago

I mean, what is the most likely explanation: that FlexASIO WASAPI Exclusive violates the laws of causality that permeate the entire universe, or that there is something wrong with your experimental protocol?

:) I was never making any arguments re causality - if that was my conclusion I would be contacting the scientific journals :)

These things are subtle and full of traps. It's surprisingly hard to make actually valid latency measurements. Common intuition is often wrong, especially when using an ASIO Host Application that is not specialized test software designed to precisely answer the questions you are asking.

Yes this is fully appreciated and mostly beyond my ken - which is why I resorted to the kind of tests that a user would typically experience.

What I'm really interested in is the "effective" latency, or what a typical DAW user ends up with in their recording and playback operations. So this implies full-duplex real-time operation, a typical DAW-user use case for monitoring and recording something simultaneously.

Re the negative latency issue, my crude analysis was roughly:

sound went in
complicated stuff happened, involving mishaps
Ableton ends up with a "pre-posted" sample - obviously through the kind of subtle quirks that you described

That ~34ms "pre-posting" discrepancy is obviously audible when listening back with both recorded and playback clicks (using different click sounds to differentiate better). (Note: I did try adjusting Ableton's "Delay Compensation" settings, without effect.)

This anomaly is frustrating as WASAPI Exclusive is sooo close to providing minimal effective latency - albeit currently with glitches (more on playback than recording I think). I mean, if you somehow end up writing the sample before the due time, it should in theory be possible to get it bang-on :) As it is, as the sample ends up out of sync, it actually sounds more jarring than the ~15ms "post" effective latency that WASAPI Shared mode produces.

I do know that there are plenty of people who manage to make it work properly, but I don't know how they do it - presumably they are using different hardware or tweaking the various configuration knobs until it works.

Anecdotally, I did try a lot of tweaking, including setting all values in:

wasapiExplicitSampleFormat = false|true
sampleType = "Int16"|"Int24"|"Int32"|"Float32"

The only thing I noticed here was that FlexASIO would only initialize with sampleType set to "Int16" if wasapiExplicitSampleFormat was set to true. Otherwise it made no difference to the latency issue or the number of glitches.

Likewise suggestedLatencySeconds values (in steps from 0.001 to 0.1) didn't make any noticeable difference.

Once or twice I would have a "freak" pass where no glitches would occur - but as soon as I tested again, glitches re-appeared.

With exclusive mode set, a buffer size of >= 512 samples reduces glitches somewhat, but higher values don't bring noticeable improvement in the glitch count.

Other than that, I couldn't really see which other levers to pull. In Exclusive mode I always end up with the latency anomaly (in Ableton at least), and an unacceptable level of glitches (ie more than a couple).

REW users don't care one bit about latency - they just need to get their audio through in the proper format.

Ok that explains that... I think that the most common real-world use case for a universal ASIO driver is probably in using software like DAWs or other similar recording or performing music software where real-time and thus low-latency is a concern, but there isn't a bespoke ASIO driver available. This applies to pretty much every DAW-on-Windows user who wants to take their laptop away from their desk and still be able to play back and edit their audio project. And a huge number of interesting audio devices that have basically relied on ASIO4ALL (the shame!). This is why ASIO4ALL is linked to even on the Ableton website, and has been downloaded a gazillion times. And to be fair it does seem to have WDM-KS working in a glitch-free, low-latency way. (Not sure why it's so important to the developer to keep the source closed, but there we are.)

The good news is that with FlexASIO working in WASAPI Shared mode, users can get low-ish latency audio with full-sounding glitch-free sound (around ~15ms experimental "effective" in a typical Ableton use case) - while NOT blocking any other audio software from operation. This is the kind of MacOS-type user experience that I am looking for, so if things improve from this point (with say upstream Exclusive mode fixes) then that's a bonus :) (Plus Exclusive mode may already work fine for some hardware that users may be using - although I think the Realtek HD Audio device I tested with is fairly ubiquitous.)

dechamps commented 2 years ago

Anecdotally, I did try a lot of tweaking, including setting all values in:
wasapiExplicitSampleFormat = false|true
sampleType = "Int16"|"Int24"|"Int32"|"Float32"
The only thing I noticed here was that FlexASIO would only initialize with sampleType set to "Int16" if wasapiExplicitSampleFormat was set to true. Otherwise it made no difference to the latency issue or the number of glitches.

I'm not surprised. Sample type is highly unlikely to make any difference wrt latency. The only options I would expect to make a difference (assuming a fixed backend and mode) are suggestedLatencySeconds and buffer size, in combination.
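If someone wants to explore that combination systematically, a sketch along these lines could generate one config per pairing of buffer size and suggested latency. It only uses option names that already appear in this thread. The `sweep_configs` helper and the particular values swept are illustrative assumptions, and each generated config would still need to be installed and tested by hand.

```python
from itertools import product

# WASAPI Exclusive template mirroring the Test 2 config from this thread.
TEMPLATE = '''backend = "Windows WASAPI"
bufferSizeSamples = {buffer_size}

[input]
device = "{input_device}"
suggestedLatencySeconds = {latency}
wasapiExclusiveMode = true
wasapiAutoConvert = false

[output]
device = "{output_device}"
suggestedLatencySeconds = {latency}
wasapiExclusiveMode = true
wasapiAutoConvert = false
'''


def sweep_configs(buffer_sizes, latencies, input_device, output_device):
    """Yield (buffer_size, latency, config_text) for every combination."""
    for buffer_size, latency in product(buffer_sizes, latencies):
        text = TEMPLATE.format(
            buffer_size=buffer_size,
            latency=latency,
            input_device=input_device,
            output_device=output_device,
        )
        yield buffer_size, latency, text


configs = list(
    sweep_configs(
        buffer_sizes=(128, 256, 512, 1024),  # illustrative values
        latencies=(0.0, 0.001, 0.01, 0.1),   # illustrative values
        input_device="Microphone Array (Realtek(R) Audio)",
        output_device="Realtek HD Audio 2nd output (Realtek(R) Audio)",
    )
)
print(len(configs), "configs generated")
print(configs[0][2])
```
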

And a huge number of interesting audio devices that have basically relied on ASIO4ALL (the shame!). This is why ASIO4ALL is linked to even on the Ableton website, and has been downloaded a gazillion times.

And there is nothing wrong with that. WDM-KS basically means interacting directly with the Windows audio kernel driver. If the Windows audio driver for the device happens to be sufficiently well-designed and optimized to already provide the best possible latency, then the WDM-KS path (and therefore, ASIO4ALL) is already optimal and there is no point in spending resources to develop a bespoke ASIO driver. One could easily argue that it makes more sense for the manufacturer to optimize the Windows driver so that it benefits both ASIO through ASIO4ALL/FlexASIO and standard Windows apps at the same time. Two birds with one stone.

danryu commented 2 years ago

I'm not surprised. Sample type is highly unlikely to make any difference wrt latency. The only options I would expect to make a difference (assuming a fixed backend and mode) are suggestedLatencySeconds and buffer size, in combination.

Yes, the sampleType stuff was more a speculative tweak aimed at reducing the glitches, which is actually the more immediate issue of the two.

One could easily argue that it makes more sense for the manufacturer to optimize the Windows driver so that it benefits both ASIO through ASIO4ALL/FlexASIO and standard Windows apps at the same time. Two birds with one stone.

I think the bulk of the devices that I was loosely referring to are probably USB audio widgets of various kinds that do not come with any drivers, but default to using the OS-provided generic USB audio driver.

Coming from an open-source/Linux background, I really hate writing and distributing open-source software that has closed-source dependencies, apart from where absolutely necessary. It also partially kills the UX to have additional download and install steps tagged on to your own application deployment (I release for 5 x OS's and I only have this problem on Windows - ok to be fair, on Linux I have to tell users to install Jack :)

So personally I would love to see FlexASIO become the go-to choice for the kind of use cases I've described, rather than having to rely on a single, poorly-maintained, closed-source driver. Also, FlexASIO is a superior option in terms of its support for the more modern, performant and featureful Windows audio API. (Plus FlexASIO is superbly documented.) It feels like it's very close to this being a technical possibility - given the right resources to look at the remaining weird behaviours in PortAudio. Note: Koord may be interested in sponsoring an effort in that direction at some point in the near future, if there were interested parties. We are releasing an updated version of KoordASIO imminently, incorporating the freshly-merged upstream changes and a restyled config GUI, and I was wondering about possibly creating an automated driver diagnoser-debugger for FlexASIO/PortAudio issues, which would help us triage the outstanding problems in the WASAPI backend in particular. That's off-topic though - I'll pick that up with you in another thread ;)

Thanks for the invaluable insights as usual.

uzrnme commented 1 year ago

Found a solution, maybe? I can get low latency only from exclusive modes, but I want the same with shared ones so that I can use other audio sources at the same time.

danryu commented 1 year ago

@uzrnme
Check the documentation here: https://github.com/dechamps/FlexASIO/blob/master/BACKENDS.md#wasapi-backend

By the nature of Shared mode, you will not get nearly as low latency as you do with Exclusive. This is a consequence of going through the normal Windows audio processing pipeline, and is expected.

This issue mainly documents peculiar behavior with one example generic on-board audio chipset (Realtek High Definition Audio), and is not an inherent problem with FlexASIO (but rather with the PortAudio WASAPI implementation).