Add support for low latency WASAPI shared outstreams

GoaLitiuM commented 6 years ago

Windows 10 added a new interface, IAudioClient3, that enables access to very small buffer sizes (under 10ms) when the audio driver supports them. With the hardware available to me, I could initialize a shared stream with a period as low as 3ms.

I also bundled some commits in this PR that let you specify a custom buffer size through outstream->software_latency, within the device's minimum and maximum values (addresses #109), and that fix audio glitches with small buffer sizes (#90).

These changes should be compatible with older Windows systems, but I have only tested them on my own system.
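As a rough illustration of "within minimum and maximum device values", the clamping could look like the sketch below. The function and parameter names are hypothetical and are not taken from the PR; latencies are assumed to be in seconds.

```c
/* Hypothetical helper, not code from this PR: clamp a requested latency
 * (in seconds) into the range the device reports as supported. */
double clamp_latency(double requested, double device_min, double device_max) {
    if (requested < device_min) return device_min; /* too small: use device minimum */
    if (requested > device_max) return device_max; /* too large: use device maximum */
    return requested;                              /* already within range */
}
```

With an assumed device range of 3ms to 200ms, a 1ms request would be raised to 3ms and a 500ms request lowered to 200ms.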

andrewrk commented 5 years ago

These changes look important. I've been putting off merging pull requests recently because I'm not in a place where I can test very well. How much testing have these changes undergone?

GoaLitiuM commented 5 years ago

I have only tested these on my Windows 10 machines, and I have not found any problems so far. These changes are only relevant if the outstream latency has been set to the minimum latency reported by the device. With IAudioClient3, the reported minimum latency on my main PC goes down to 5.333ms, from 20ms with the old API.
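For reference, a period expressed in frames converts to milliseconds as sketched below. The 5.333ms figure is consistent with a 256-frame period at 48000 Hz, but both of those numbers are assumptions here; the thread does not state the sample rate or the frame count.

```c
/* Hypothetical helper: duration in milliseconds of a period of `frames`
 * audio frames at `sample_rate` frames per second. */
double period_ms(unsigned frames, unsigned sample_rate) {
    return 1000.0 * frames / sample_rate;
}
```

Under the same 48000 Hz assumption, the old 20ms minimum would correspond to 960 frames.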

RamiHg commented 5 years ago

This is a pretty cool change. However, there is one glaring problem that I've run into, that even your change does not solve.

The WASAPI backend uses a timer to put the out-stream run thread to sleep until the next time data is needed. The problem is that the timer resolution in Windows is terrible. It defaults to 15ms. A great introduction to the whole system is here.

While you can change it to a lower value (system-wide, unfortunately; see the article above), the lowest you can go is 0.5-1ms. That is still a huge margin of error compared to the durations that we actually need in low-latency applications (in the microsecond range). That means every time the out-stream run thread wants to sleep, the duration it actually sleeps for can be off by as much as 15ms with the default resolution, and by about 1ms in the best case.

I was able to kind of get around this by always rounding the sleep time down to the nearest 1-2ms value, and by setting the OS timer resolution to 1ms. But it's not perfect. Contrast this with macOS, where the resolution is in nanoseconds...

I think there needs to be a fundamental redesign of the WASAPI run thread for this to be truly low-latency.
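The rounding workaround described above might look roughly like this sketch. The function name and the choice of truncating to whole milliseconds first are assumptions, not the commenter's actual code. On Windows the timer resolution itself is typically raised with timeBeginPeriod(1), which is not shown here.

```c
/* Hypothetical helper: round a desired sleep (in milliseconds) down to a
 * multiple of the timer granularity, so the thread wakes early rather
 * than late. Returns 0 when no whole granule fits. */
unsigned sleep_ms_rounded_down(double wanted_ms, unsigned granularity_ms) {
    if (wanted_ms <= 0.0 || granularity_ms == 0) return 0;
    unsigned whole = (unsigned)wanted_ms;   /* drop the fractional millisecond */
    return whole - whole % granularity_ms;  /* round down to the granularity */
}
```

Waking early means the thread has to check how much buffer space is actually free and possibly sleep again, which is part of why this is only a partial fix.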

GoaLitiuM commented 5 years ago

I think the OS already lowers the timer interval automatically after the stream is initialized, or at least that's what I observed when I implemented this in another project. The timer was set to 2ms beforehand, but it was automatically lowered to 1ms after the audio stream was set up (and I made sure nothing in the code made any calls to change the timer resolution).

pervognsen commented 2 years ago

> This is a pretty cool change. However, there is one glaring problem that I've run into, that even your change does not solve.
>
> The WASAPI backend uses a timer to put the out-stream run thread to sleep until the next time data is needed. The problem is that the timer resolution in Windows is terrible. It defaults to 15ms. A great introduction to the whole system is here.
>
> While you can change it to a lower value (system-wide, unfortunately. See article above), the lowest you can go is 0.5-1ms. That is still a huge margin of error compared to the durations that we actually need in low-latency applications (in the microsecond range). That means every time the out-stream run thread wants to sleep, the duration it actually sleeps for will be off by 15ms most of the time, and 1ms in the best case.
>
> I was able to kind-of get around this by always rounding the sleep time down to the nearest 1-2ms value, and by setting the OS timer resolution to 1ms. But it's not perfect. Contrast this to MacOS, where the resolution is in nanoseconds...
>
> I think there needs to be a fundamental redesign of the WASAPI run thread for this to be truly low-latency.

The recommended way to wait in the audio thread is to use IAudioClient::SetEventHandle to register a Win32 event (the stream has to be initialized with AUDCLNT_STREAMFLAGS_EVENTCALLBACK for this). The event fires as a direct result of an interrupt triggered by the audio hardware when the playback cursor has consumed a portion of the buffer and wants to notify the operating system so it can overwrite that portion with new samples.

In your application's audio thread loop, you wait on that event with WaitForMultipleObjects (with bWaitAll = FALSE), where the first wait handle in the set is the event you registered with IAudioClient and the second is an event you use to signal audio thread shutdown. Alternatively, you could use WaitForSingleObject on just the buffer event (with a reasonable timeout). The downside compared to WaitForMultipleObjects is that the audio thread might take a little longer to respond to shutdown requests.
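A minimal sketch of that loop's control flow is below. The wait is hidden behind a hypothetical callback so the loop compiles on any platform, and the `scripted_*` functions are stand-ins that only simulate events. Everything except the Win32 calls inside the `_WIN32` block is an invented name, and the 2000ms timeout is an arbitrary choice. Real code would also call IAudioRenderClient::GetBuffer and ReleaseBuffer inside the fill step.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* What one wait in the audio thread reported. */
enum wake_reason { WAKE_BUFFER_READY, WAKE_SHUTDOWN, WAKE_ERROR };

/* Hypothetical callback types, used so the loop is platform-neutral. */
typedef enum wake_reason (*wait_fn)(void *ctx);
typedef void (*fill_fn)(void *ctx);

#ifdef _WIN32
/* handles[0]: event registered with IAudioClient::SetEventHandle.
 * handles[1]: event the application sets to request shutdown. */
struct win_wait_ctx { HANDLE handles[2]; };

enum wake_reason win_wait(void *ctx) {
    struct win_wait_ctx *w = ctx;
    /* bWaitAll = FALSE: return as soon as either event is signaled. */
    DWORD r = WaitForMultipleObjects(2, w->handles, FALSE, 2000);
    if (r == WAIT_OBJECT_0) return WAKE_BUFFER_READY;
    if (r == WAIT_OBJECT_0 + 1) return WAKE_SHUTDOWN;
    return WAKE_ERROR; /* timeout or failure */
}
#endif

/* Runs until shutdown is signaled. Returns the number of periods filled,
 * or -1 if the wait reported an error or timeout. */
int audio_thread_loop(wait_fn wait, fill_fn fill, void *ctx) {
    int periods = 0;
    for (;;) {
        enum wake_reason r = wait(ctx);
        if (r == WAKE_SHUTDOWN) return periods;
        if (r != WAKE_BUFFER_READY) return -1;
        fill(ctx); /* write the next period of samples */
        periods++;
    }
}

/* Stand-ins for demonstration: report "buffer ready" a fixed number of
 * times, then report shutdown. */
struct scripted_ctx { int periods_before_shutdown; int filled; };

enum wake_reason scripted_wait(void *ctx) {
    struct scripted_ctx *s = ctx;
    return s->filled < s->periods_before_shutdown ? WAKE_BUFFER_READY
                                                  : WAKE_SHUTDOWN;
}

void scripted_fill(void *ctx) {
    ((struct scripted_ctx *)ctx)->filled++;
}
```

The point of the structure is that the thread never computes a sleep duration: it is woken by the buffer event itself, so the system timer resolution no longer determines when samples get written.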

andrewrk / libsoundio

Add support for low latency WASAPI shared outstreams #174