Add a code path for better shared mode wasapi on windows

shangjiaxuan commented 4 years ago

Newer versions of Windows expose an event-callback-based shared mode with much lower latency (down to under 10 ms). The minimum and maximum frame counts and the OS mixing engine's frame size can be queried through the new interface.

The test added to the device refresh code checks whether the new interface exists and sets an availability flag on the device (I normally use C++ and don't know whether pure C has a bool type; if not, the bool may need to change to int). stream_do_open then opens the new interface if it was detected earlier and sets the smallest buffer size. The shared-mode run code also gains a path for this event-based shared mode.

Since the queried sizes are in frames, the path was changed to use frame counts instead of reference time for better precision.
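To illustrate the precision point, here is a minimal sketch of the two conversions. It assumes "reference time" means WASAPI's REFERENCE_TIME, which counts 100-nanosecond units. The helper names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* REFERENCE_TIME counts 100-nanosecond units: 10,000,000 per second. */
#define HNS_PER_SECOND 10000000LL

/* Hypothetical helper: frame count -> 100 ns units, truncating any remainder. */
static int64_t frames_to_hns(uint32_t frames, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return ((int64_t)frames * HNS_PER_SECOND) / sample_rate;
}

/* Hypothetical helper: 100 ns units -> frame count, truncating any remainder. */
static uint32_t hns_to_frames(int64_t hns, uint32_t sample_rate)
{
    assert(hns >= 0);
    return (uint32_t)((hns * (int64_t)sample_rate) / HNS_PER_SECOND);
}
```

At 48000 Hz a 480-frame period converts to exactly 100000 units and back to 480 frames. At 44100 Hz the same 480 frames truncate to 108843 units, which convert back to only 479 frames. Keeping the value in frames avoids that loss.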

Tested on my device: before this change, the program kept beeping for a few seconds after a breakpoint was hit, which corresponds to the default 4-second buffer size (a very large latency for anything; even multimedia playback should not lag this much). With the new path, the time shortens to something acceptable (I cannot estimate it by ear).

Of course, setting the buffer size to something like 20 ms in the original code has a similar effect, but that often ends up with a lot of overhead. (Windows typically seems to update sound on a 10 ms period; incompatible buffer sizes cause overhead and dropouts.)

shangjiaxuan commented 4 years ago

It seems that on my device Windows updates only on a 10 ms basis with 48000 Hz floating point (480 frames; the minimum, maximum, and increment are all the same), and the minimum buffer created has a size of 1056. With the new code path, the audio system on my device apparently always asks for some padding for resampling and does some buffering at the same time.
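The arithmetic behind those numbers can be checked with a small sketch. It assumes the reported buffer size of 1056 is a frame count at 48000 Hz. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: duration of a frame count in whole microseconds. */
static uint64_t frames_to_us(uint32_t frames, uint32_t sample_rate)
{
    assert(sample_rate > 0);
    return ((uint64_t)frames * 1000000ULL) / sample_rate;
}
```

Under that assumption, the 480-frame period is 10 ms and the 1056-frame buffer is 22 ms: two full engine periods plus 96 frames (2 ms) of extra room.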

It may be that the buffer size is the total software latency on Windows on my device, and that in shared mode buffers are only sent to the engine once they are completely filled (which would explain the few seconds of sound after a breakpoint is hit).

EDIT: Running latency.c shows correct timing between the sound and the console output, but after breaking in (program execution paused) there are still a few beeps before silence. The code seems to account only for the time between two writes. The filled audio buffer, however, is always much larger than what two writes cover, which makes the test invalid (it only tests the performance of transferring data between threads).

shangjiaxuan commented 4 years ago

It seems there are also issues and an unmerged PR for this (#174); I will look into it further. #109 also seems related.

wegylexy commented 4 years ago

Something has changed since Windows 10 v2004. Does this patch solve those issues too?

wegylexy commented 4 years ago

You mostly changed outstream only. How about instream?

shangjiaxuan commented 4 years ago

This is mainly meant to better implement buffer size negotiation and callback-based buffer submission. The main code path is no different from the original, except that the buffer can better match the shared-mode engine period, which reduces the buffer size without causing audio dropouts.

I doubt audio input will benefit from this as long as it uses a callback-based implementation, since the available buffer size is determined by the OS at wakeup time.

It is possible to change the OS audio engine period to something much less than the default 10 ms, but I do not have the hardware to test it (mine reports 480 to 480 samples working on native 48 kHz float), so for now only the current period is used.
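For reference, a minimal sketch of how a shorter period could be picked on hardware that allows it. It assumes the four values come from IAudioClient3::GetSharedModeEnginePeriod and relies on the documented rule that the period passed to InitializeSharedAudioStream must be a multiple of the fundamental period and lie between the minimum and maximum. The struct and function names are hypothetical, and the example values other than 480 are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the four frame counts the engine reports. */
struct engine_periods {
    uint32_t default_frames;
    uint32_t fundamental_frames;
    uint32_t min_frames;
    uint32_t max_frames;
};

/* Returns a period that is a multiple of the fundamental period and lies in
 * [min_frames, max_frames], as close to the request as possible, preferring
 * to round up. Returns 0 if no such period exists. */
static uint32_t pick_period(const struct engine_periods *p, uint32_t requested)
{
    assert(p);
    if (p->fundamental_frames == 0 || p->min_frames > p->max_frames)
        return 0;

    uint32_t f = p->fundamental_frames;

    /* Clamp the request into the allowed range first. */
    uint32_t want = requested < p->min_frames ? p->min_frames : requested;
    if (want > p->max_frames)
        want = p->max_frames;

    /* Round up to the next multiple of the fundamental period. */
    uint64_t up = (((uint64_t)want + f - 1) / f) * f;
    if (up <= p->max_frames)
        return (uint32_t)up;

    /* Rounding up overshot the maximum: fall back to the largest multiple. */
    uint32_t down = (p->max_frames / f) * f;
    return down >= p->min_frames ? down : 0;
}
```

On a device like the one described above, where every value is 480, any request resolves to 480. On a hypothetical device reporting a fundamental period of 32 and a minimum of 128, a request for 100 frames resolves to 128 and a request for 130 resolves to 160.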

Also, when using smaller frames, as in raw mode on my device, there is a constant stream of dropouts. It seems the thread scheduler cannot work at that granularity, and the realtime work queue API must be used for scheduling if such precision is needed.

The + was left over from diffing; sorry for the inconvenience. It probably got there when I was testing compilation as C versus C++ code, since the function prototype taking a GUID const reference differs between the two. The realtime work queue support may need a lot more effort and changes to implement.

shangjiaxuan commented 4 years ago

WIP

wegylexy commented 4 years ago

See also https://chromium.googlesource.com/chromium/src/media/+/refs/heads/master/audio/win/core_audio_util_win.cc for things like GetDevicePeriod, and handling of WAVEFORMATEX that is not WAVEFORMATEXTENSIBLE (e.g. PCM, IEEE_FLOAT).

shangjiaxuan commented 4 years ago

GetDevicePeriod only retrieves the device period, but in shared mode this does not map to the buffer submission size. In shared mode, the OS audio engine gathers buffers from all apps and mixes them (possibly in hardware), then submits the result to the output. The shared-mode buffer is typically a few times larger than the output buffer so that different streams can be mixed and resampled to the hardware format. Shared mode cannot run on the buffer size retrieved from GetDevicePeriod without audio glitches. (And the smallest raw-mode period, which may be only 1-2 ms, typically cannot be met by the thread scheduler, since it is close to the thread CPU time slice.)

Thanks again for pointing to a working implementation. I'm adding some non-WAVEFORMATEXTENSIBLE support to the WIP pull (8-bit and 16-bit PCM; other formats may not work and should return invalid).
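A rough sketch of what that classification could look like, not the code in the pull. The enum and function names are hypothetical. The tag values are the ones the Windows SDK defines for WAVE_FORMAT_PCM and WAVE_FORMAT_EXTENSIBLE, and 8-bit WAV PCM is unsigned while 16-bit is signed.

```c
#include <assert.h>
#include <stdint.h>

/* wFormatTag values as defined in the Windows SDK (mmreg.h). */
#define TAG_PCM        0x0001u
#define TAG_EXTENSIBLE 0xFFFEu

/* Hypothetical result type for this sketch. */
enum plain_format {
    PLAIN_INVALID,          /* unsupported: caller should report an error */
    PLAIN_U8,               /* 8-bit PCM, unsigned */
    PLAIN_S16,              /* 16-bit PCM, signed */
    PLAIN_USE_EXTENSIBLE    /* not a plain format: use the extensible path */
};

/* Classifies the wFormatTag / wBitsPerSample pair of a WAVEFORMATEX header. */
static enum plain_format classify_plain_format(uint16_t format_tag,
                                               uint16_t bits_per_sample)
{
    assert(bits_per_sample > 0);
    if (format_tag == TAG_EXTENSIBLE)
        return PLAIN_USE_EXTENSIBLE;
    if (format_tag != TAG_PCM)
        return PLAIN_INVALID;
    switch (bits_per_sample) {
    case 8:  return PLAIN_U8;
    case 16: return PLAIN_S16;
    default: return PLAIN_INVALID;
    }
}
```

Anything other than plain 8-bit or 16-bit PCM is rejected here, matching the "should return invalid" behavior described above. A header tagged as extensible is handed back to the existing WAVEFORMATEXTENSIBLE handling.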

shangjiaxuan commented 4 years ago

The phnsDefaultDevicePeriod parameter specifies the default scheduling period for a shared-mode stream. The phnsMinimumDevicePeriod parameter specifies the minimum scheduling period for an exclusive-mode stream.

It seems this can be used when audio3 is not available; I will put it into the WIP.

shangjiaxuan commented 4 years ago

Fixed the problems listed. They were mainly leftover testing code and initial WAVEFORMATEX support. Sorry that I forgot to check the diffs and made such a mistake.

Also fixed pausing in the original shared mode. It seems that only stopping the stream is not enough: on my PC (SDK 1903), supplying buffers to it afterwards somehow restarts it. This may be because the smaller buffer size from GetPeriod exposed the problem, or because of a driver change (the original 4-second buffer may be much larger than the time from pause to unpause).

wegylexy commented 4 years ago

It works on v2004. Will test on v1909 soon.

0x08088405 commented 3 years ago

> Newer versions of windows exposes a event callback based shared mode,

@shangjiaxuan Sorry for bumping an old issue, but is event mode exclusive to newer versions of IAudioClient? I know MSDN isn't great, but does it mention this anywhere? I'm writing my own implementation based on libsoundio and that caught my eye.

andrewrk / libsoundio

Add a code path for better shared mode wasapi on windows #231