Open nyanpasu64 opened 2 years ago
I'm probably not implementing alsa duplex until cpal can actually properly detect and open my hw devices. pipewire-alsa (and likely pulseaudio-alsa) are terrible apis for low latency output and duplex, because both the app and the audio server buffer audio (there are possible workarounds, which pipewire-alsa doesn't do, and handling duplex correctly is especially tricky and situational, and it's difficult to get a general solution).
Right now cpal doesn't detect hw out, and crashes trying to read from hw in (#630). Does cpal seek to target professional DAW use (jackd or alsa hw devices), or mainstream users without a spare audio interface (pulseaudio/pipewire servers, any protocol they support)? If the latter, I think adding a pulse backend is more important, at least until pipewire becomes mainstream (at which point cpal can use the jack backend or add a pipewire one).
What's the current state of cpal support for duplex streams? I'm working on an advanced audio app using cpal where I need to process audio from input and send it back to the same hardware device for output. I have input and output as separate streams with a ring buffer in between. It kinda works, but sometimes (totally at random) I get "backend specific error: broken pipe" and the input stream starts giving me silence; I have to restart my app to make it work again. Sometimes it works for a few hours, sometimes only a few seconds. Any ideas how to handle this problem?
This is a draft.
Overview and speculation
I'm told that JACK clients are fed input and output buffers synchronously, by jackd (the audio server), and that JACK's application-facing API abstracts away buffer size management from the app, and instead jackd (the server) handles routing and hardware IO/buffering. ALSA clients are not like that. You open independent input and output streams, and you have to align their block sizes, sampling rates, open them at the same time, read and write the same amount from both streams, etc. I hear that Apple's Core Audio exposes a JACK-like synchronous duplex API that communicates with physical hardware. (On Linux, you can have JACK's synchronous buffers, ALSA's direct hardware access, or neither when an ALSA app talks to pulseaudio-alsa or pipewire-alsa.)
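To make the ALSA side of that comparison concrete, here is a minimal sketch of the bookkeeping a client has to do itself when input and output are independent streams: check that both directions agree, then move exactly one period in and one period out per cycle. `StreamConfig`, `check_duplex` and `duplex_cycle` are hypothetical names for illustration (not cpal or ALSA types), frames are assumed to be mono, and plain queues stand in for the hardware buffers.

```rust
use std::collections::VecDeque;

/// Hypothetical per-stream settings; not a cpal or ALSA type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StreamConfig {
    pub sample_rate: u32,
    pub period_frames: usize,
}

/// Both directions must agree before a duplex loop makes sense.
/// Returns the shared period size on success.
pub fn check_duplex(input: StreamConfig, output: StreamConfig) -> Result<usize, String> {
    if input.sample_rate != output.sample_rate {
        return Err(format!(
            "sample rates differ: {} vs {}",
            input.sample_rate, output.sample_rate
        ));
    }
    if input.period_frames != output.period_frames {
        return Err(format!(
            "period sizes differ: {} vs {}",
            input.period_frames, output.period_frames
        ));
    }
    Ok(input.period_frames)
}

/// One duplex cycle: consume exactly `period` captured frames and queue
/// exactly `period` frames for playback. Returns false (and touches nothing)
/// if a full period of input is not available yet.
/// Note: this allocates, which a real-time audio thread would avoid.
pub fn duplex_cycle(
    captured: &mut VecDeque<f32>,
    playback: &mut VecDeque<f32>,
    period: usize,
    mut process: impl FnMut(&[f32], &mut [f32]),
) -> bool {
    if captured.len() < period {
        return false;
    }
    let input: Vec<f32> = captured.drain(..period).collect();
    let mut output = vec![0.0f32; period];
    process(&input, &mut output);
    playback.extend(output);
    true
}
```

A JACK-style API hands the app the equivalent of `process` only; everything else in this sketch is work the server does on the app's behalf.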
On Linux, I get the impression that the only apps designed to be routable in a graph are JACK apps. PipeWire lets you route the inputs and outputs of Pulse/ALSA apps arbitrarily as well (in a patchbay app), but the apps were not written with this in mind. Worse, in ALSA's case, the application-facing API was written around timing being determined by hardware in real time, and the app managing data buffering itself. As a result, I think ALSA duplex can achieve the same round-trip latency on physical hardware (from hardware line in to speaker out) as a JACK client, but I'd be surprised if you can chain 3 ALSA duplex apps in a PipeWire graph and not get 1-2 periods of added latency per app, whereas 3 JACK duplex apps on pipewire-jack (see below) add zero latency compared to 1 app.
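To put rough numbers on that guess, here is a small sketch that converts periods into milliseconds and applies the "1-2 periods per app" estimate at face value. The function names are hypothetical, and the example values (3 apps, 256-frame periods, 48 kHz) are assumptions chosen for illustration, not measurements.

```rust
/// Duration of one period in milliseconds.
pub fn period_ms(period_frames: u32, sample_rate: u32) -> f64 {
    f64::from(period_frames) * 1000.0 / f64::from(sample_rate)
}

/// Extra latency (min, max) in milliseconds from chaining `apps` ALSA duplex
/// clients, assuming each one adds between 1 and 2 periods (the guess from
/// the text above, not a measured figure).
pub fn alsa_chain_added_ms(apps: u32, period_frames: u32, sample_rate: u32) -> (f64, f64) {
    let one = period_ms(period_frames, sample_rate);
    (f64::from(apps) * one, f64::from(apps) * 2.0 * one)
}
```

With the assumed values, `alsa_chain_added_ms(3, 256, 48_000)` comes out at roughly 16 to 32 ms of added latency, where the same chain of JACK clients would (per the claim above) add none.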
jackd never changes buffer/period sizes. pipewire changes buffer/period sizes when you open and close apps. I'm not sure if/how it changes the period size of an ALSA device, but it seems buggy. Canberra notification sounds set the period to 8192 samples (absurdly high latency) after they start playing, there are/were audio glitches when periods get longer (https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/1436 ?), and the round-trip latency of jack_iodelay fed through speaker output, a physical aux cable, and line-in input can change (when input/output period sizes diverge? upon xrun?).
jackd (JACK server) and rtaudio (an audio library for apps, with Linux/Windows/Mac backends) are ooold, and both use threads but predate C++11 atomics. jackd uses volatile variables, rtaudio just uses data races. RtAudio has real-world race conditions as well (which I could trigger if I wanted with a crafted test app), and incomprehensible data ownership/sharing that I'd have to rewrite to fix.
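For contrast, this is roughly what a cross-thread stop flag and frame counter look like with atomics in Rust (the language cpal is written in). It is a minimal sketch, not code from jackd, RtAudio or cpal: `Shared` and `run_until` are hypothetical names, and a busy loop stands in for the real audio thread.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical state shared between a control thread and an audio thread.
pub struct Shared {
    pub running: AtomicBool,
    pub frames_done: AtomicU64,
}

/// Spawns a stand-in "audio thread" that counts periods until told to stop.
/// The control thread waits for at least `target_frames`, clears the flag,
/// joins the thread, and returns the final frame count.
pub fn run_until(target_frames: u64, period_frames: u64) -> u64 {
    assert!(period_frames > 0, "period must be non-zero");
    let shared = Arc::new(Shared {
        running: AtomicBool::new(true),
        frames_done: AtomicU64::new(0),
    });

    let audio = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            // Real audio I/O would happen inside this loop.
            while shared.running.load(Ordering::Acquire) {
                shared.frames_done.fetch_add(period_frames, Ordering::Relaxed);
                thread::yield_now();
            }
        })
    };

    // Control thread: poll the counter without any data race.
    while shared.frames_done.load(Ordering::Relaxed) < target_frames {
        thread::yield_now();
    }
    shared.running.store(false, Ordering::Release);
    audio.join().expect("audio thread panicked");
    shared.frames_done.load(Ordering::Relaxed)
}
```

The returned count is at least `target_frames` and always a whole number of periods; how far it overshoots depends on scheduling.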
This was a learning experience. But I'm really not the most qualified person to talk about ALSA. Sadly I don't know who else understands ALSA vs. JACK well, and is willing to share their insights publicly.
jackd notes
This is a summary (given my current understanding) of how jack2 (I didn't look into jack1) handles ALSA duplex, and how I'd implement it in cpal #553, or improve RtAudio's duplex, etc:
Threads:
setup (alsa_driver_new() -> alsa_driver_set_parameters()):
snd_pcm_hw_params_set_periods_integer, snd_pcm_hw_params_set_period_size (exact), snd_pcm_hw_params_set_periods_min (we can tolerate the hardware forcing more periods than requested by the user) and snd_pcm_hw_params_set_periods_near (abort if the hardware forces fewer periods than requested), then snd_pcm_hw_params_set_buffer_size
[...] snd_pcm_hw_params_set_periods_integer). Though in an application-level audio library like cpal (where users may not be professionals micromanaging their buffer sizes), it's not strictly necessary to get exactly the period size the user requested, so we could potentially make "buffer size/count doesn't match user request" not a hard error. What does RtAudio do?
snd_pcm_sw_params_set_start_threshold(0) [...] snd_pcm_mmap_commit. Whereas in my sample app, with/without this call (or if I set it to the default of 1, or even (snd_pcm_uframes_t)-1), ALSA outputs do start upon my sample ALSA app calling snd_pcm_writei. To match jackd's behavior in my sample app, I have to set snd_pcm_sw_params_set_start_threshold() greater than the total buffer size (e.g. the value of snd_pcm_sw_params_get_boundary(), or buffer size * 2, or buffer size + 1). Personally I'd use the boundary.
[...] snd_pcm_link(). If it fails, note it down and keep going.
[...] snd_pcm_link() fails, so I can't use it to open a duplex connection to pulseaudio-alsa or pipewire-alsa, which is broken IMO.
beginning playback (alsa_driver_start()):
[...] snd_pcm_mmap_begin() returns the entire buffer when asked to. (Is this the case in all modern hardware supporting mmap?) And if snd_pcm_mmap_begin() instead returns only 1 period (IDK if this can happen, it doesn't on my motherboard audio or USB audio UAC1 FiiO E10K), jackd will silently overwrite memory out of bounds, instead of erroring out.
[...] snd_pcm_link() failed, start the capture stream too.
in the main loop (JackAudioDriver::Process()):
in jack2 synchronous mode (JackAudioDriver::ProcessSync()), the main loop waits for both input/output to be ready, then reads input from hardware, computes output, and writes output to hardware.
Details:
[...] alsa_driver_wait()), until both are ready. If one falls behind (so by the time it's ready, the other device has already reached xrun), report an xrun etc. (see below)
[...] snd_pcm_wait() on both streams, then afterwards verify both aren't in xrun. It may be simpler than polling, and I find it less confusing than polling (maybe because I'm not experienced with it, though alsa exposes a safe wrapper for poll()). But it's less powerful; if you're blocked on a stream that never becomes ready, you can't pick up on xrun events from the other stream (which would abort the polling loop). You could instead use a 10ms timeout loop I guess? (Do fast timers impair hardware timer power management?)
upon xrun:
I took a quick glance at what happens during xrun, and I think this is what happens: Stop and start both capture and playback streams, regardless of which one hit xrun. Don't close or recreate streams or any other state, though.
alsa_driver_wait() on the audio(?) thread calls alsa_driver_xrun_recovery(), which calls the global function Restart():
JackAlsaDriver::Stop()
alsa_driver_stop()
ClearOutput() (this is related to jackd's audio graph, not ALSA, so we probably don't care for cpal)
snd_pcm_drop() on the playback stream. If snd_pcm_link() failed, call it on the capture stream too.
JackDriver::Stop()
Jack::JackDriver::StopSlaves(). I don't know what slaves are. I know that cpal doesn't have them.
JackAlsaDriver::Start()
JackDriver::Start()
alsa_driver_start() (aka goto "beginning playback")
How does cpal handle xruns? Does it handle them at all? (TODO look into it)
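The restart policy traced above (stop both streams, then start both again, whichever one hit the xrun, without reopening anything) can be modelled without audio hardware. This is a minimal sketch under stated assumptions: `PcmState`, `FakeStream` and `Duplex` are hypothetical stand-ins, not jack2, ALSA or cpal types, and `stop()` folds the real drop-then-prepare sequence into one step.

```rust
/// Simplified stream states; real ALSA has more (SETUP, DRAINING, ...).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PcmState {
    Prepared,
    Running,
    Xrun,
}

/// Stand-in for an open PCM handle. `open_count` lets us check that
/// recovery never reopens the device.
#[derive(Debug)]
pub struct FakeStream {
    pub state: PcmState,
    pub open_count: u32,
}

impl FakeStream {
    pub fn open() -> Self {
        Self { state: PcmState::Prepared, open_count: 1 }
    }
    pub fn start(&mut self) {
        self.state = PcmState::Running;
    }
    /// Models snd_pcm_drop() followed by snd_pcm_prepare(): in real ALSA,
    /// drop leaves the stream in SETUP and prepare is needed before restart.
    pub fn stop(&mut self) {
        self.state = PcmState::Prepared;
    }
}

pub struct Duplex {
    pub capture: FakeStream,
    pub playback: FakeStream,
    pub restarts: u32,
}

impl Duplex {
    pub fn new() -> Self {
        Self { capture: FakeStream::open(), playback: FakeStream::open(), restarts: 0 }
    }
    pub fn start(&mut self) {
        self.playback.start();
        self.capture.start();
    }
    /// Call once per cycle after waiting on the streams. If either side is
    /// in xrun, stop both and start both again, keeping the same handles.
    /// Returns true if a restart happened.
    pub fn recover_if_xrun(&mut self) -> bool {
        let xrun = self.capture.state == PcmState::Xrun
            || self.playback.state == PcmState::Xrun;
        if xrun {
            self.playback.stop();
            self.capture.stop();
            self.start();
            self.restarts += 1;
        }
        xrun
    }
}
```

Restarting both sides together is what keeps capture and playback aligned after recovery; restarting only the stream that failed would leave the two streams offset from each other.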