Closed siraaris closed 4 weeks ago
To reduce load, I created a set of unified FIR filters to make my config as efficient as possible. I now use a single FIR filter per channel, incorporating XO, DRC and other DSP.
Glitches (restarts) are observed even when CamillaDSP is quiescent. The buffer levels are also stable.
It would seem to suggest that the OS, BlackHole or CoreAudio is doing something here.
Any suggestions on where to look? I'm running out of ideas (until you get time to look at thread prioritisation :). No drama, just sharing what I've noticed.
Hmmm. Looks like the restart and buffer-level events occur hourly. Screams cron job; I'll investigate. It could also be macOS doing periodic process maintenance. In the graph below I've normalised buffer levels to 1 and plotted restarts as counts in 5-minute buckets over a 24h period.
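For anyone wanting to reproduce the restart counts, here is a minimal Python sketch of the 5-minute bucketing. It assumes the log line layout quoted in this thread (date, time, level, source location, message); the function name `bucket_restarts` and the marker string constant are my own choices for illustration, not anything from CamillaDSP.

```python
from collections import Counter
from datetime import datetime

# Assumption: restarts are identified by this message text, as seen in the logs in this thread.
RESTART_MARKER = "Restarting playback after buffer underrun"


def bucket_restarts(lines, bucket_minutes=5):
    """Count restart log lines per time bucket.

    bucket_minutes should divide 60 so that buckets align with the hour.
    Returns a dict mapping the bucket start (datetime) to the restart count.
    """
    counts = Counter()
    for line in lines:
        if RESTART_MARKER not in line:
            continue
        # The first two whitespace-separated fields are the date and the time.
        date_part, time_part = line.split(maxsplit=2)[:2]
        ts = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")
        # Floor the timestamp to the start of its bucket.
        floored = ts.replace(
            minute=ts.minute - ts.minute % bucket_minutes, second=0, microsecond=0
        )
        counts[floored] += 1
    return dict(counts)
```

Feeding it the lines of a log file (for example `bucket_restarts(open(path))`, where `path` is wherever your log lives) gives counts that can be plotted directly against time.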
I have made some changes to improve the buffer level measurement. I'm not sure if it helps for this issue or not, but you could try the latest next30 to see if it helps. I also added a script for plotting the processing load and buffer level while running: https://github.com/HEnquist/camilladsp/blob/next30/testscripts/log_load_and_level.py I have let it run for a couple of hours on my M1 Air, hoping to get results similar to yours. But it behaves well, without buffer underruns or spikes in the level.
That's a useful utility :)
I think there are a number of factors in my case. My audio devices are AVB-based, which is probably uncommon for most users of CamillaDSP.
For example, the Playback device is presented to CoreAudio from a USB connected device (RME Digitface AVB) that bridges into network transport to connect to the DAC (RME M32 DA Pro, a 32 ch DAC), managed by RME's AVB Controller.
This should be transparent to CamillaDSP if everything else is well behaved.
Things I've noticed that result in CamillaDSP glitches:
I still think it's worth implementing thread prioritisation / core affinity or similar for macOS - given the move by Apple to user space for areas that have traditionally been in kernel.
Because of the above, I've spent some time tonight setting up an AVB configuration using only the RME DAC and Apple's built-in AVB controller (avbutil). I'll see how that goes.
Removing the RME devices as above, the behaviour overnight still suggests an hourly incident.
The three blips in the centre (around 20000) are at 7:27, 8:27, and 9:27, followed by a long stretch of stability until I remoted in via screen sharing and started looking at this on the console again.
At these times there are lots of:
2024-09-05 07:27:02.757878 INFO [src/coreaudiodevice.rs:446] Restarting playback after buffer underrun
2024-09-05 07:27:03.951268 WARN [src/coreaudiodevice.rs:455] Playback interrupted, no data available
...
2024-09-05 08:27:19.185577 WARN [src/coreaudiodevice.rs:455] Playback interrupted, no data available
2024-09-05 08:27:19.189584 INFO [src/coreaudiodevice.rs:446] Restarting playback after buffer underrun
...
2024-09-05 09:27:19.021137 INFO [src/coreaudiodevice.rs:446] Restarting playback after buffer underrun
2024-09-05 09:27:20.214530 WARN [src/coreaudiodevice.rs:455] Playback interrupted, no data available
...
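One quick way to check the "hourly" suspicion is to count at which minute of the hour the underrun messages land. This is only a rough Python sketch, assuming the log layout shown above; the function name `underrun_minutes` and the default marker text are hypothetical choices for illustration.

```python
from collections import Counter
from datetime import datetime


def underrun_minutes(lines, marker="buffer underrun"):
    """Return a Counter mapping minute-of-hour to the number of matching log lines."""
    minutes = Counter()
    for line in lines:
        if marker not in line:
            continue
        # The first two whitespace-separated fields are the date and the time.
        date_part, time_part = line.split(maxsplit=2)[:2]
        ts = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")
        minutes[ts.minute] += 1
    return minutes
```

If nearly all events pile up on a single minute (27 in the excerpt above), something periodic on the system is a more likely culprit than random load spikes.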
A bit of progress. I removed the Presonus AVB switch, leaving only the Mac mini connected directly to the DAC via ethernet.
Lo and behold, things seem to settle down, see screen grab.
And the RME networking (netifc) is not crashing.
Will let it run this weekend, and provide an update.
That looks very similar to how it looks on my M1 Air when using just Blackhole and the built in speakers.
It looks like macOS offers very limited control over thread priorities. There is a concept of quality of service, but I'm not sure if it's applicable here. Then there are audio workgroups, as described here: https://www.bluecataudio.com/Blog/announcements/realtime-audio-multicore-issues-for-apple-silicon-end-of-the-story/ That looks interesting, but much more difficult to use than just adjusting some thread priorities. Not sure how feasible it is to use it in camilladsp.
Found this that I will try: https://crates.io/crates/audio_thread_priority
Behaviour overnight. Just FYI.
Seems clear that using audio workgroups addresses the issue. It's interesting that Intel CPUs have introduced varying core performance; maybe this issue shows up on non-macOS systems as well.
What's not clear is the detail of the "hack" that BlueCatAudio refer to. Maybe it's what the audio_thread_priority Rust crate uses?
The audio_thread_priority crate doesn't seem to be using audio workgroups, so probably not. But we don't know what that hack is (only that it's supposedly obvious if you look somewhere in the Apple open source code 😒), so this is only a guess.
I added audio_thread_priority to the processing thread and the CoreAudio capture and playback threads in the branch audio_thread_prio. Can you try it on your system?
I am running the audio_thread_prio branch now - will report back in a few hours, which should be enough to see how it goes.
aris@pollen ~ % ~/Projects/camilladsp-audio-thread-prio/target/release/camilladsp --address 192.168.1.169 --port 1234 ~/Projects/keystone-bedrock-v5-Consolidated.yml --gain=-50.0
2024-09-07 21:52:21.271423 INFO [src/bin.rs:742] CamillaDSP version 3.0.0
2024-09-07 21:52:21.271442 INFO [src/bin.rs:743] Running on macos, aarch64
2024-09-07 21:52:21.374445 INFO [src/coreaudiodevice.rs:1246] The capture device supports pitch control
2024-09-07 21:52:21.480098 INFO [/Users/aris/.cargo/registry/src/index.crates.io-6f17d22bba15001f/audio_thread_priority-0.32.0/src/rt_mach.rs:158] thread 5635 bumped to real time priority.
2024-09-07 21:52:21.489947 INFO [/Users/aris/.cargo/registry/src/index.crates.io-6f17d22bba15001f/audio_thread_priority-0.32.0/src/rt_mach.rs:158] thread 8451 bumped to real time priority.
2024-09-07 21:52:21.679741 INFO [/Users/aris/.cargo/registry/src/index.crates.io-6f17d22bba15001f/audio_thread_priority-0.32.0/src/rt_mach.rs:158] thread 8195 bumped to real time priority.
2024-09-07 21:52:21.681181 WARN [src/coreaudiodevice.rs:459] Playback interrupted, no data available
2024-09-07 21:52:21.687903 INFO [src/coreaudiodevice.rs:450] Restarting playback after buffer underrun
Initial observation - buffers are very stable. I've tried spiking the CPU with various actions, which previously would trigger buffers to rise and crash - so far looking good.
Well, the thread priority change has resulted in super stability for camilladsp. No dropouts/restarts, stable buffers, really nice. There's a bit of lagginess on the UI when I remote in, but the Mac mini is dedicated for camilladsp so that's a small price to pay for audio stability and performance.
I'll keep it running, and report back. If there's anything you need me to look at specifically, just shout out.
Looks great so far! I don't have any specific things I want tested, just curious about how it behaves when kept running for a while.
I think you can probably treat this as "done". I'm not really sure that the Performance cores are actually being used, but it makes no difference either way: the result is stable and performant behaviour.
Would you be interested in trying multithreaded processing? The branch "with_rayon" supports splitting filter tasks among several threads. This is enabled via a new optional boolean multithreaded in the devices section of the config (it defaults to off).
It hasn't gotten much testing, so please start with amplifiers powered off :)
The idea is that between mixers and processors, each channel can be filtered independently from the others. So it collects the filters to apply to each channel, and then uses the really smart rayon library to process the channels in parallel in a set of worker threads. It needs quite heavy filter tasks to actually help. With too "easy" filters, the overhead of passing things back and forth between threads gets larger than the actual processing time. I think your config could potentially benefit.
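To illustrate the structure (not the actual implementation, which is Rust and uses rayon), here is a small Python sketch of "one filtering task per channel, run on a pool of workers". The names `fir_filter` and `process_chunk` are made up for this example. Note that pure Python threads share the GIL, so this shows how the work is split but will not give a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with zero history."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def process_chunk(channels, taps_per_channel, worker_threads=2):
    """Filter every channel of one chunk independently, one task per channel.

    A real implementation would keep the pool alive between chunks; creating it
    per call here just keeps the sketch short.
    """
    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        # map() returns results in channel order, so the output lines up with the input.
        return list(pool.map(fir_filter, channels, taps_per_channel))
```

The cost of handing tasks to the pool and collecting the results is fixed per chunk, which is why very light filters can end up slower when run this way than in a single thread.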
I tried the with_rayon branch.
Seems to run ok, but now get underruns - I think because the threads you create per channel need also to be real-time?
For example - with multithreaded: false, I can stop/start Safari (ie cause CPU spikes), and camilladsp is unaffected.
With multithreaded: true, stop/start Safari causes camilladsp to hiccup.
Yes those also need to have their priority raised, just didn't get to that yet. But did you see any change to the processing load? Hopefully it should be lower.
I'll run for a while with graphing and let you know.
The load is lower yes.
Longer timeframe. You can see the spike at the end, when I remoted in to take a screen capture.
Ok! Thanks for testing. I would expect the threading to make it more sensitive to interference from other loads. Raising priorities should help, but there may still be delays when waking up the worker threads, and when they notify the main processing thread that they are finished.
I'll happily test! I think it's worth pursuing and having the support there?
The with_rayon branch is updated. Now it raises the priority of the workers, and the number of workers can be set by the worker_threads parameter in devices. Leave it out or set it to 0 to let rayon decide, which becomes one thread per hardware thread of the machine. On the Windows laptop I'm using at the moment (12-core Snapdragon X Elite CPU), anything above 4 threads gives the same processing load.
Initial observation is that setting worker_threads manually is required, as the default number (when I set it to 0) may be too high and detrimental.
On my Mac mini, I've set worker_threads to 4 and that seems to be ok.
Will leave it running for a while and report back.
Final check-in: at 192 kHz and 32 channels, it's only stable with 2 threads.
But with that it's rock solid.
Without creating a new issue/suggestion, I've been experimenting - on Linux - with pinning camilladsp to specific CPUs, as I'm still experiencing xrun issues.
Summary of steps I've taken:
- Disabled Hyperthreading in BIOS
- Installed RT kernel (6.10.11-rt-amd64)
- tuned-adm latency-performance
- I have an 8-core i7, so: boot kernel with isolcpus=0,1,2,3,4,5 (on Debian, edit /etc/default/grub and run update-grub)
- Set CamillaDSP config: multithreaded: true, worker_threads: 4
- Start camilladsp with taskset --cpu-list 0-5
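As a side note, the same kind of pinning that taskset does can be applied from a script. Below is a minimal Python sketch, assuming Linux; `parse_cpu_list` and `pin_current_process` are hypothetical helper names, and the parser only handles plain numbers and ranges (not the stride syntax that taskset also accepts).

```python
import os


def parse_cpu_list(spec):
    """Parse a taskset-style CPU list such as "0-3,6" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec):
    """Pin the calling process to the listed CPUs and return the resulting affinity.

    os.sched_setaffinity is only available on Linux (and some other Unix systems).
    """
    cpus = parse_cpu_list(spec)
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)
```

Calling `pin_current_process("0-5")` in a launcher before starting the workload would correspond to the `taskset --cpu-list 0-5` step above, and the returned set makes it easy to confirm which CPUs were actually granted.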
Does pinning make any difference?
I'm not sure yet, but for the above configuration there are still underruns reported:
aris@controller:~$ tail -f proj/log/camilladsp.log
2024-10-06 16:37:35.801437 INFO [src/bin.rs:781] CamillaDSP version 3.0.0
2024-10-06 16:37:35.801443 INFO [src/bin.rs:782] Running on linux, x86_64
2024-10-06 16:37:35.924265 INFO [src/alsadevice.rs:789] Capture device supports rate adjust
2024-10-06 16:37:36.036685 INFO [src/alsadevice.rs:117] PB: Starting playback from Prepared state
2024-10-06 17:21:36.039657 WARN [src/alsadevice.rs:113] PB: Prepare playback after buffer underrun
2024-10-06 17:21:51.833882 WARN [src/alsadevice.rs:113] PB: Prepare playback after buffer underrun
2024-10-06 18:31:57.670345 WARN [src/alsadevice.rs:113] PB: Prepare playback after buffer underrun
2024-10-06 18:45:02.588972 WARN [src/alsadevice.rs:113] PB: Prepare playback after buffer underrun
I don't think it's a system issue per se, as the same filters and pipeline setup on Brutefir don't exhibit xruns (well, at least for 12 hours at a stretch). For the "same" configuration, CamillaDSP xruns every hour or two.
I'm cognisant that this isn't macOS related. Do you want me to create a new issue? We can probably close this one, as macOS on the M1 Mac was solid with the changes you introduced (provided that worker_threads wasn't too high).
Yes, that fits better in a new issue. Please attach the config file, and the output of aplay -l and arecord -l.
It would be good to have a mechanism for camilladsp on macOS to utilise Performance cores on the M series Apple Silicon CPUs.
Under some loads, e.g. high channel count, high sample rate and reasonably large FIR filters, the Efficiency cores may not be suitable. For example, I've observed glitches and high load reported by camilladsp when running on Efficiency cores, which seem to go away when the OS promotes the process to Performance cores (investigation is ongoing!).
I think thread priority must be set in code, as there's no option that I'm aware of for this to be set in user land.