Is it safe to just put up memory fences for ui<>audio thread sync?

free-audio / clap

Audio Plugin API

https://cleveraudio.org/

MIT License

1.71k stars 98 forks source link

Is it safe to just put up memory fences for ui<>audio thread sync? #351

Closed sjoerdvankreel closed 10 months ago

sjoerdvankreel commented 10 months ago

So far i have seen @baconpaul clap-saw-demo which uses "bring your own lock free queue" and nakst's tutorial here https://nakst.gitlab.io/tutorial/clap-part-2.html which uses full-on mutexes. Both look like overkill to me. I'm trying out something that seems much simpler, but, i am not 100% sure this is absolutely safe.

Given a plugin with 2 parameters, declare the plugin like this:

double _audio_value_1, double _audio_value_2, double _ui_value_1, double _ui_value_2
in clap_plugin_params.flush, copy values from incoming events to both ui and audio (since it may be called on either thread) and then std::atomic_thread_fence(std::memory_order_release)
in clap_plugin_params.get_value, std::atomic_thread_fence(std::memory_order_acquire), then return what's in the ui value
at start of clap_plugin.process: std::atomic_thread_fence(std::memory_order_acquire), then copy everything from ui to audio
at end of clap_plugin.process: copy everything from audio to ui, then std::atomic_thread_fence(std::memory_order_release)

This scheme may be a bit heavy on the ui thread because of the barrier in clap_plugin_params.get_value but it only requires 2 fences in the process() call. It really is so much simpler compared to either the queue-based or the mutex-based approach, and i'm getting good results from both reaper and bitwig (x64 windows). But since neither demo/tutorial does it like this - i have to wonder why? Am i missing something? Feedback appreciated.

-greets, sjoerd

baconpaul commented 10 months ago

so generally anything which may contend with non-audio-thread is what we avoid. so don't alloc, don't lock etc

it was really pretty easy to do that with clap saw demo and i've copied that to our other new properties too, both clap and non clap.

i'm not entirely sure what the atomic thread fences do. If you can look at the assembly and convince yourself you aren't contending you are fine. Atomic updates with distinct reader writers are how all the lock free queues work anyway so your strategy may not be that different.

sjoerdvankreel commented 10 months ago

@baconpaul

<< so generally anything which may contend with non-audio-thread is what we avoid

That basically throws nakst's example right out the window. That's fine, it's just an example.

<< i'm not entirely sure what the atomic thread fences do. If you can look at the assembly

Godbolt tells me that both aqcuire/release for x86_64 gcc translate to "lock or QWORD PTR [rsp], 0". For msvc it's "call atomic_thread_fence" which eventually translates to "__MACHINE(void _ReadWriteBarrier(void))". I'm guessing that's a compiler intrinsic. I'm also not surprised there's no difference between acquire/release since to the best of my knowledge x86_64 has no distinction between weak/strong semantics.

So if i'm not mistaken my approach will not contend in the sense that the OS needs to be involved (i.e. no context switches, no scheduler problems). Sure it will probably take a hardware lock hit somewhere, but so will atomic RW-queues.

Thing is - this seems a bit too good to be true. I was able to remove all your R/W queue code that i copied over from clap-saw-demo, dropped the readerwriterqueue dependency, and stuff "just works". Shaved a couple dozen lines off the code base too.

If this could possibly work -- why doesnt everyone do it this way? Even in the event the user is dragging ui knobs as the process() call is running - so you can have "inconsistent" values in the process block (i.e. osc 1 is not set to pulse, but we have events for PWM) -- i believe the same can happen with atomic R/W queue and the plug just has to deal with it.

Thanks for your help again and sorry for the long rant. I'm just trying to be absolutely positive that my super-simple approach could actually work and not have some unexpected side effects,

baconpaul commented 10 months ago

<< so generally anything which may contend with non-audio-thread is what we avoid

I have not read naksts code. If I was reviewing audio code with a mutex lock on the audio thread, I would ask an awful lot of questions about how we know it wouldn't contend. There's cases but its definitely not a best practice I don't think. (Although on modern systems it basically works for many cases)

The conceptual model of queues is clean. it is guaranteed to be correct. It doesn't depend on the compiler. It works in different threading models (like linux with a different ui thread model). I am sure modern CPUs with atomics could do something different. Just no-one does. Until you perhaps?

If it works and you are sure it works, I'm sure folks would read that blog post.

How did you deal with the begin/end edit from the UI to the audio thread, by the way? like the ui -> audio thread actually does need to generate outbound events, which is another convenient part of the queue model.

sjoerdvankreel commented 10 months ago

<< I have not read naksts code. If I was reviewing audio code with a mutex lock on the audio thread, I would ask an awful lot of questions

A-ha! Upon further reviewing nakst uses a hybrid approach -- locks on the ui thread, queues on the audio thread. That definately clears up things a bit.

<< If it works and you are sure it works, I'm sure folks would read that blog post.

I am not. That's what i came here to ask -- is this safe or not. My empirical evidence suggests it works on my machine ;) Rather it's the other way around -- i am looking to disprove my approach.

<< How did you deal with the begin/end edit from the UI to the audio thread, by the way?

So far i am dealing only with the host-provided ui. I'm sure the actual ui will pose some challenges of its own. However at least to my mind, dragging a knob could fit this model: send the approriate gesture-start/value-change/gesture-end sequence on the main thread, flush() the last value, be done. I don't know if this will actually work, but so far playing around with host-provided ui->sync to audio works, and automation lanes update both the audio side and the ui side.

<< The conceptual model of queues is clean. it is guaranteed to be correct

Agreed! In fact it is so standard that it's baked into vst3. Just setValueNormalized(), be done with it. I don't really understand why something like this is not baked into clap? I mean lots of freedom okay thats fine, but again, "bring your own lock free queue" is a bit out-there. Don't get me wrong, i really like the clap format *as it is", clean api, clean docs, most importantly good support ;) but the audio<>ui sync thing is a bit too much free-form for me.

Judging from your comments i believe i am better-off playing it safe using a conventional queue. No problem, ill just go that route. Thanks again for helping me out.

-sjoerd

baconpaul commented 10 months ago

All great points

I would say the reason it isn't baked into clap is: clap only concerns itself with one thread. Everything is on the audio thread.

If you want a UI, get stuff to the audio thread If you want to read files, get stuff to the audio thread If you want to do anything, get stuff to the audio thread

So the core idea is "get stuff to the audio thread using a mechanism of your choosing with a single sorted list of events"

This makes it incredibly easy to program a clap (compared to the multiple interleaved objects and queues in VST3) but it does come with the cost that ... welll

you have to get stuff

to the audio thread

Hope that helps understand some of the design philosophy.

baconpaul commented 10 months ago

(there is of course the 'main thread' but nothing happens there other than 'some extensions'. The fact that 'gui' is a big extension means there should be patterns arising. I'm working on one of those right now but I think they are better outside of clap. Keep clap tiny and simple. Let a million flowers bloom. If you are interested in the approach I'm taking hmu on a discord or drop a way to reach you or just go lurk my GitHub :) )

ofsjoerdk commented 10 months ago

<< clap only concerns itself with one thread. Everything is on the audio thread

Correct if you count out init/destroy/(de)activate. Which i'm guessing you can because they never overlap the process call. That leaves the state, gui and params extensions which are the big ones.

<< This makes it incredibly easy to program a clap

Yes. A plugin that does not concern itself with load/save/ui is definately easier in clap compared to vst3.

<< Hope that helps understand some of the design philosophy.

Absolutely. I came to clap with a vst3 mindset. I took the processor/controller thing to be somewhat standard. Now i realise it is perfectly valid for a plugin to basically have nothing more than a process() call - possibly not even a host provided ui. Just events-in, audio-in, audio-out.

<< compared to the multiple interleaved objects and queues in VST3

I guess that depends on the internal plugin structure. If it looks like for-each-frame-then-for-each-oscillator then yes, the clap way is definately better. If it looks like for-each-oscillator-then-for-each-frame then either approach will do because you have to rewrite them anyway to fit. But clap has the advantage here because its better for the one approach and equally good for the other.

<< means there should be patterns arising. I'm working on one of those right now

Very much interested. I checked your github but theres so much repos in there -- which one is it?

-cheers, Sjoerd

sjoerdvankreel commented 10 months ago

Closing - i'll find you on discord.