Adding some notes:
[1] - "Time in Distributed Real-Time Systems", Eli Brandt and Roger B. Dannenberg. 1999. Available at: http://www.cs.cmu.edu/~rbd/papers/synchronous99/synchronous99.pdf
@kunstmusik said "It seems that ScriptProcessor can do this". But this isn't true. There is just as much jitter between a block processing boundary and a ScriptProcessor event dispatch as there is with setTimeout(), as @padenot has pointed out. They are both in the main thread.
As long as time-sensitive code needs to mutate the audio processing graph, and that graph is only accessible in the main thread, I believe the jitter problem will remain with us. (I believe that mutating the graph from the audio thread is fraught with other difficulties, but I'll refrain from trying to describe them.)
Note that "simply" having access to the AudioContext in a normal WebWorker would partially solve main thread jitter, since the WebWorker's event loop would be less busy. Of course this would go in combination with writing code that is not gc-intensive, etc.
Having synchronous graph modification on the audio thread is a somewhat more (read: very) complex affair (although this would solve everything), but it can certainly be implemented (even taking into account the peculiarities of the web platform).
@joeberkovitz It seems my understanding of the processing model for ScriptProcessorNodes is not correct. My comments are based on the assumption that the code in onaudioprocess is called synchronously with the rest of the audio graph's processing functions. Also, I assumed that being within the audio thread, calls to mutate the graph would be done (or a message passed and scheduled) without going across further boundaries. Hence a mutation call or message would take effect either within the duration of the buffer being processed, or the next time the graph is processed and messages are handled.
Maybe to better explain where I'm coming from, I'll mention what I was envisioning:
To note, I've implemented a similar design within my music software system Pink (https://github.com/kunstmusik/pink/blob/master/doc/architecture.md). Some things are made easier with this kind of design, from the perspective of the application developer, if data structures can be shared between the audio and application threads, but it is not necessary.
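To sketch the shape of what I'm describing (this is purely illustrative JS, not Pink's actual code, and not anything the Web Audio API offers today): graph mutations and musical events are handled only between blocks, never during one, so rendering always sees a stable graph.

```js
// Illustrative sketch of a block-synchronous engine. Mutation requests and
// scheduled musical events are applied at block boundaries only.
const BLOCK_SIZE = 128;

class Engine {
  constructor(sampleRate) {
    this.sampleRate = sampleRate;
    this.blockStart = 0;        // engine time (seconds) of the next block
    this.pendingMutations = []; // graph adds/removes posted as messages
    this.events = [];           // musical events: { time, perform }
    this.nodes = [];            // the graph being rendered
  }

  // Callable from anywhere; the request is only applied at a block boundary.
  requestMutation(fn) { this.pendingMutations.push(fn); }
  schedule(time, perform) { this.events.push({ time, perform }); }

  processBlock(output) {
    const blockEnd = this.blockStart + BLOCK_SIZE / this.sampleRate;

    // 1. Apply pending graph mutations at the boundary.
    for (const fn of this.pendingMutations.splice(0)) fn(this);

    // 2. Fire events whose time falls within this block; their perform() may
    //    request further mutations, which apply at the next boundary.
    this.events = this.events.filter(ev => {
      if (ev.time < blockEnd) { ev.perform(this, ev.time); return false; }
      return true;
    });

    // 3. Render the block against a now-stable graph.
    for (const node of this.nodes) node.process(output, BLOCK_SIZE);
    this.blockStart = blockEnd;
  }
}
```

Because everything above runs on the audio engine's own cadence, the same design gives identical results in realtime and non-realtime rendering.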
This is a lot of detail. I wanted to offer a few comments:
In short: I do not think we CAN provide a synchronous callback, and we already have an issue on providing a conversion between time clocks (issue #12), which I think covers the use case. I think this issue should be closed.
@cwilso I understand all your points. Notwithstanding all of @kunstmusik's followup (which I think was mostly clarifying your points 1-3), I think your point number 4 (offline rendering) is still worth keeping in play; it doesn't shut down debate on this issue for me yet.
I understand that such a callback would need to be synchronous for offline rendering. I am not suggesting that it be synchronous in general. But we already know that if we have no way of pausing offline rendering, there is no way to safely and incrementally construct a long-running graph. Don't think of this as designing synchronous behavior into the API (which obviously cannot work in a realtime context). Think of it as automatically pausing offline rendering during the callback, since this is completely safe to do and in fact necessary. The result is an API to which developers can write their incremental graph manipulation code once, for both realtime and offline rendering. This is the idea that came up at the conference on Wed., and I think it's worth a little broader discussion before killing it.
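As a rough sketch of what incremental offline construction could look like, assuming some way to pause rendering at a chosen context time (the suspend()/resume() names below are only an assumption for illustration, not something this issue specifies, and scheduleChunk() is a hypothetical application function):

```js
// Sketch only: building an OfflineAudioContext graph in chunks, assuming a
// pause mechanism such as suspend(time)/resume().
const ctx = new OfflineAudioContext(2, 44100 * 60, 44100); // 60 s of output
const CHUNK = 5; // seconds of graph built per pause

function scheduleChunk(context, from, to) {
  // ...create nodes, connect them, and call start()/stop() for [from, to)...
}

for (let t = CHUNK; t < 60; t += CHUNK) {
  ctx.suspend(t).then(() => {
    scheduleChunk(ctx, t, t + CHUNK); // rendering is paused; safe to extend
    ctx.resume();
  });
}

scheduleChunk(ctx, 0, CHUNK);
ctx.startRendering().then(renderedBuffer => {
  // use the fully rendered buffer
});
```

The realtime analogue would drive the same scheduleChunk() from a timer callback instead of a pause, which is exactly the "write it once" property I'm after.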
Note that we already have "Enable progressive rendering in OfflineAudioContext" as an issue (issue #302) - forgot to put that in my last response. I agree we should enable that use case; however, that's solvable in different ways (than enabling synchronous callbacks), and does not have the real-time-synchronization baggage that this issue implies.
I think the scenario of "keeping the real time pump primed" is different from the "progressively create OfflineAudioContext graph bits" scenario. They could, of course, use the same underlying function ("add graph bits that need to come into play from time x to time y"), but the cycle on which that function would be called is different: realtime likely needs to minimize latency, while offline probably wants to optimize around either uniform output time chunks (if progressively getting data) or events (i.e. rendering no more than a set number of events at a go, to keep memory use consistent).
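For concreteness, the realtime cycle today typically looks like a lookahead pump along these lines (a sketch only; scheduleEventsBetween() is a stand-in for that underlying "add graph bits from time x to time y" function):

```js
// Sketch of the realtime "pump priming" cycle: a coarse main-thread timer
// keeps a lookahead window of the graph scheduled ahead of currentTime.
const audioCtx = new AudioContext();
const LOOKAHEAD = 0.1;       // seconds of graph kept scheduled ahead
const TIMER_INTERVAL = 25;   // ms; jittery, so it must wake well in advance

let scheduledUntil = 0;

function scheduleEventsBetween(from, to) {
  // ...create nodes and call start(t)/setValueAtTime(t) for from <= t < to...
}

function pump() {
  const horizon = audioCtx.currentTime + LOOKAHEAD;
  if (horizon > scheduledUntil) {
    scheduleEventsBetween(scheduledUntil, horizon);
    scheduledUntil = horizon;
  }
  setTimeout(pump, TIMER_INTERVAL);
}
pump();
```

An offline cycle could call the same function, but paced by output chunks or event counts rather than by wall-clock latency.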
@cwilso Regarding your points, I'll reply below:
I'd like to reiterate the point that the Web Audio API proposal for AudioWorker is already proposing user-definable, synchronous processing, at least for audio. Extending what can be done in that processing to include graph modifications, either by extending AudioWorker or implementing a similar but different worker type, would move towards resolving this issue.
On Tue, Feb 3, 2015 at 12:42 PM, Steven Yi notifications@github.com wrote:
> - There is no argument about two clocks, or jitter between threads. This isn't about that. This is about the ability for users to implement event systems completely in sync with an audio engine. This allows jitter-less event system designs, and thus provides the user the capability to create consistent, reproducible results. You only need to minimize or account for jitter if your system design assumes communications between multiple threads or processes (and thus, assuming events must be processed separately from the audio thread).
"The ability to implement event systems completely in sync with an audio engine" - this is just not possible. Events ARE processed separately from the audio thread, and many things are isolated from the audio thread, just like with any other worker - because the web platform already has a threading model, and the DOM (to choose one important thing) is not multi-threaded.
> - I disagree. I think it's actually worse when a graph modification is done outside the audio thread. If mutation is done only in the audio thread, it can be properly isolated to only occur between block boundaries. Otherwise, if any thread can mutate the graph at will, it gives no guarantees of consistency in graph processing without the use of locks. Please see the previous link given for Pink's architecture; there is a diagram there that shows how pending adds/removes (graph mutations) are implemented as messages, and how a user from another thread, or event system from within the audio thread, can request mutations safely in a lock-free/wait-free way.
And today, any mutations are done in the audio thread - but they're batched from the main thread. My point was that if you enable graph mutations DURING processing of a block, unpredictable things could happen.
> - No argument about estimates on time when trying to sync one thread to another. There are techniques like Delay-Locked Loops (see http://kokkinizita.linuxaudio.org/papers/usingdll.pdf) that can help with this. But again, this issue isn't about trying to sync two different threads or clocks, but rather to extend the capabilities of the single, audio-thread. How user application code syncs with the audio engine is, IMO, an orthogonal issue.
I disagree strongly with that characterization. Just because we've enabled JS audio processing in the audio thread does not mean we should arbitrarily extend the capabilities in that thread.
> - I disagree. Synchronous processing is exactly what is required for consistent, reproducible results, and I think it should be very much considered as a future requirement. I think some clarification is necessary: for a user to create synchronous processing does not require cross-thread synchronicity. The AudioWorker proposal is the prime example of this, as it allows the user to define audio code that runs synchronously with the engine. Only messages are allowed to interact with the worker. The proposal to have AudioWorker extended to allow things like graph modifications, or a new AudioEngineWorker type that would have these capabilities, does not have any requirements for cross-thread synchronization at all. It would also mean that a user would be able to create an event system that works in realtime and non-realtime without change.
>
> I'd like to reiterate the point that the Web Audio API proposal for AudioWorker is already proposing user-definable, synchronous processing, at least for audio. Extending what can be done in that processing to include graph modifications, either by extending AudioWorker or implementing a similar but different worker type, would move towards resolving this issue.
That is absolutely not captured by the title or introduction to this issue. If you want arbitrary graph manipulation during audio processing, then say that. This issue was about dispatching events synchronized with the audio clock. I think the former is a radical rewrite of the Web Audio ideal - and if that's what you want, you should be using more basic access to input/output devices and rolling your own graph system. Just my opinion, of course, but I don't see the requirement for it. The latter, I just don't think "synchronized with the audio clock" is attainable - unless it's just a "queue after done, like onended" style event.
@kunstmusik Thanks and I think your opinion is noted. Please open a separate issue for further comment as this issue is not the place to continue a back and forth on graph manipulation during audio processing.
@cwilso thanks for making your viewpoint clearer. I agree #302 captures the requirement for which this is one proposed solution. I will close this issue and note the point in #302.
"The ability to implement event systems completely in sync with an audio engine" - this is just not possible. Events ARE processed separately from the audio thread, and many things are isolated from the audio thread, just like with any other worker - because the web platform already has a threading model, and the DOM (to choose one important thing) is not multi-threaded.
I think we may be crossing wires due to different definitions of an event. I am speaking of events in the musical sense, like "play a note at time x", or "turn off note at time x". This is separate from say, mouse clicks, DOM manipulations, and the like. In systems like Csound and SuperCollider, these musical events are reified into an object or data structure of some sort, an EVTBLK in Csound or an OSC message in SuperCollider. Musical events get scheduled and processed by the event system, and events of the "play note at" kind result in graph modifications. So to clarify, I am not proposing in any way to try to run anything at all from the JS Main thread in sync with the audio thread.
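To illustrate the kind of event I mean (the event shape below is invented for this example; the Web Audio calls themselves are just ordinary graph construction):

```js
// A musical event reified as plain data, plus a function that turns it into
// graph changes when a scheduler decides its time has come.
const noteOn = { type: 'note-on', time: 4.0, freq: 440, dur: 0.5 };

function performEvent(ctx, ev) {
  if (ev.type === 'note-on') {
    const osc = ctx.createOscillator();
    osc.frequency.value = ev.freq;
    osc.connect(ctx.destination);
    osc.start(ev.time);
    osc.stop(ev.time + ev.dur);
  }
}
```

The question in this issue is only about where and when something like performEvent() gets to run, not about running main-thread code in lockstep with the audio thread.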
> And today, any mutations are done in the audio thread - but they're batched from the main thread. My point was that if you enable graph mutations DURING processing of a block, unpredictable things could happen.
Sorry, your first message was a bit ambiguous then, as you mentioned "Enabling graph modification from inside the audio thread would, in my opinion, be a bad idea." I agree completely that no mutations should occur during the processing of a block, but do think mutations should be able to be scheduled from within the audio thread.
> - No argument about estimates on time when trying to sync one thread to another. There are techniques like Delay-Locked Loops (see http://kokkinizita.linuxaudio.org/papers/usingdll.pdf) that can help with this. But again, this issue isn't about trying to sync two different threads or clocks, but rather to extend the capabilities of the single, audio-thread. How user application code syncs with the audio engine is, IMO, an orthogonal issue.
>
> I disagree strongly with that characterization. Just because we've enabled JS audio processing in the audio thread does not mean we should arbitrarily extend the capabilities in that thread.
Well, I see this kind of arbitrary extension of capabilities as a solution to the primary problem of this issue.
> That is absolutely not captured by the title or introduction to this issue. If you want arbitrary graph manipulation during audio processing, then say that. This issue was about dispatching events synchronized with the audio clock. I think the former is a radical rewrite of the Web Audio ideal - and if that's what you want, you should be using more basic access to input/output devices and rolling your own graph system. Just my opinion, of course, but I don't see the requirement for it. The latter, I just don't think "synchronized with the audio clock" is attainable - unless it's just a "queue after done, like onended" style event.
To note, this issue was originally filed in response to an email I posted to the public-audio list, where I commented on event systems and timing. I see arbitrary graph manipulation as one component, in addition to synchronous processing, that would allow users to create their own event system, which in turn allows them to "Trigger Events at a specified AudioContext time" with precision. I'm not trying to be difficult, or to sneak anything in, and I think this has all been in direct response to the title/introduction of this issue.
Rolling one's own graph system is perfectly fine, but it then bars one from using any existing AudioNodes as part of one's synthesis system. Fair enough, but I think the ideas I've proposed do permit implementing precise event triggering without having to ignore the rest of what Web Audio provides.
Currently the Web Audio API doesn't support any sort of time-based event dispatch in the main thread for some AudioContext time in the future. There are instead two mechanisms developers can use: callbacks from window.setTimeout()/setInterval(), and actions taken in a ScriptProcessorNode's onaudioprocess handler (which is slated to go away in any case). Using at least one such mechanism is necessary to continually replenish an audio node graph that is too large to build in one shot, or which would otherwise be infinite, e.g. a sequencer repeating an infinite, looping sequence of audio events. This must of course take place in the main thread, since there's no other way to access an AudioContext.
If the API had such a mechanism, not only could developers be notified in the main thread of some impending AudioContext time, but this same mechanism might be able (with no modification on the developer's part) to address the same need for OfflineAudioContext -- where setTimeout() totally fails to satisfy the use case, because an offline context renders at an unpredictable speed relative to realtime.
[This roughly paraphrases a proposal by Norbert Schnell at WAC 15.]
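To make the shape of the proposal concrete, usage might look something like the sketch below. callAtTime() and scheduleEventsBetween() are hypothetical names invented purely for illustration; nothing like them exists in the API today.

```js
// Hypothetical usage of a time-based dispatch mechanism. In a realtime
// AudioContext the callback would fire shortly before the requested context
// time; in an OfflineAudioContext, rendering would pause while it runs.
const context = new AudioContext(); // or an OfflineAudioContext
const CHUNK = 2; // seconds of graph appended per callback

function scheduleEventsBetween(from, to) {
  // ...application code: build the next stretch of the node graph...
}

function replenish(time) {
  scheduleEventsBetween(time, time + CHUNK);
  context.callAtTime(time + CHUNK, replenish); // hypothetical; re-arm for the next chunk
}

scheduleEventsBetween(0, CHUNK);
context.callAtTime(CHUNK, replenish); // hypothetical API, for illustration only
```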