WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Integration between event-loop of AudioWorkletGlobalScope and rendering loop #2008

Closed gterzian closed 3 years ago

gterzian commented 5 years ago

Describe the issue

Web Audio describes a control-thread and a rendering-thread, where the control-thread runs a "traditional" event-loop as described in HTML, and the rendering thread runs a custom "rendering loop", described at https://webaudio.github.io/web-audio-api/#rendering-loop.

Step 3 of the rendering loop, "Process a render quantum.", ends up, in the case of an AudioWorkletNode, calling into the process method of the corresponding AudioWorkletProcessor.

It should be noted that the AudioWorkletProcessor runs in an AudioWorkletGlobalScope, which is a sub-class of WorkletGlobalScope, which should have its own distinct event-loop as described in https://drafts.css-houdini.org/worklets/#the-event-loop.

A further complication is that each AudioWorkletProcessor also has a shipped MessagePort, whose post-message-queue should be treated as a first-class task-queue of the event-loop it belongs to, in this case the event-loop of the AudioWorkletGlobalScope.

So the problem is that it's not clear how the rendering loop integrates with the event-loop of the AudioWorkletGlobalScope.

The rendering thread has another task queue for microtasks for any microtask operation such as resolution of Promises in the AudioWorkletGlobalScope.

This seems to imply a kind of integration where the rendering loop would run a microtask checkpoint for the AudioWorkletGlobalScope, as is found in Step 4 of the rendering loop.

However that doesn't cover messages received on the port, since those aren't microtasks.

If we take this example:

https://github.com/GoogleChromeLabs/web-audio-samples/blob/master/audio-worklet/design-pattern/shared-buffer/shared-buffer-worklet-processor.js

We see an interplay between the onmessage handler on the port, and the process method, where process essentially does nothing until an "initialization" message has been received on the port.
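
For reference, a minimal hypothetical sketch of that pattern (not the actual sample code; the processor name and the `'INITIALIZE'` message are illustrative) looks roughly like this:

```js
// Runs in the AudioWorkletGlobalScope: process() is effectively a no-op until
// an initialization message has arrived on the port.
class GatedProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.initialized = false;
    this.port.onmessage = (event) => {
      if (event.data === 'INITIALIZE') {
        this.initialized = true;
      }
    };
  }

  process(inputs, outputs) {
    if (!this.initialized) {
      return true; // keep the processor alive, but do no work yet
    }
    // ... actual processing once initialization has happened ...
    return true;
  }
}

registerProcessor('gated-processor', GatedProcessor);
```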

In practice, how does a UA interleave running tasks on the AudioWorkletGlobalScope, for example when a new message is received on the port, with running the rendering loop which itself calls into process of the processor running in the context of AudioWorkletGlobalScope?

And what about tasks enqueued on the event-loop of AudioWorkletGlobalScope as a result of the call to https://drafts.css-houdini.org/worklets/#dom-worklet-addmodule ?

I can imagine two ways to do it in practice with the current spec:

  1. The task-queues from the event-loop of AudioWorkletGlobalScope could be plugged into the "control-message-queue" of the rendering thread?
  2. Run the event-loop of AudioWorkletGlobalScope in parallel to the render-loop, but somehow "stop" it when the render-loop wants to call process?

It appears pretty clear that the goal is not to run the event-loop of the AudioWorkletGlobalScope fully in parallel to the render loop, since that would create potential race-conditions between the onmessage handler of a processor and its process method. Yet I cannot find anything in the spec that integrates both event-loops so as to interleave them sequentially in some way.

A solution could be to reword https://webaudio.github.io/web-audio-api/#rendering-loop, so that step 4 processes all task-queues of the event-loop of the AudioWorkletGlobalScope, with microtask checkpoints interleaved?

Where Is It: https://webaudio.github.io/web-audio-api/#rendering-thread

Additional Information: Could be relevant for worklets in general; see https://github.com/whatwg/html/issues/4213

gterzian commented 5 years ago

I see this was discussed in https://github.com/WebAudio/web-audio-api/issues/1511.

To me, although that previous issue was closed, the spec in its current state is still not clear. I think an improvement was made by describing how microtasks are run, but that still leaves the handling of proper tasks from the event-loop of AudioWorkletGlobalScope (for example from incoming messages on the MessagePort) as an exercise for the implementer.

From what I understand from the previous thread, it seems important to not interrupt Step 3, "Process a render quantum.", of the rendering loop, hence the microtask handling is moved to Step 4 (versus interleaving microtask checkpoints with calls to process).

So we would still have to spec when tasks are handled then.

On another point, I think we can also assume that the actual "audio backend" could be running on another thread, or even another process? I read something (sorry, no link at hand) about Chromium moving audio to a dedicated Mojo service? So that would mean the rendering thread would communicate the "render result" over IPC to the audio service?

In that case, could it perhaps be realistic to spec the rendering loop in the following way:

  1. Let render result be false.
  2. Process the control message queue.
  3. Optionally, run one or several steps of the event-loop of the AudioWorkletGlobalScope (one step being: run one task, then perform one microtask checkpoint).
  4. Process a render quantum.
  5. Communicate render result to the audio backend.
  6. Run a microtask checkpoint for the event-loop of AudioWorkletGlobalScope.

The reasoning behind this list:

When you get to 5, you communicate with the audio backend. That means the backend will have to process the result, and then communicate back via a control message to potentially ask for more data.

So while the backend is handling the result, it's a good time to do other work.

The first thing you need to do is perform a microtask checkpoint, to resolve promises resulting from calls inside process. That's sort of necessary since otherwise, by the next call to process, the processor will still not have seen the result of the promise, which might change its state.

Then, you go back to 2, and that's where you either have received a new control message or not. The important thing is that by putting the task handling at step 3, the UA already knows whether it has received a control message, and what the content of that message was, which can help it decide whether to run a step of the event-loop or skip it and render another quantum.
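
To make the intended ordering concrete, here is a non-normative sketch of one iteration of this proposed loop; every helper name is a hypothetical stand-in for implementation internals, not a real API:

```js
// One iteration of the proposed rendering loop (illustrative pseudocode;
// every helper is a hypothetical stand-in for implementation internals).
function renderingLoopIteration(state) {
  let renderResult = false;                        // step 1
  processControlMessageQueue(state);               // step 2
  // Step 3 (optional): run one or more steps of the AudioWorkletGlobalScope's
  // event-loop, one step being "run one task, then one microtask checkpoint".
  while (state.shouldRunWorkletEventLoopStep()) {
    runOneWorkletTask(state);
    performMicrotaskCheckpoint(state);
  }
  renderResult = processRenderQuantum(state);      // step 4: calls process()
  communicateToAudioBackend(state, renderResult);  // step 5
  performMicrotaskCheckpoint(state);               // step 6
}
```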


Note that this means the processor can't rely on getting a steady flow of messages on its port; however, it doesn't prevent it from sending messages on the port as part of process. Sending a message doesn't require running the event-loop of the worklet scope, since those message tasks are enqueued on the event-loop of the entangled port of the audio worklet node, on the control thread.

So one can wonder which use-case is most important for developers: having fast and steady calls to process, with the ability to do non-blocking sends on the port back to the audio worklet node, or having steady messages incoming on the port? I can imagine one could focus on using incoming messages only for some initial setup, and then mostly use the port to send data back to the node on the control thread, while focusing on process, and not send messages back to the processor while audio is being processed (and of course a call to close on the audio context would still be guaranteed to be handled as part of the control message queue).
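
To illustrate the "send-only" use of the port, a hypothetical processor could post data back to its AudioWorkletNode from inside process, without ever needing the worklet scope's event-loop to run for incoming messages (illustrative sketch; in a real-time context you would want to keep per-quantum allocations like this to a minimum):

```js
// Posts a peak level back to the control thread on every render quantum.
class MeterProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channel = inputs[0] && inputs[0][0];
    let peak = 0;
    if (channel) {
      for (let i = 0; i < channel.length; ++i) {
        peak = Math.max(peak, Math.abs(channel[i]));
      }
    }
    // Non-blocking: the message task ends up on the entangled port's
    // event-loop on the control thread, not on this thread.
    this.port.postMessage({ peak });
    return true;
  }
}

registerProcessor('meter-processor', MeterProcessor);
```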


Also, the idea is really that you're not continuously running the event-loop of the worklet global scope; that is something that is explicitly "driven" by step 3 of the rendering loop, precluding any race-condition between a potential onmessage handler and the process call.

Also, it's somewhat not spec compliant, since HTML says that "An event loop must continually run through the following steps for as long as it exists" (https://html.spec.whatwg.org/multipage/#event-loop-processing-model).

That might be a good place in the HTML spec to add "unless it is a worklet event-loop, in which case the UA can decide on a per-case basis when to run steps of the event-loop" (cc @annevk re https://github.com/whatwg/html/issues/4213).

padenot commented 5 years ago

In general, I agree with your two messages; this is something we're missing.

> From what I understand from the previous thread, it seems important to not interrupt Step 3, "Process a render quantum.", of the rendering loop, hence the microtask handling is moved to Step 4 (versus interleaving microtask checkpoints with calls to process). So we would still have to spec when tasks are handled then.

> On another point, I think we can also assume that the actual "audio backend" could be running on another thread, or even another process? I read something (sorry, no link at hand) about Chromium moving audio to a dedicated Mojo service? So that would mean the rendering thread would communicate the "render result" over IPC to the audio service?

The audio system code is generally running on another process in modern browsers, yes (at least Firefox, for now just on Linux but this will change soon, and Chrome everywhere I believe, even before Mojo). It's never running on the same thread as something else, and it's often at higher priority.

> So one can wonder which use-case is most important for developers: **having fast and steady calls to process, with the ability to do non-blocking sends on the port back to the audio worklet node, or having steady messages incoming on the port?** I can imagine one could focus on using incoming messages only for some initial setup, and then mostly use the port to send data back to the node on the control thread, while focusing on process, and not send messages back to the processor while audio is being processed (and of course a call to close on the audio context would still be guaranteed to be handled as part of the control message queue).

Regarding the sentence I put in bold: it's preferable to do the former; otherwise it would break the fundamental rule of audio programming: there would be a risk of delaying the audio callback (see http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing for a more in-depth explanation).

Things should be defined so it's clear, but a program that uses Promises or a lot of events as part of its audio processing is buggy. It's only possible at all because those features cannot be disabled; https://github.com/tc39/ecma262/issues/1120 has the background for this. The right thing to do is to use a SharedArrayBuffer for communication, and to never create objects on the rendering thread.
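
As a rough illustration of that approach (the file name, processor name, and fixed-point encoding are all made up for the example, and it assumes an environment where SharedArrayBuffer is available), the only port traffic is a one-time setup message, and all per-quantum communication goes through shared memory:

```js
// Main thread (inside an async setup function; assumes cross-origin isolation
// so that SharedArrayBuffer is available).
const context = new AudioContext();
await context.audioWorklet.addModule('sab-gain-processor.js');

const sab = new SharedArrayBuffer(4);
const sharedGain = new Int32Array(sab); // gain stored as integer thousandths
Atomics.store(sharedGain, 0, 1000);     // 1.0

const node = new AudioWorkletNode(context, 'sab-gain-processor');
node.port.postMessage(sab);             // one-time setup; no per-quantum messages

const osc = new OscillatorNode(context);
osc.connect(node).connect(context.destination);
osc.start();

// Later, e.g. from a UI handler: no event-loop work needed on the rendering thread.
Atomics.store(sharedGain, 0, 500);      // 0.5
```

And the corresponding (hypothetical) processor module:

```js
// sab-gain-processor.js, running in the AudioWorkletGlobalScope.
class SabGainProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.sharedGain = null;
    this.port.onmessage = (event) => {
      // One-time setup: keep a view on the shared memory; no further messages needed.
      this.sharedGain = new Int32Array(event.data);
    };
  }

  process(inputs, outputs) {
    const gain = this.sharedGain ? Atomics.load(this.sharedGain, 0) / 1000 : 1;
    const input = inputs[0];
    const output = outputs[0];
    for (let channel = 0; channel < Math.min(input.length, output.length); ++channel) {
      for (let i = 0; i < input[channel].length; ++i) {
        output[channel][i] = input[channel][i] * gain;
      }
    }
    return true;
  }
}

registerProcessor('sab-gain-processor', SabGainProcessor);
```

Compared to per-quantum port messages, the only thing that ever needs the rendering thread's event-loop here is the initial setup message.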

The list of steps should be roughly:

  1. get an event from the system that more audio should be played out
  2. process the control messages
  3. process the events, by getting all the events currently in the queue, and executing them as usual, with microtask checkpoints in between the events. The events that are posted during the rest of the steps are processed the next time this algorithm runs
  4. do the audio processing, including calling process on the worklets
  5. do a microtask checkpoint
  6. return

We could even spec this as a normal event loop, if 2 and 4 are events, and 1 is an event that synchronously waits for a signal from the implementation that there is more audio to render.
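
A non-normative sketch of those six steps (every helper name is hypothetical, and this ignores the "normal event loop" framing for simplicity):

```js
// Non-normative sketch of the six steps above (all helpers are hypothetical).
function audioRenderingCallback(state) {
  waitForSystemAudioRequest(state);         // step 1: more audio is needed
  processControlMessageQueue(state);        // step 2
  // Step 3: drain only the events already queued at this point; events posted
  // during the remaining steps wait for the next run of this algorithm.
  const pending = state.eventQueue.splice(0);
  for (const event of pending) {
    runEventHandler(event);
    performMicrotaskCheckpoint(state);      // checkpoint in between the events
  }
  processRenderQuantum(state);              // step 4: calls process() on the worklets
  performMicrotaskCheckpoint(state);        // step 5
  return;                                   // step 6
}
```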

karlt commented 5 years ago

There would be value in allowing the implementation to process control messages and events while waiting for a signal from the system that more audio should be processed.

gterzian commented 5 years ago

> break the fundamental rule of audio programming: there would be a risk of delaying the audio callback (see http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing for a more in-depth explanation).

Thanks, a very interesting article, and I think a good foundation on which a UA could make decisions with regard to prioritizing things.

> Things should be defined so it's clear, but a program that uses Promises or a lot of events as part of its audio processing is buggy. It's only possible at all because those features cannot be disabled.

In the context of the HTML processing model, a UA can actually throttle tasks of a given task-queue, or prioritize one task-queue over all others, as long as ordering is preserved within a given task-queue.

However, with promises it's a bit different, since they resolve within a microtask, and in the current HTML semantics, if you run a task, you must run a microtask checkpoint after it.

So the question becomes: semantically, what is a call to process of an audio worklet processor? Is it a task or not? The fact that it runs in the context of a global scope would seem to imply it is.

And if it's a task, could you spec a specific "worklet event-loop" that would not perform microtask checkpoints after running each task? In theory it could be possible to queue all microtasks on a dedicated task-queue, run them as regular tasks, and run all tasks without a dedicated "microtask checkpoint". That could then be a way to "ignore" microtasks, and fully prioritize a potential "rendering" task-queue (where a task would call process).

Also, in practice you wouldn't have to queue a task for the rendering, or introduce a "rendering" task-queue; it would purely be a way to express the computation.


> The list of steps should be roughly:
>
> 1. get an event from the system that more audio should be played out
> 2. process the control messages
> 3. process the events, by getting all the events currently in the queue, and executing them as usual, with microtask checkpoints in between the events. The events that are posted during the rest of the steps are processed the next time this algorithm runs
> 4. do the audio processing, including calling `process` on the worklets
> 5. do a microtask checkpoint
> 6. return

It could be more robust to spec this as a formal event-loop simply running one task at a time, with the various steps above being instead expressed via different task-queues that the UA could prioritize.

Something like:
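
(A non-normative sketch of that idea; the queue names mirror the task-sources discussed below, and everything shown is hypothetical, not spec text.)

```js
// The rendering thread's event-loop expressed over prioritizable task-sources.
const taskQueues = {
  renderQuantum: [], // "render-a-quantum-task-source"
  control: [],       // "control-task-source"
  portMessages: [],  // post-message-queue(s) of the processors' ports
  microtasks: [],    // treated here like any other task-queue
};

// A control message from the backend asking for more audio just queues a task
// on the render-a-quantum task-source.
function onBackendNeedsAudio() {
  taskQueues.renderQuantum.push(() => {
    // run the "process a render quantum" steps here
  });
}

function eventLoopIteration() {
  // The UA picks a non-empty queue "in a user-agent-defined manner"; this
  // particular ordering fully prioritizes rendering, then control messages,
  // then port messages, then microtasks.
  for (const name of ['renderQuantum', 'control', 'portMessages', 'microtasks']) {
    const task = taskQueues[name].shift();
    if (task) {
      task();
      return;
    }
  }
}
```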


> There would be value in allowing the implementation to process control messages and events while waiting for a signal from the system that more audio should be processed.

You could express that by giving full flexibility to the UA in prioritizing task-sources, as opposed to via an imperative list of steps in the rendering loop.

In practice, it could result in a sequence like:

  1. You start a new iteration of the rendering loop.

  2. The "render-a-quantum-task-source" is empty.

  3. The "control-task-source" contains a task.

  4. The "port-message-queue" contains a task too.

  5. For good measure, the "microtask-queue" contains several tasks.

  6. You select the task from the "control-task-source", it's a task from the backend asking for a render quantum. That queues a task on the "render-a-quantum-task-source".

  7. You start another iteration of the loop.

  8. The "render-a-quantum-task-source" contains a task (and so do the "port-message-queue" and the "microtask-queue", but you can throttle them).

  9. You select the task from the "render-a-quantum-task-source" and run the "process a render quantum" steps.

  10. You start another iteration of the loop.

  11. The "control-task-source" is empty, so you choose to run a task from the "microtask-queue", or from the "port-message-queue" of a given processor.

Now of course, in practice you could just immediately run the "process a render quantum" steps when receiving the control message from the backend; you wouldn't have to queue a task and then select among the various task-queues.

Semantically, what would allow you to immediately run the "process a render quantum" steps when receiving the control message from the backend is the fact that the control message would be a task from the "control-task-source", and the handling of that task would itself queue a task on the "render-a-quantum-task-source". Since you could fully prioritize the "render-a-quantum-task-source" (and you know it's currently empty, since the only way to queue a task on it is via the handling of a task from the "control-task-source"), you could immediately process a render quantum.

In practice you would effectively do what the spec currently says, without queuing additional tasks, but the spec language would be more robust to differences in implementations. Currently it reads a bit like you'd have to change the spec if you decided to re-order some steps to get better performance.


And the current processing model actually comes with a lot of flexibility already:

> Let taskQueue be one of the event loop's task queues, chosen in a user-agent-defined manner (https://html.spec.whatwg.org/multipage/webappapis.html#event-loop-processing-model)

I think what worklets need is just a change at Step 8 that would read "If this is not a worklet event-loop, perform a microtask checkpoint." (in the case of a worklet event-loop, the microtask queue would be treated like any other task-queue).

Similar to how Step 11 reads: "Update the rendering: if this is a window event loop, then:"

gterzian commented 5 years ago

> We could even spec this as a normal event loop, if 2 and 4 are events, and 1 is an event that synchronously waits for a signal from the implementation that there is more audio to render.

Yes, I agree, and note again the problem that speccing something as a task means that in theory it should be followed by a microtask checkpoint. Hence the need, I think, to introduce some flexibility for a dedicated "worklet event-loop" to not run microtask checkpoints after every task, instead treating the microtask queue as just another task-queue.

Re "1 is an event that synchronously waits for a signal from the implementation that there is more audio to render", you might want to merge that into 2, since something like a close control message coming from the audio context should probably be handled on par with the system asking for more data.

gterzian commented 5 years ago

By the way, sorry for writing looong posts on this, hoping it's still readable, and mainly trying to bounce off some ideas.

On second thoughts, I came to realize my suggestion above would turn the "rendering loop" basically into the event-loop of the worklet global-scope, which might be taking things a bit too far in one direction.

It might actually make sense to spec a "rendering loop" that isn't using HTML event-loop semantics, and rather reflects the realities of dealing with real-time audio (like is done now).

The problem with the current spec is that it doesn't say when you should run the "tasks" (not the microtasks, which are mentioned) of the worklet global scope, for example for incoming messages on the message port of a processor.

So a more "direct" solution to that problem, versus re-writing the entire loop, is what I initially proposed in my first post: step 4 could process all task-queues of the event-loop of the AudioWorkletGlobalScope, with microtask checkpoints interleaved (as opposed to only running microtasks, as is specced now).

And then we'd have to accept that calling process of a worklet processor isn't really a "task", although the JS running in it can enqueue (micro)tasks, and that multiple calls to different processors in the same global scope, as part of one rendering quantum, will not be interleaved with microtask checkpoints.

It would probably be important for Step 4 to first do a microtask checkpoint, handling any microtasks enqueued as part of the calls to process, and then handle tasks on the various queues of the worklet global scope, interleaving each task with a microtask checkpoint.
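
As a non-normative sketch, that expanded Step 4 could look roughly like this (helper names are hypothetical):

```js
// Runs after "process a render quantum" in each iteration of the rendering loop.
function expandedStepFour(workletScope) {
  // First, a microtask checkpoint for microtasks enqueued by the process() calls.
  performMicrotaskCheckpoint(workletScope);
  // Then drain the tasks currently queued on the worklet global scope's
  // task-queues (e.g. incoming port messages), with a checkpoint after each.
  const tasks = drainTaskQueues(workletScope);
  for (const task of tasks) {
    task();
    performMicrotaskCheckpoint(workletScope);
  }
}
```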

That could turn into quite a big "step 4", and it would be the responsibility of the developer to ensure it doesn't fill up with too many tasks...

padenot commented 5 years ago

F2F summary:

This means that onmessages will be executed first, with microtask checkpoints in between those. After this, the audio rendering task will be executed, calling the process methods. After this, a microtask checkpoint is performed.

gterzian commented 5 years ago

> This means that onmessages will be executed first, with microtask checkpoints in between those. After this, the audio rendering task will be executed, calling the process methods. After this, a microtask checkpoint is performed.

Have you considered defining a new "render-an-audio-quantum" task-source? Then it could be said that the event-loop on the rendering thread has three task-sources (I think):

  1. A port-message-queue
  2. A control-message-queue
  3. A "render-an-audio-quantum", or similarly named, task-source.

Then the UA could pick a runnable task from any task-queue at each iteration of the loop, and each task would be followed by a microtask checkpoint, without having to define anything special in the audio spec.


It also reads like the control-message-queue is defined as a special-purpose shared queue between the control and the rendering thread, and that definition could perhaps be replaced with a task-source of the rendering thread (and perhaps also the control thread, if two-way communication is required), since a task-source gives you all the atomic guarantees you need, I think.

Also, the way Step 2, "Process the control message queue", is defined seems to specify that the rendering loop must handle all messages that are currently enqueued, which is somewhat not compliant with the concept of "running a normal event-loop", since that requires allowing the UA to choose one runnable task to run at each iteration, so long as ordering per task-source is preserved.

In practice, if the post-message-queue were defined using a task-source, a UA could still prioritize the control message queue until it was empty, and you wouldn't have to define this behavior specifically in the audio spec (although if it's really important to do so, you could add a note to that effect).


> The audio callback is also queued as a task in the control message queue.

And this statement, for example, could be replaced by queuing a task using the "render-an-audio-quantum" task-source from the audio callback.

jackschaedler commented 4 years ago

I don't think this reproduction scenario adds anything new to the conversation, but here it is for posterity. It's maybe helpful since it describes a concrete way that this issue affects developers out in the wild: https://jackschaedler.github.io/offline-audio-worklet-repro/