WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/
Other
1.05k stars 167 forks source link

[audioworklet] Permit process() calls to be skipped on inactive AudioWorkletProcessors with silent input #1453

Closed karlt closed 5 years ago

karlt commented 6 years ago

An optimization that was found to be important in use cases where one-shot AudioNodes were frequently created was to skip processing when the input was known to be silent. A similar optimization would be valuable for AudioWorkletProcessors.

In active documents with many objects, it is not practical to perform full garbage collection frequently, and so there is often a long delay before unreferenced inactive nodes are collected. Silent input, however, can be determined promptly.

Prompt detection of unnecessary processing is very important in graphs with many one-shot nodes such as at http://webaudiodemos.appspot.com/MIDIDrums/index.html

Skipping process() is also important in the long term for releasing the resources associated with the AudioWorkletProcessors. While process() calls are still required, the node and processor are observable and so cannot be garbage collected.

If the process() call can be skipped when input is silent, then clients need not explicitly disconnect all AudioWorklet inputs before removing their last references to the nodes.

karlt commented 6 years ago

I don't have a strong preference re whether implementations MUST or MAY skip process() calls when input is silent (and the active source flag is false). It is easy to test whether inputs are all silent. OTOH AudioWorkletProcessors need to be able to process at least one call with silent input because the active source flag is initially true (with good reason), and so there is probably little harm in a few unnecessary calls. In general though a determistic solution is preferable where practical, and it seems to be practical here.

hoch commented 6 years ago

I agree with the idea, but this might fall into the implementation detail side. The interesting part is users can observe the implementation difference if some browsers skip calling process() for the optimization.

joeberkovitz commented 6 years ago

I can't see how this optimization is feasible in the general case, given that any processor could add its own generated signal to the input as part of processing, even when the input is silent.

So this feels like an optional optimization that a developer would have to request. As such I would put this into V.next since we would need to design a mechanism for such optional behaviors.

rtoy commented 6 years ago

I agree with @joeberkovitz. Silent inputs does not imply process can be skipped. In fact, I think I might go farther and say we should never skip calls to process. However, I am ok with adding hints to let process know if the inputs are silent. This would allow process to do something intelligent, without having to check for itself if the inputs are silent.

karlt commented 6 years ago

Thank you for considering this and taking the time to comment.

Yes, the optimization is not feasible with the current spec, but this issue is about changing the spec to make it feasible, which is very doable.

This is important to give AudioWorkletNode status similar to that of built-in nodes. It also avoids a problem where releasing the last reference to any AudioNodes connected to inputs leaves the processor and AudioWorkletNode in a state where they can never be garbage collected until the AudioContext is closed (or suspended and unreferenced).

This proposal is to change the second of the two active processing conditions from

  1. There are one or more connected inputs to the AudioWorkletNode.

to

  1. Any channel of any input buffer has non-zero audio samples.

any processor could add its own generated signal to the input as part of processing, even when the input is silent.

Yes, in the same way that any processor could add its own generated signal as part of processing, even when no inputs are connected to the AudioWorkletNode.

So this feels like an optional optimization that a developer would have to request.

Yes, the developer would need to indicate whether subsequent invocations of process() occur or not.

we would need to design a mechanism for such optional behaviors.

That mechanism of indication already exists through the active source flag.

With the proposed change to the active processing conditions, the processor would return true if it wishes to add signal to silent connected input.

I assume that, with this change, there would be limited value in having an additional flag to indicate whether subsequent invocations of process() occur when there are no inputs connected. i.e. one flag is better than two when one is sufficient.

I think I might go farther and say we should never skip calls to process.

If we were never to skip calls to process(), then the processor could never be garbage collected.

I am ok with adding hints to let process know if the inputs are silent. This would allow process to do something intelligent, without having to check for itself if the inputs are silent.

Such hints wouldn't address the issues here. They provide neither the conditions necessary to garbage collect the processor nor sufficient optimization, as process() would still need to be called.

joeberkovitz commented 6 years ago

@karlt I reached a similar conclusion after thinking further, but I would put it differently (and my thinking didn't necessarily make me feel more favorable about doing this yet... but I'll explain).

First of all, I do agree that the way that AudioWorkletNode behaves with respect to silent inputs should mirror the way that other nodes behave. And if we are going to make a change in this respect for AWNs, then we need to do more than merely adjust the statements about how AWNs work: we need to make some global, spec-wide statements about how silent inputs work with respect to all node types.

But that gets tricky, fast. I don't believe we are in a position right now to safely state (globally) that silent inputs behave like disconnected inputs across the board, everywhere, for all nodes. Would you say that a DelayNode or BiquadFilterNode whose input goes silent, is eligible for GC after the tail-time has elapsed? That doesn't sound right: the input might become non-silent again. Treating silent nodes the same as disconnected ones might also have observable and unexpected effects in terms of channel up/down mixing; even a silent node could have a higher-channel format than its mixed-in companions.

I'm not saying this idea of treating silent inputs specially is wrong; just that it is more complicated than a patch to AWN behavior and deserves careful, global thinking.

joeberkovitz commented 6 years ago

The WG discussed this today. We get the case for optimization, but we don't have a way for a node to assert that it's OK for processing to be skipped on silent input. That's the new element that we'd prefer to come back to with a careful discussion, potentially considering some hinting mechanism. It's not a mechanism that we can just make happen by redefining the active processing conditions, because that would skip process() in cases where the AWN might still generate its own signal in spite of silent input.

karlt commented 6 years ago

[...] my thinking didn't necessarily make me feel more favorable about doing this yet... but I'll explain).

This explanation is very helpful thanks, as I can now see where I have not explained the garbage collection needs sufficiently.

First of all, I do agree that the way that AudioWorkletNode behaves with respect to silent inputs should mirror the way that other nodes behave. And if we are going to make a change in this respect for AWNs, then we need to do more than merely adjust the statements about how AWNs work: we need to make some global, spec-wide statements about how silent inputs work with respect to all node types.

We don't need to change the way built-in nodes behave wrt silent input. AFAI am aware, this is adequately spec'd and in a way that allows implementations to garbage collect resources (but see below for what I mean here).

But that gets tricky, fast. I don't believe we are in a position right now to safely state (globally) that silent inputs behave like disconnected inputs across the board, everywhere, for all nodes.

Yes, silent inputs do not in general behave like disconnected inputs.

Would you say that a DelayNode or BiquadFilterNode whose input goes silent, is eligible for GC after the tail-time has elapsed? That doesn't sound right: the input might become non-silent again.

If the input of a DelayNode or BiquadFilterNode goes silent AND there are no external references to the node AND there are no external references to any of the upstream nodes, then the node is essentially eligible for GC (but see below). In these circumstances, the input cannot become non-silent again because nothing can change it.

This is a necessary GC for implementations with resource limits (real world implementations), and the one-shot nature of many of the active source nodes makes this more obvious. Once the client has released its last reference to a node, there is no way for the client to disconnect, and so it is up to the implementation to collect unnecessary resources.

Treating silent nodes the same as disconnected ones might also have observable and unexpected effects in terms of channel up/down mixing; even a silent node could have a higher-channel format than its mixed-in companions.

You are correct. Silent nodes are not the same as disconnected nodes in terms of their effects, but they are similar in terms of resources required for these effects if there are no external references.

When a source node becomes permanently silent, but remains connected to downstream nodes, an implementation may cease processing on this node if it maintains the same effects on its connected output nodes. For silent output, the effects on output nodes involve their input channel count (and associated mixing) and input connection status.

If there are no external references to the source node, then the implementation is able to record the (now static) effects of the source node on the output nodes, and proceed to garbage collect the resources associated with the source node.

The implementation is able to do this because there is no observable difference in behavior. This is what garbage collection is. "Garbage" describes the resources allocated for purposes that can no longer be observed.

The same process can continue for any now permanently silent output node that has no external references, and similarly downstream. The situation is more complex for cycles because in general garbage collection requires that there are no external references to any node in the cycle.

None of this need be explicitly spec'd because it does not change any observable behavior. It may still be helpful to document this, however, to guide future changes. If OTOH implementations were not able to collect this garbage (due to some new observable behavior such as process() calls in AudioWorklet), then it would be most helpful for the spec to strongly recommend that clients keep track of resources and disconnect all nodes as soon as they are no longer required. IMO an API that required the client to track and explicitly disconnect passive nodes would be much less favorable.

karlt commented 6 years ago

The WG discussed this today. We get the case for optimization, but we don't have a way for a node to assert that it's OK for processing to be skipped on silent input. That's the new element that we'd prefer to come back to with a careful discussion, potentially considering some hinting mechanism. It's not a mechanism that we can just make happen by redefining the active processing conditions, because that would skip process() in cases where the AWN might still generate its own signal in spite of silent input.

I appreciate the discussion and feedback, but I'm not grasping the problem that the WG sees.

Would someone from the WG be able to explain for me, please, why it is OK to require that process() return true if the processor wishes to generate its own signal in spite of no connected inputs, but it would not be OK to require that process() return true if the processor wishes to generate its own signal in spite of silent connected input?

joeberkovitz commented 6 years ago

Hi @karlt -- thanks for the additional comments. I'll try to clarify our take on this.

Why it is OK to require that process() return true if the processor wishes to generate its own signal in spite of no connected inputs, but it would not be OK to require that process() return true if the processor wishes to generate its own signal in spite of silent connected input?

The reason is that there is no rigorous definition of "silent connected input", as distinct from "no connected inputs" -- and we don't currently see that such a definition is needed. You said:

When a source node becomes permanently silent, but remains connected to downstream nodes, an implementation may cease processing on this node if it maintains the same effects on its connected output nodes. For silent output, the effects on output nodes involve their input channel count (and associated mixing) and input connection status.

Our position at the moment is that this case is already handled, because once a source node becomes permanently silent, its internal playing reference (see https://webaudio.github.io/web-audio-api/#lifetime-AudioNode) goes away. Assuming that no normal JS references exist to the node (and none should), this enables the now-silent node to be GCed, and for its connections to downstream node to go away as per the text in that same section:

When an AudioNode has no references it will be deleted. Before it is deleted, it will disconnect itself from any other AudioNodes which it is connected to. In this way it releases all connection references it has to other nodes.

Thus, a "permanently silent" node is supposed to become disconnected from downstream nodes, eliminating their incoming connections. In effect this reduces the problem of detecting permanently silent nodes to the existing problem of detecting a lack of connected inputs... ergo, a downstream AudioWorkletNode will eventually find that it has no connected inputs, and can return true or false from process().

Hope all that made sense to you. Absent a flaw in this logic, and you're welcome to point out why you feel this doesn't work, we don't yet see a reason to change the spec.

karlt commented 6 years ago

I've attempted to describe the problem with the logic in https://github.com/WebAudio/web-audio-api/issues/1471. I expect this part of this issue would become clearer when that is resolved. In summary, if deleting the upstream node would have observable effects, then the implementation must not remove the last reference to the upstream node. Stopping process calls on an AudioWorkletProcessor would be an observable effect.

Besides the need to support proper garbage collection, there is also a need for optimization of silent input before calling into JS if support for AudioWorklet use cases similar to existing nodes are intended.

joeberkovitz commented 6 years ago

Thanks @karlt - I think that #1471 does explain a different problem that is indeed for real (and that stopping process() calls would be observable along with changes in mixing).

It does feels like that issue, as you said, is a better starting point. It highlights a conflict here between a (possibly naive) attempt to optimize graph behavior, and the hiding of GC effects.

joeberkovitz commented 6 years ago

@karlt and others: Let me just ask though: how can an AWP detect the absence of calls to process()? My first thought was, yeah, this is a problem, but on further reflection I'm having trouble seeing how it could find out that it wasn't being called.

Granted, the effects of GC on channel mixing are still a concern.

karlt commented 6 years ago

It is more the observable difference in behavior that is important.

However, the lack of an expected occurrence can be detected reasonably easily at the point of a subsequently expected event. Consider a theoretical situation with two AWPs each of which sends a message through the MessageChannel with currentTime on each process() call. If the main thread receives a message with greater currentTime from one AWP than expected on the other, then the lack of a process() call on the second AWP has been detected. (This situation assuming the associated node has a listener on the MessagePort, in which case the node will not be garbage collected. Even if the worklet node is not referenced from main thread script, the AudioWorkletGlobalScope still makes communication to/from the AWP possible.)

For a more common real world scenario where the observable difference in behavior is relevant, consider an AWP that transforms its input and intends to use a short tail-time. The guidance in the spec appropriately indicates that process() should return true. But here, consider an AWP than never returns true. Usually the node and processor may not be garbage collected immediately when the last JS reference was cleared, and so process() would be called often enough to support the tail-time. However, in some implementations, the garbage collection may sometimes happen quickly enough that process() is not called often enough to support the tail-time. The result is an AWP that usually "works" but sometimes its output is clipped.

rtoy commented 6 years ago

See #1480. This allows the worklet to decide what to do if the inputs are silent. But process() will always be called because silent input does not imply silent output.

rtoy commented 6 years ago

Teleconf: Look more closely at the implications for the next teleconf

karlt commented 6 years ago

With the current plans for https://github.com/WebAudio/web-audio-api/issues/1471 the proposal here may be adjusted a little:

Change the second of the two active processing conditions from

  1. There are one or more connected inputs to the AudioWorkletNode.

to

  1. There are one or more active connected inputs to the AudioWorkletNode.

where an "active" connected input is one where the connected node is not known to be silent. "known to be silent" would be described consistently with the criteria selected in https://github.com/WebAudio/web-audio-api/issues/1471 to determine when other node types would output a single channel of silence (regardless of the usual output channel count).

Presumably the content of inputs[n] would be an empty array in similar situations of no active connected inputs.

hoch commented 6 years ago

@karlt I am still not sure how the proposal is relevant to the "skippable" process() method. I have some follow-ups for recent comments above.

However, the lack of an expected occurrence can be detected reasonably easily at the point of a subsequently expected event.

This is from here. If this is the case, doesn't Worker or MessageChannel have the same problem? Is indirect detection of GC really a problem? IMO this is not a AWP's problem per se, rather it is a problem of MessagePort in the AWP. Any feature that utilizes MessagePort for cross-thread communication will have the similar issue.

There are one or more active connected inputs to the AudioWorkletNode.

I think this new condition is to make GC non-observable. Right? This is basically to hide "disconnection" from AWP's view point. So that's good, but I am not sure how this helps the cause of skippable process() method. Could you elaborate?

This issue seems to be closely related to #1471, so I will hold until we have a PR merged.

karlt commented 6 years ago

@karlt I am still not sure how the proposal is relevant to the "skippable" process() method.

The proposal provides that process() need not be called when there is nothing to process. i.e. when inputs are not active.

This is from here. If this is the case, doesn't Worker or MessageChannel have the same problem? Is indirect detection of GC really a problem? IMO this is not a AWP's problem per se, rather it is a problem of MessagePort in the AWP. Any feature that utilizes MessagePort for cross-thread communication will have the similar issue.

There is a problem if GC changes the behavior of an observable object.

MessagePort was used in the example because it is the way that the worklet execution thread can communicate with a document. Worker uses MessagePort for a similar purpose, but that doesn't mean that Worklet has a similar problem. MessagePort is merely the means for detecting premature GC. Provided there is no premature GC, there is no problem.

There are one or more active connected inputs to the AudioWorkletNode.

I think this new condition is to make GC non-observable. Right? This is basically to hide "disconnection" from AWP's view point.

It depends on your perspective.

Assuming GC is not observable and disconnection occurs only through disconnect() calls, the new condition will permit the AWP to be collected when the implementation knows that there will not be any future active connections.

If you are assuming that GC can cause observable disconnections, then that is a problem. From that perspective, the change to the active processing condition fixes the problem by providing a similar transition to inactive but at a well-defined time, instead of when GC happens.

So that's good, but I am not sure how this helps the cause of skippable process() method. Could you elaborate?

If we don't adjust the condition, then AWP will need to continue calling process() even when we know that there is nothing to process. This is inefficient and prevents the implementation from collecting AWNs in this state.

This issue seems to be closely related to #1471, so I will hold until we have a PR merged.

The precise resolution of this issue is likely to depend on the resolution of https://github.com/WebAudio/web-audio-api/issues/1471 and so waiting for the precise resolution of https://github.com/WebAudio/web-audio-api/issues/1471 before preparation of wording for this issue is sensible.

padenot commented 5 years ago

(I forgot to write down the resolution here during TPAC)

TPAC resolution: spec the behaviour @karlt describes in https://github.com/WebAudio/web-audio-api/issues/1453#issuecomment-419280140.

hoch commented 5 years ago

This issue relies on the merge of #1810.