WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Alternative design of AudioWorklet? #936

Closed hoch closed 7 years ago

hoch commented 7 years ago

ATTN: @joeberkovitz @rtoy @padenot @mdjp @domenic

First of all, this idea might be too radical to be accepted at this stage, since we are wrapping up the review of the current AudioWorklet proposal.

However, after days of consideration I have decided to propose an alternative design for the AudioWorklet system. This is mainly because many people have pointed out the confusion caused by the dual interfaces in its design: AudioWorkletNode and AudioWorkletProcessor.

Here are the problems:

  1. The proposal describes some abstract concepts on how these two components are paired and communicate with each other, but I found it really hard to specify this clearly.
  2. The core of AudioWorkletProcessor is actually one method: process(). It is wasteful to have a dedicated class/subclass for only one function definition.
  3. parameterDescriptors() in AudioWorkletProcessor is wrongly placed. AudioParam is a main-thread interface, so it is better defined on the main-thread object, which is AudioWorkletNode in the proposal.

All in all, I reached the conclusion that we only need one class. See the following code example:

registerNode('BitCrusher', class extends AudioWorkletNode {

  static get parameterDescriptors() {
    return [{
      name: 'bitDepth',
      defaultValue: 12,
      minValue: 1,
      maxValue: 16
    }, {
      name: 'frequencyReduction',
      defaultValue: 0.5,
      minValue: 0,
      maxValue: 1
    }];
  }

  constructor (context, options) {  
    // The super constructor creates and initializes AudioParams.
    super(context, 'BitCrusher', options);

    // After the execution of the super constructor, this.bitDepth 
    // and this.frequencyReduction are available.
    console.log(this.frequencyReduction.value < 0.5 ? 'low' : 'hi');    

    this.phase = 0;
    this.lastSampleValue = 0;
    this.customValue = 'bar';    
  }

  // While this method is running, other methods cannot access the properties.
  // (the request has to be queued and executed at the next rendering quantum)        
  process (inputs, outputs, parameters) {
    var input = inputs[0];
    var output = outputs[0];
    var bitDepth = parameters.bitDepth;
    var frequencyReduction = parameters.frequencyReduction;

    for (var channel = 0; channel < output.length; ++channel) { 
      for (var i = 0; i < output[channel].length; ++i) {
        var step = Math.pow(0.5, bitDepth[i]);
        this.phase += frequencyReduction[i];
        if (this.phase >= 1.0) {
          this.phase -= 1.0;
          this.lastSampleValue = 
            step * Math.floor(input[channel][i] / step + 0.5);
        }
        output[channel][i] = this.lastSampleValue;
      }
    }    
  }

  // Note that any access to properties will be deferred while process() is running.
  setCustomValue (value) {
    this.customValue = value;
  }
});

Fortunately, not all is lost. This simple design is actually the result of the discussion and design consideration we have had so far. Here are the benefits of the new pattern:

  1. A unified registration/initialization process.
  2. Users can define custom methods as usual.
  3. There is no message passing. Any access to properties outside of the process() method is asynchronous, just like how the other AudioNodes operate. The access will be deferred and performed at the beginning of the next rendering quantum.

The most important change in this idea is that the UA must provide a locking mechanism between process() and other function calls. However, this is already what we have in our implementations: the getters, the setters, and the function calls from the main thread are all asynchronous.
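To make the deferred-access model concrete, here is a hypothetical main-thread usage sketch under this proposal (the constructor arguments and the direct this.bitDepth AudioParam property are assumptions carried over from the example above, not shipped API):

// Hypothetical main-thread code under this proposal. The last two
// statements do not touch the node's state directly; each is queued and
// applied at the beginning of the next rendering quantum.
const context = new AudioContext();
const crusher = new AudioWorkletNode(context, 'BitCrusher');
crusher.connect(context.destination);

crusher.bitDepth.value = 8;       // AudioParam setter, deferred
crusher.setCustomValue('baz');    // custom method call, also deferred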

Your feedback would be appreciated!

joeberkovitz commented 7 years ago

@hoch I may be missing something but I don't think that locking works as a way of isolating processing on the audio thread from other threads, unless all methods are defined in a protected scope like AudioWorkletGlobalScope. But in that case, how can AudioWorkletNode be the class exposing these methods? AudioNodes live in the global window scope where they are accessible to application code that runs there. It seems strange to have a magical class where this is in one scope and all of its code and its attributes live in another.

Conversely if all of the methods in an AudioWorkletNode except process() were executed in the window global scope, process() would have to be in the window global scope as well to avoid an oddball, non-JS-like class definition. This means that locking would have to isolate process() from all methods in all objects in the global window scope, which seems impractical.

joeberkovitz commented 7 years ago

Let me also respond to your 3 objections to the design at hand (without in any way wanting to pre-empt this discussion).

The proposal describes some abstract concepts on how these two components are paired and communicate with each other, but I found it really hard to specify this clearly.

I felt we were almost there and that the TAG critique was limited in nature.

The core of AudioWorkletProcessor is actually one method: process(). It is wasteful to have a dedicated class/subclass for only one function definition.

I see dedicated interfaces all the time that exist to support a single crucial abstract method.

parameterDescriptors() in AudioWorkletProcessor is wrongly placed. AudioParam is a main-thread interface, so it is better defined on the main-thread object, which is AudioWorkletNode in the proposal.

To me, AudioParam is a main-thread object that reflects a deeper notion of "parameter" which spans both main and audio threads. So I do not find parameterDescriptors() wrongly placed: it describes "parameters", which in turn manifest in both threads. In the main thread as AudioParams and in the audio thread as buffers of parameter values.
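A minimal sketch of that dual manifestation, assuming roughly the shape the spec eventually adopted (an AudioParamMap on the node, and per-quantum arrays of parameter values in process()):

// Main thread: the parameter surfaces as an AudioParam on the node.
const context = new AudioContext();
const node = new AudioWorkletNode(context, 'BitCrusher');
const bitDepth = node.parameters.get('bitDepth');   // AudioParam
bitDepth.setValueAtTime(8, context.currentTime);

// Audio thread: the same parameter arrives in process() as sample-accurate
// values for the current rendering quantum.
class BitCrusherProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const bitDepthValues = parameters.bitDepth;     // Float32Array
    return true;  // keep the processor alive
  }
}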

hoch commented 7 years ago

Thanks for the response, @joeberkovitz!

I may be missing something but I don't think that locking works as a way of isolating processing on the audio thread from other threads, unless all methods are defined in a protected scope like AudioWorkletGlobalScope.

Registering a class definition and instantiating it are separate procedures. This has been a big design issue for the AudioWorklet system, and my point was that having two interfaces does not simplify how it works as we intended. It just adds another layer of complexity with too much overhead.

But in that case, how can AudioWorkletNode be the class exposing these methods? AudioNodes live in the global window scope where they are accessible to application code that runs there. It seems strange to have a magical class where this is in one scope and all of its code and its attributes live in another.

By design, two threads have access to an AudioNode. Thanks to the locking mechanism, each has exclusive access to the node at any given moment. Ultimately it is a single object. I realized that the complete separation of a node into two distinct objects is not really helpful because 1) the synchronization becomes a burden for JS developers and 2) specifying the relationship between the two objects is not easy. (That is, we do not have any precedent for such a design in the Web Standards.)

Conversely if all of the methods in an AudioWorkletNode except process() were executed in the window global scope, process() would have to be in the window global scope as well to avoid an oddball, non-JS-like class definition. This means that locking would have to isolate process() from all methods in all objects in the global window scope, which seems impractical.

The scope of the lock is confined to a node. When the node is not being processed by the rendering thread, all of its properties are accessible from the main thread. The locking mechanism will not lock other nodes out from being modified. This is exactly how our AudioNode behaves.

To me, AudioParam is a main-thread object that reflects a deeper notion of "parameter" which spans both main and audio threads. So I do not find parameterDescriptors() wrongly placed: it describes "parameters", which in turn manifest in both threads. In the main thread as AudioParams and in the audio thread as buffers of parameter values.

The current proposal is already magical. AudioParam is supposed to be a main-thread interface, but somehow it is created by the AudioWorkletProcessor constructor. So technically it should be named AudioParamProcessor, and the AudioWorkletNode constructor somehow must infer the main-thread objects from the descriptors. This has been bugging me since day one.

Currently I am going over the CSS Paint API to adapt the spec for our purpose. I brought this confusion to the discussion, so I will propose something reviewable as soon as possible.

joeberkovitz commented 7 years ago

I will try to keep this brief as I'd like to get some other people involved on this thread.

You haven't really answered my issue about locking and scopes. I'd like you to specifically address this conflict:

  1. If your new unified definition of AudioWorkletNode is instantiated within the AudioWorkletGlobalScope, then this is an object that is not accessible in the window global scope. That contradicts the very nature of an AudioNode which is that it inhabits the window scope.
  2. If, instead, the new unified definition is instantiated within the window global scope, then process() will be able to reference many other objects like console or navigator or anything in the lexical scope within which the registerNode() function is invoked. These references will occur, however, in the audio thread where process() runs. I can't see how your local locking scheme relative to other properties and functions on the node addresses this problem, nor do I understand whether you have some other more global locking scheme in mind.
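To illustrate point 2 concretely, here is a deliberately bad hypothetical, reusing registerNode and the class shape from @hoch's example above: if registration could run in the window scope, the class body could close over main-thread state that process() would then touch from the audio thread.

// Hypothetical window-scope registration. The class body closes over `hits`.
let hits = 0;

registerNode('Leaky', class extends AudioWorkletNode {
  process(inputs, outputs, parameters) {
    hits++;             // main-thread variable mutated from the audio thread
    console.log(hits);  // main-thread global reached from the audio thread
  }
});
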
joeberkovitz commented 7 years ago

[Chair hat on] I think we need a fairly ringing endorsement of @hoch's proposal in order to change direction at this point. I believe that a new approach to AudioWorklet now probably torches our hopes for V1 by TPAC.

hoch commented 7 years ago

Perhaps my explanation was not good enough. Let me try again:

If your new unified definition of AudioWorkletNode is instantiated within the AudioWorkletGlobalScope,

So there are two different scopes for the definition and the instantiation, and I believe this is the point of confusion. AudioWorkletNode is not instantiated in the AudioWorkletGlobalScope. This special scope is only for registering the definition of AudioWorkletNode. this in the class definition literally means the AudioWorkletNode object itself.

The class definition and the registration can only happen in the AudioWorkletGlobalScope, thus the code within does not have access to console or navigator. I believe the WorkletGlobalScope is specifically designed to have a very limited set of features for that reason.

then this is an object that is not accessible in the window global scope. That contradicts the very nature of an AudioNode which is that it inhabits the window scope.

Please note that the instantiation happens in the main global scope. That is, new AudioWorkletNode(...) can only be executed in the main global scope by design, and the instance lives within that scope.
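Put differently, under this proposal the two steps live in two different scripts. A sketch, assuming addModule() on the context's audioWorklet as the loading mechanism (the shape Worklet loading eventually settled on):

// bit-crusher.js -- evaluated in the AudioWorkletGlobalScope. Only the
// definition and registration happen here; there is no window or navigator
// to reach.
registerNode('BitCrusher', class extends AudioWorkletNode {
  process(inputs, outputs, parameters) { /* ... */ }
});

// main.js -- the main global scope, the only place the node is instantiated.
const context = new AudioContext();
context.audioWorklet.addModule('bit-crusher.js').then(() => {
  const node = new AudioWorkletNode(context, 'BitCrusher');
  node.connect(context.destination);
});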

If, instead, the new unified definition is instantiated within the window global scope, then process() will be able to reference many other objects like console or navigator or anything in the lexical scope within which the registerNode() function is invoked.

I believe this is the only shortcoming of the alternative proposal. We might have to say something awkward in the spec to make this work; that is, that the process() method can only be invoked by the audio rendering thread in an isochronous fashion.

hoch commented 7 years ago

[WG member hat on] :)

I highly doubt that our current PR can be approved by the TAG because some of the structural issues are quite critical. I completely understand your concern about our schedule (which I really do care about), and I will do my best to reconcile the existing PR and the alternative design.

joeberkovitz commented 7 years ago

@padenot @rtoy Unfortunately I'm still confused. I would really like our spec editors to contribute to this thread.

@hoch On the one hand you say, "this in the class definition literally means the AudioWorkletNode object itself." OK, so it sounds like the code in the definition must then be in the window global scope, since that's where a node must live. But then, "The class definition and the registration can only happen in the AudioWorkletGlobalScope, thus the code within does not have access to console or navigator". (By the way, your example definition references console.log.)

I can make sense of this if I assume that this is a proxy for the node, not the node itself, and that the whole definition lives in the AWGS. But in that case it feels as though we are still defining a Processor -- albeit with additional yet-to-be-described mechanisms for inter-thread communication and locking that would replace the earlier explicit message passing and processing model.

Perhaps I'm over-valuing my own comprehension skills, but if I'm having this much trouble understanding this new approach, is it that much simpler?

Can you identify the critical structural issues that the TAG would be unable to approve, and explain why they cannot be resolved by the current design?

hoch commented 7 years ago

OK, so it sounds like the code in the definition must then be in the window global scope, since that's where a node must live.

The class definition has its own scope. this means the object being constructed in the constructor and its prototypes. From the object's perspective, the difference between the window global scope and the AudioWorkletGlobalScope is the set of features that the class definition has access to. That's why the AWGS is useful: it offers a limited boundary for the class definition. Once again, where it's defined and where it's instantiated can be different -- and this is the unique design trait of AudioWorklet compared to the other Worklet variants.

But in that case it feels as though we are still defining a Processor

From my perspective, the processor belongs to the node. The processor implements the actual audio processing routine, while the node mostly represents the user-programmable surface. For the most part, I guess writing the process() method is much less work than composing the exposed surface.

albeit with additional yet-to-be-described mechanisms for inter-thread communication and locking that would replace the earlier explicit message passing and processing model.

Explicit message passing has not been defined for AudioWorklet. We had a slight hope that we could borrow the Worker mechanism, but that was not the case. I think we could go either way: 1) make it as explicit as possible, or 2) use the processing model section to describe a 'command queue' and 'deferred execution'.
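For option 2, here is a minimal sketch of the 'command queue' idea, purely illustrative (NodeShadow and its method names are invented for this sketch, not spec text): main-thread setters and method calls are recorded as closures and drained by the rendering thread before process() runs.

// Rendering-thread side of a node under the command-queue model.
class NodeShadow {
  constructor() {
    this.commandQueue = [];
  }
  // Called on behalf of the main thread: defer instead of mutating now.
  enqueue(command) {
    this.commandQueue.push(command);
  }
  process(inputs, outputs, parameters) {}  // overridden by the node subclass
  // Called by the rendering thread at the start of each rendering quantum.
  renderQuantum(inputs, outputs, parameters) {
    for (const command of this.commandQueue) command();  // deferred execution
    this.commandQueue.length = 0;
    this.process(inputs, outputs, parameters);
  }
}

A main-thread call like node.setCustomValue('baz') would then be recorded as enqueue(() => { /* apply the call */ }) rather than running immediately, matching the 'deferred and performed at the beginning of the next rendering quantum' behavior described earlier.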

Once again, I am not completely ditching the existing PR. I just want to peek at a different path before we set things in stone.

hoch commented 7 years ago

Can you identify the critical structural issues that the TAG would be unable to approve, and explain why they cannot be resolved by the current design?

Here are my concerns, which partially overlap with @domenic's feedback.

  1. What if AudioWorkletNode is not defined while AudioWorkletProcessor is, or vice versa? Do we implicitly create the counterpart? Or do we throw if any of the setup code is missing?
  2. Where do the class definition and the instance belong?
  3. What is the lifecycle of the two associated objects? When are they GC-ed, and in what order?
  4. We cannot specify process() in IDL, so how do we capture this in the spec?
  5. The message passing mechanism of Worklet does not exist. How do we define this? Actually, if we decide to do conventional message passing like Worker does, the 'light-weight' advantage of Worklet fades away because it just becomes a Worker.

joeberkovitz commented 7 years ago

I will attempt to recap the "answers of record" here -- I'm not trying to be argumentative, just to summarize the state of play as I saw it.

What if AudioWorkletNode is not defined while AudioWorkletProcessor is, or vice versa? Do we implicitly create the counterpart? Or do we throw if any of the setup code is missing?

We already had a mechanism to throw if a Processor wasn't registered for a Node (although this was missing in some earlier drafts). There is no problem if a Node is not yet instantiated when a Processor is registered.

Where do the class definition and the instance belong?

For a Node, window scope. For a Processor, AWGS.

What is the lifecycle of the two associated objects? When are they GC-ed, and in what order?

This is a sore point of the current spec. We already have a notion of lifetime for nodes based on live references as defined in https://www.w3.org/TR/webaudio/#lifetime-AudioNode. I believe the outstanding challenges here amount to a) defining the notion of tail-time for an AudioWorkletProcessor instance, and b) specifying that at the end of a node's lifetime, the Node and Processor are both eligible for GC. I do not think the order matters or should be relied on.

We cannot specify process() in IDL, so how do we capture this in the spec?

We can require it to be implemented by UAs as a no-op method and overridden by implementors, in which case it becomes documentable in the IDL.

The message passing mechanism of Worklet does not exist. How do we define this? Actually, if we decide to do conventional message passing like Worker does, the 'light-weight' advantage of Worklet fades away because it just becomes a Worker.

It does not become a Worker, because all AWPs share the same rendering thread. That is what makes them lightweight.

domenic commented 7 years ago

We can require it to be implemented by UAs as a no-op method and overridden by implementors, in which case it becomes documentable in the IDL.

Please do not do this. Just omit it from the IDL, like every other worklet-based spec does.

joeberkovitz commented 7 years ago

Sure, sorry -- I am not so familiar with the other specs, so happy to omit from IDL.

In which case, @hoch, it does not sound like a showstopper TAG issue that we do not document process() in the IDL.

hoch commented 7 years ago

@joeberkovitz That's a relief! The consensus from the WG will help me to finish the existing PR as well. I'll chat with @bfgeek to see how to upstream our structural issues to the Worklet spec.

@domenic Please feel free to chime in! As you saw, AudioWorklet has unique problems and we definitely need more opinions from outside the WG.

joeberkovitz commented 7 years ago

@hoch Great. Does that mean you are ready to close this issue?

Regarding the lifetime issue, I note that this was already raised as #475 regarding AudioWorker and now needs revision for AudioWorklet. I will flag that issue for review separately.

hoch commented 7 years ago

@joeberkovitz Yes. I am closing this and we will continue to pursue #869.