bbc / VideoContext

An experimental HTML5 & WebGL video composition and rendering API.
http://bbc.github.io/VideoContext/
Apache License 2.0

🎉PoC: Audio management through the WebAudio API #123

Closed germain-gg closed 3 years ago

germain-gg commented 6 years ago

Disclaimer: this is not ready for production. This is a proof of concept to integrate the WebAudio API in VideoContext, and I want to use it as a conversation starter. I know the tests are failing; I committed using --no-verify. It's bad, I know :P but I didn't want to spend more time on this until I know the structure we'll be going with. I also took the liberty to clean up some of the codebase (commented-out code, old comments, ...)

Audio management through the WebAudio API

VideoContext follows the WebAudio API pattern but only connects nodes that are drawable. There is basically no audio management: everything is output directly from the MediaNodes. This works pretty well, however there is no reason why an effect node should only apply to drawable nodes. What if I want an audio cross-fade during a transition between two video nodes?

The idea here is that every item that extends GraphNode gets a new _audioNode property. I then hooked into the renderGraph#registerConnection function to connect those audio nodes to each other.

For media elements (audio/video) it is quite simple: we can create a MediaElementAudioSourceNode because they already have audio tracks.

In the case of nodes that inherit from ProcessingNode it is a bit less straightforward. Every node now has to be connected into that Web Audio tree, even the nodes that have no concept of audio at the moment. I had to create a GainNode with gain value 1 so that they can act as a passthrough.
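To make the wiring concrete, here is a rough sketch of the idea (illustrative only, the names below do not match the actual implementation exactly):

// Illustrative sketch only, not the actual implementation.
const audioCtx = new AudioContext();

// Media elements already carry audio, so they can be wrapped directly:
const videoElement = document.createElement("video");
const mediaAudioNode = audioCtx.createMediaElementSource(videoElement);

// Processing nodes have no concept of audio yet, so they get a unity
// GainNode that simply passes the signal through unchanged:
const passthroughAudioNode = audioCtx.createGain();
passthroughAudioNode.gain.value = 1;

// When renderGraph#registerConnection links two VideoContext nodes, the same
// link is mirrored on the audio side through each node's _audioNode property:
mediaAudioNode.connect(passthroughAudioNode);
passthroughAudioNode.connect(audioCtx.destination);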

Here are two screenshots of the VideoContext node tree and the AudioContext tree:

[screenshots: VideoContext node tree and AudioContext tree]

Effect / Transition definition

There is currently no way of adding audio effects. This PR is purely here to connect all the elements together.

The library currently provides a way to describe an effect or transition definition:

{
    title: "Cross-Fade",
    description: "A cross-fade effect. Typically used as a transition.",
    vertexShader,
    fragmentShader,
    properties: {
        mix: { type: "uniform", value: 0.0 }
    },
    inputs: ["u_image_a", "u_image_b"]
}

We might need to restructure this to incorporate the concept of audio. An effect node could now affect audio only, video only, or both. It could look something like:

{
  title,
  description,
  drawable: {
    vertexShader,
    ...
  },
  hearable: {
    ...
  },
  inputs
}

The reason I kept inputs at the top level of this object is that the Web Audio API has the same concept of a maximum number of inputs.

The hearable object could take two properties:

Feedback

Please let me know your thoughts.

I am very keen to hear about two things:

I'd love to hear everyone's thoughts regarding the structure of what I built here. Once we reach a consensus regarding the design I'll tackle all the unit tests and consolidate the implementation.

The audio effect definition. I think this will be a key feature of the library and will definitely improve its usability. However I probably missed a ton of use cases and would like to hear a bit more about what everyone would implement with this, to see if my drawable/hearable approach is still valid or not.

germain-gg commented 6 years ago

I just updated the PR to add support for custom audio nodes when using ProcessingNode.

This currently solves only the VideoContext#EffectNode use case and not the VideoContext#TransitionNode.

I had to introduce the concept of inputAudioNode and outputAudioNode. Some effects can be very simple, like a low-pass filter, and require only one BiquadFilterNode, while other more complex effects, like a wah-wah, might require a BiquadFilterNode chained to a ConvolverNode and a WaveShaperNode.

This concept is only used in renderGraph when we register or unregister a connection between two VideoContext nodes.

sourceNode.outputAudioNode.connect(destinationNode.inputAudioNode);

Below is an example of a simple audio effect definition that only contains one node.

const canvas = document.createElement("canvas");
const vc = new VideoContext(canvas);

const videoNode = vc.video(source);

const definition = {
  title: "foo",
  ...,
  hearable: {
    audioNodesFactory: (audioCtx) => {
        const biquadFilter = audioCtx.createBiquadFilter();
        biquadFilter.type = "lowpass";
        return {
            input: biquadFilter,
            output: biquadFilter,
        };
    }
  }
}
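For context, this definition would be used like any other effect definition, so wiring it into the graph looks roughly like this (standard VideoContext wiring; the audio side is handled by the library):

const lowpassEffect = vc.effect(definition);

videoNode.connect(lowpassEffect);
lowpassEffect.connect(vc.destination);

videoNode.start(0);
videoNode.stop(10);
vc.play();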

and here is a slightly more complex one:

const definition = {
  title: "foo",
  ...,
  hearable: {
    audioNodesFactory: (audioCtx) => {
        const analyser = audioCtx.createAnalyser();
        const distortion = audioCtx.createWaveShaper();
        const gainNode = audioCtx.createGain();
        const convolver = audioCtx.createConvolver();
        const biquadFilter = audioCtx.createBiquadFilter();

        biquadFilter.type = "lowshelf";

        // connect the nodes together

        analyser.connect(distortion);
        distortion.connect(biquadFilter);
        biquadFilter.connect(convolver);
        convolver.connect(gainNode);

        return {
            input: analyser,
            output: gainNode,
        };
    }
  }
}

Again, nothing here is set in stone. I'm keeping an open mind regarding naming and how we can define effects in the most efficient and readable way.

I'm now working on a way to make audio work with a transition node. This brings up a couple of interesting challenges.

germain-gg commented 6 years ago

I added support for linear audio fades.

This is leveraging the ChannelMergerNode.

This adds a few interesting things to the equation.

For the following code

var videoNode1 = vc.video("../../assets/introductions-rant.mp4", 15);
videoNode1.start(0);
videoNode1.stop(6);

var videoNode2 = vc.video("../../assets/introductions-rant.mp4", 15);
videoNode2.start(4);
videoNode2.stop(10);

var crossFade = vc.transition(VideoContext.DEFINITIONS.CROSSFADE);
crossFade.transition(4,6,0.0,1.0,"mix");

videoNode1.connect(crossFade);
videoNode2.connect(crossFade);

crossFade.connect(vc.destination);

we end up with the following graph

[screenshot: resulting audio graph]

We have 2 video nodes, 1 transition node and 1 destination node. However, in the representation above you will notice two GainNodes right after the MediaElementAudioSourceNodes. They are there so that we can call setValueAtTime on their gain AudioParam.

All of that happens in the _update function of the TransitionNode. That function was originally in charge of interpolating between the two values of a transition and passing the result down to the shader program.

The audio fade only supports linear interpolation for now, as this is the only mixing currently available in VideoContext. We could extend that in the future.
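For illustration, the linear cross-fade on the audio side boils down to driving the gain of each input's GainNode in opposite directions. The sketch below uses linearRampToValueAtTime for brevity, whereas the PR computes the interpolated value in _update and calls setValueAtTime:

// Illustrative only: the two GainNodes sit right after each MediaElementAudioSourceNode.
const audioCtx = new AudioContext();
const gainA = audioCtx.createGain(); // input from videoNode1
const gainB = audioCtx.createGain(); // input from videoNode2

// Cross-fade A -> B over two seconds:
const now = audioCtx.currentTime;
gainA.gain.setValueAtTime(1, now);
gainA.gain.linearRampToValueAtTime(0, now + 2);
gainB.gain.setValueAtTime(0, now);
gainB.gain.linearRampToValueAtTime(1, now + 2);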

germain-gg commented 6 years ago

In light of the new Chrome WebAudio autoplay policy that will be rolled out in December 2018 (https://developers.google.com/web/updates/2018/11/web-audio-autoplay), we will have to add some sort of mechanism in the library to resume AudioContext playback after the first user interaction. This is pretty much the same type of problem as mobile video playback.
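Something along these lines would probably do (sketch only, where exactly this hooks into VideoContext is still an open question):

// Resume a suspended AudioContext on the first user gesture (names are placeholders).
function resumeAudioOnFirstInteraction(audioCtx) {
    const resume = () => {
        if (audioCtx.state === "suspended") {
            audioCtx.resume();
        }
        document.removeEventListener("click", resume);
        document.removeEventListener("touchend", resume);
    };
    document.addEventListener("click", resume);
    document.addEventListener("touchend", resume);
}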

PTaylour commented 6 years ago

I know the tests are failing; I committed using --no-verify. It's bad, I know :P but I didn't want to spend more time on this until I know the structure we'll be going with.

Now that CI is set up, running tests on the commit hook is a little OTT. We could get rid of that.

PTaylour commented 6 years ago

I also took the liberty to clean up some of the codebase (commented-out code, old comments, ...)

always appreciated :)

PTaylour commented 6 years ago

I've had a quick read—looking good!

I plan to spend some time looking at it properly tomorrow. In the meantime I'm sharing this with a few people internally who may have use cases / opinions.

Thanks!

PTaylour commented 5 years ago

Brain dump to follow...

Your Qs

I am very keen to hear about two things:

I'd love to hear everyone's thoughts regarding the structure of what I built here. Once we reach a consensus regarding the design I'll tackle all the unit tests and consolidate the implementation

Do you mean structure from a code point of view?

The audio effect definition. I think this will be a key feature of the library and will definitely improve its usability. However I probably missed a ton of use cases and would like to hear a bit more about what everyone would implement with this, to see if my drawable/hearable approach is still valid or not.

I'm keeping an open mind regarding naming or how we can define effect in the most efficient and readable way.

I like the effect definition. audioCtx -> { input, output } ticks the efficient and readable boxes for me. And sitting hearable within the effect definitions makes a lot of sense.

Transition and compositing nodes are an interesting case. I guess it makes sense to mirror the VideoContext approach in how much you "invert the control" in _update. For transitions, the definition describes the property, e.g. "mix", and _update just updates the value. Potentially the transition logic (e.g. the gain nodes) should be partly hoisted up into the definition?

hearable: {
    audioNodesFactory: (audioCtx) => {
        ...
        const gainNode = ...

        return {
            input: ...,
            output: ...,
            update: ({ mix }) => {
                gainNode.gain.setValueAtTime(mix, audioCtx.currentTime)
            }
        };
    }
}

More trouble than it's worth?

Audio context stuff

This is leveraging the ChannelMergerNode.

Is this what ChannelMerger is for? I thought it was for assigning mono inputs to different channels in an n-channel output. You could sum the n input gain nodes by just connecting them all to whatever you connect downstream?

---GainInputA ----|
                  |---[audioNodesFactory() = GainNodeC]----> 
---GainInputB ----|
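i.e. something like this (a sketch of plain Web Audio summing behaviour, not VideoContext code):

// Multiple connections into the same node are summed by the Web Audio API,
// so no ChannelMergerNode is needed to mix the inputs.
const audioCtx = new AudioContext();
const gainInputA = audioCtx.createGain();
const gainInputB = audioCtx.createGain();
const gainNodeC = audioCtx.createGain(); // e.g. whatever audioNodesFactory() returns

gainInputA.connect(gainNodeC);
gainInputB.connect(gainNodeC);
gainNodeC.connect(audioCtx.destination);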

Questions I'm asking myself

should the library handle all graph connections?

any expectations to have more control in user land?

how much should videocontext do in the way of audiocontext?

use cases

(I could be way off with these suggestions, but humour me, it will be helpful for me to get my head into the problem if you can point out where I'm missing the point)

Always output audio of only the "Studio Feed Video Node"

Compositing

layering connections (the multiple transition node issue)

PTaylour commented 5 years ago

@gsouquet give me a shout when you want me to have another look at this one :)

germain-gg commented 5 years ago

Yeah definitely! I haven't dedicated much time to this lately, and after a more thorough testing phase I found issues when disconnecting the nodes.

Will get back to you as soon as I have something worth looking at.

Writing good documentation has proven more difficult than I initially thought.

Sacharified commented 5 years ago

Some of these tests are failing at the coverage step due to a strange parsing issue within the coverage reporter, IstanbulJS. When running test-unit the tests pass; when running test-coverage the same tests fail.

These are the two commands:

"test-unit": "jest test/unit/*.js"
"test-coverage": "jest test/unit/*.js --coverage --collectCoverageFrom 'src/**/*.js'"

The only difference is the coverage flags. The error reported is:

 FAIL  test/unit/utils.spec.js
  ● Test suite failed to run

    Jest encountered an unexpected token

    This usually means that you are trying to import a file which Jest cannot parse, e.g. it's not plain JavaScript.

    By default, if Jest sees a Babel config, it will use that to transform your files, ignoring "node_modules".

    Here's what you can do:
     • To have some of your "node_modules" files transformed, you can specify a custom "transformIgnorePatterns" in your config.
     • If you need a custom transformation specify a "transform" option in your config.
     • If you simply want to mock your non-JS modules (e.g. binary assets) you can stub them out with the "moduleNameMapper" config option.

    You'll find more details and examples of these config options in the docs:
    https://jestjs.io/docs/en/configuration.html

    Details:

    /path/to/project/VideoContext/src/ProcessingNodes/transitionnode.js:2620
                                    break (/* istanbul ignore next */_loop2
                                          ^

    SyntaxError: Unexpected token (

      3 | import { SOURCENODESTATE } from "./SourceNodes/sourcenode.js";
      4 | import { VIDEOTYPE } from "./SourceNodes/videonode.js";
    > 5 | import { CANVASTYPE } from "./SourceNodes/canvasnode.js";
        |                                                                                                  ^
      6 | import { IMAGETYPE } from "./SourceNodes/imagenode.js";
      7 | import { DESTINATIONTYPE } from "./DestinationNode/destinationnode.js";
      8 | import { TRANSITIONTYPE } from "./ProcessingNodes/transitionnode.js";

      at ScriptTransformer._transformAndBuildScript (node_modules/jest-runtime/build/script_transformer.js:403:17)
      at src/utils.js:5:274
      at Object.<anonymous> (src/utils.js:13:3)

This isn't a very helpful error but I traced the issue down to this code block:

this.inputs.forEach((input, index) => {
    const value =
        index % 2 === 0
            ? difference * progress - transition.target
            : difference * progress + transition.current;
    input.outputAudioNode.gain.setValueAtTime(
        value,
        this._audioCtx.currentTime
    );
});

By commenting out different stuff I discovered that using any mathematical operator (*, +, / etc.) within this particular forEach loop causes Istanbul to throw an error.

Rewriting this loop as the following causes the tests to pass.

for (let index = 0; index < this.inputs.length; index++) {
    const input = this.inputs[index];
    const value =
        index % 2 === 0
            ? difference * progress - transition.target
            : difference * progress + transition.current;
    input.outputAudioNode.gain.setValueAtTime(
        value,
        this._audioCtx.currentTime
    );
}

I will raise an issue on the NYC repo but would be interested to know if anyone has encountered something like this before?

I'll rewrite it so that the coverage doesn't fail, but it's something to watch out for.

PTaylour commented 5 years ago

hmm interesting. I haven't seen that.

To be honest, I don't think coverage reports give us much value, as we're relying more on high-level tests anyway.

I'd be happy to remove them to simplify things. What do you think?