MTG / essentia.js

JavaScript library for music/audio analysis and processing powered by Essentia WebAssembly
https://essentia.upf.edu/essentiajs
GNU Affero General Public License v3.0

Basic Worklet usage question #92

Closed Oortone closed 2 years ago

Oortone commented 2 years ago


Description

This is a very basic question, but I don't understand how to actually use the features extracted with an Audio Worklet as shown here.

Steps to reproduce / Code snippets / Screenshots

Since an Audio Worklet is supposed to be connected into the audio graph, to receive and send audio as shown on MDN:

const whiteNoiseNode = new AudioWorkletNode(audioContext, 'white-noise-processor')
whiteNoiseNode.connect(audioContext.destination)

... I don't understand how I would retrieve those values, say an STFT frame as an array, from the process function in the worklet, since that is supposed to be a real-time audio stream.

In step 4 on the Essentia example page, it seems from the comments like it's a callback, but I don't understand how to actually use the values for anything other than connecting them to the audio graph. How would I, as an example, console.log (on the main.js side) the RMS values assigned with output[0][0] = rmsFrame.rms; in the worklet?

// System-invoked process callback function.
  process(inputs, outputs, parameters) {

    // <inputs> and <outputs> will have as many members as were specified in the
    // options passed to the AudioWorkletNode constructor, each potentially
    // spanning multiple channels
    let input = inputs[0];
    let output = outputs[0];

    // convert the input audio frame array from channel 0 to a std::vector<float>
    // type for use in essentia
    let vectorInput = this.essentia.arrayToVector(input[0]);

    // in this case we compute the Root Mean Square of every input audio frame
    // see https://mtg.github.io/essentia.js/docs/api/Essentia.html#RMS
    let rmsFrame = this.essentia.RMS(vectorInput); // input audio frame

    output[0][0] = rmsFrame.rms;

    return true; // keep the process running
  }

System info

macOS 10.14.6, Chrome 98

jmarcosfer commented 2 years ago

Hi @Oortone

This is more of a Web Audio question than an essentia.js question, but I'll try to help.

It all depends on what you want to retrieve the values for. With the console.log example, you could do it from inside the Worklet. Otherwise, for using the Worklet values elsewhere you have two options:

- Use a SharedArrayBuffer: the Worklet writes the features into shared memory, and the main thread (or another Worker) reads them without going through the message queue.
- Use an AnalyserNode: connect it into the audio graph and poll its data from the main thread.

There is also a third almost non-option, which is using the AudioWorkletNode's port.postMessage, but that will likely crash when trying to send data at audio rate (i.e. it's okay for occasionally sending data, like parameters and such).
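For reference, here is a minimal sketch of the SharedArrayBuffer approach (not from the essentia.js docs; it assumes an existing workletNode, a cross-origin-isolated page, which SharedArrayBuffer requires in current browsers, and it skips synchronization, which a real ring buffer would handle with Atomics on an integer index):

// main.js: allocate shared memory for a single float and hand it to the worklet
const rmsBuffer = new SharedArrayBuffer(Float32Array.BYTES_PER_ELEMENT);
const rmsView = new Float32Array(rmsBuffer);
workletNode.port.postMessage({ sab: rmsBuffer }); // a SharedArrayBuffer is shared, not transferred

function poll() {
    console.log('latest RMS:', rmsView[0]); // whatever the worklet wrote last
    requestAnimationFrame(poll);
}
requestAnimationFrame(poll);

// in the AudioWorkletProcessor constructor
this.rmsView = undefined;
this.port.onmessage = (msg) => {
    if (msg.data.sab) this.rmsView = new Float32Array(msg.data.sab);
};

// and at the end of process(), after computing rmsFrame as above
if (this.rmsView) this.rmsView[0] = rmsFrame.rms;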

Hope this helps!

Oortone commented 2 years ago

@jmarcosfer

Thanks a lot, this helps. My idea is to use Essentia features in my own TensorFlow.js models, so I need to retrieve the exact values Essentia is extracting. If I understand you correctly, in that case I will need to use the SharedArrayBuffer you describe?

I'm acquainted with AnalyserNode too, but I fear the (automatic?) windowing in that node will destroy the values. Perhaps that's a misconception, I'm not sure. In any case, smoothed and/or windowed values will definitely be completely useless.

And a little bit of an argument: I realize the main "code work" here is a Web Audio thing, but would there be any use in creating an Essentia Audio Worklet for any purpose other than getting feature values? If not, I must argue that it is definitely an Essentia question which retrieval strategy Essentia Audio Worklets are intended for, so that users don't end up with corrupted values from the extraction process.

Thanks for the links.

jmarcosfer commented 2 years ago

For use with a machine learning model, what we do is run the TensorFlow.js model on a dedicated Worker and simply postMessage the features every second or so. This rate will depend on the size of the receptive field of your model, which for some of our models is about 3 seconds, so postMessage works just fine. But of course, if your input size is much smaller and you need more frequent features from the AudioWorklet, then yes, you might have to use SharedArrayBuffer.
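As a rough sketch of that pattern (names are hypothetical; this.essentia is assumed to be set up as in the worklet snippet above), the processor can batch per-block features and post them at a low rate:

// inside your AudioWorkletProcessor subclass
constructor(options) {
    super();
    this.frameFeatures = []; // one RMS value per 128-sample block
}

process(inputs, outputs, parameters) {
    const vectorInput = this.essentia.arrayToVector(inputs[0][0]);
    this.frameFeatures.push(this.essentia.RMS(vectorInput).rms);

    // 128-sample blocks at 44.1 kHz => ~345 blocks per second, so this
    // posts roughly once per second instead of at audio rate
    if (this.frameFeatures.length >= 345) {
        this.port.postMessage({ features: this.frameFeatures });
        this.frameFeatures = [];
    }
    return true;
}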

I also share your concerns with AnalyserNode, which is why I haven't used it much. I'll try to do some tests to see how well founded our concerns are. However, I would say that the main issue you might find is missing audio frames. Web Audio works in 128-sample blocks. With AnalyserNode you will likely be reading values with rAF (requestAnimationFrame), at about 50 Hz, which is more or less every 882 samples at a 44.1 kHz sample rate. So yeah, you'll grab roughly 1 out of every 7 blocks (882/128 ≈ 6.9) that your AudioWorklet would see. But if we use getFloatTimeDomainData we shouldn't worry about windowing or smoothing, since that should give us the data as is.
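A minimal sketch of that polling pattern (assuming an existing audioContext and a sourceNode to tap; note that smoothingTimeConstant only affects the frequency-domain getters, not this one):

const analyser = new AnalyserNode(audioContext, { fftSize: 2048 });
sourceNode.connect(analyser);

const timeData = new Float32Array(analyser.fftSize);

function grab() {
    analyser.getFloatTimeDomainData(timeData); // raw samples, no window applied
    // ... use timeData, e.g. feed it to essentia.js on the main thread ...
    requestAnimationFrame(grab);
}
requestAnimationFrame(grab);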

Regarding your argument, you raise a very good point. Most of the use cases of essentia.js in real time (therefore, typically using AudioWorklets) will require retrieving the calculated features, usually for visualization or for feeding to an ML model. I can only think of one use case where you can stay in Web Audio land and do everything inside the audio graph, which is analysis-based audio synthesis (i.e. you use Web Audio to write your own synth, which might use essentia.js features). But we should definitely think about ways to make this easier/more obvious to the user. Perhaps clearer documentation and examples on this topic. Or maybe providing some abstractions as essentia.js helpers, maybe with built-in SharedArrayBuffer support, plus a template for simply instantiating an AudioWorklet where you choose which essentia.js algos to use and all the I/O buffer-size matching and data-retrieval boilerplate is given.

Thanks for your comments!

Oortone commented 2 years ago

Great thoughts and ideas. Essentia is very interesting overall; I'm just beginning to get a grip on it, and I'm also quite new to anything advanced in JavaScript.

Anyway, just so I get this right: can I post features from the AudioWorklet directly to a TensorFlow Worker, or will I have to make a jump via the main script in that case? I know posting features from the main script to TensorFlow.js works just fine, so posting will probably work.

jmarcosfer commented 2 years ago

Yep, you can definitely post from AudioWorklet to your Tensorflow Worker. And yes, you will have to make a jump via the main script, but only to transfer a MessagePort object from the Worker to the AudioWorklet to establish communications. Once that's done, you can go straight from AudioWorklet to Tensorflow Worker. Like this, taken from our realtime audio autotagging demo:

// in your Tensorflow worker: create channel, keep one port and send the other to main thread
const channel = new MessageChannel();
const port1 = channel.port1; // keep this end in the Worker for receiving features later

postMessage({
    port: channel.port2
}, [channel.port2]);
// in main.js
// this is what would be your tensorflow Worker:
inferenceWorker = new Worker('./src/inference-worker.js');
inferenceWorker.onmessage = function listenToWorker(msg) {
    if (msg.data.port) {
        // listen out for the port transfer
        workerToWorkletPort = msg.data.port;
    }
    // ...
};

// then when you create your AudioWorkletProcessor, before connecting the audio graph
featureExtractorNode = await createAudioProcessor(audioCtx);
featureExtractorNode.port.postMessage({
    port: workerToWorkletPort
}, [workerToWorkletPort]);
// in AudioWorkletProcessor constructor
this.workerPort = undefined;
this.workerPortAvailable = false;
this.port.onmessage = (msg) => {
    if (msg.data.port) {
        this.workerPort = msg.data.port;
        this.workerPortAvailable = true;
    }
};
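From there, a sketch (not verbatim from the demo) of how process() can then send features straight to the inference Worker once the port has arrived:

// in AudioWorkletProcessor.process()
process(inputs, outputs, parameters) {
    // ... compute features with essentia.js, as in the RMS example above ...
    if (this.workerPortAvailable) {
        // featureArray is hypothetical: whatever features you batched up
        this.workerPort.postMessage({ features: featureArray });
    }
    return true;
}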

Oortone commented 2 years ago

Great news. I'll look into this. This is very helpful.