WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

[audioworklet] Bring Your Own Buffer style of memory management #2442

Open joeberkovitz opened 6 years ago

joeberkovitz commented 6 years ago

Discussing with the WebAssembly group, we've been talking about an alternate approach to managing the buffers used to communicate data between an AudioWorkletProcessor and the rendering-thread environment (the input, output, and parameter buffers). It would be better for WASM in many ways if we allowed the AWP code to respond to the needs of the environment by providing its own buffers, rather than having to work with JS buffers or views provided from outside. This would reduce the number of copies and make for a very direct WebAssembly/Web Audio connection.
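
For context, a minimal sketch of what the extra copies look like today when the DSP kernel lives in a WASM heap (wasmMemory, wasmProcess, inputPtr, and outputPtr are hypothetical stand-ins, and mono 128-frame buffers are assumed):

    // Hypothetical names: wasmMemory (a WebAssembly.Memory), wasmProcess (an
    // exported DSP kernel), inputPtr/outputPtr (pre-allocated heap offsets).
    class CopyingProcessor extends AudioWorkletProcessor {
      process(inputs, outputs) {
        // Copy 1: the engine-provided input into the WASM heap.
        new Float32Array(wasmMemory.buffer, inputPtr, 128).set(inputs[0][0]);
        // Run the DSP kernel in WASM on its own heap.
        wasmProcess(inputPtr, outputPtr, 128);
        // Copy 2: the WASM heap result back into the engine-provided output.
        outputs[0][0].set(new Float32Array(wasmMemory.buffer, outputPtr, 128));
        return true;
      }
    }
    registerProcessor('copying-processor', CopyingProcessor);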

hoch commented 4 years ago

The conversation from TPAC: we need to come up with the API design. For example, AudioWorkletProcessor's constructor could expose methods like registerInputBuffer() and registerOutputBuffer(), so that user code can pass in a piece of WASM heap memory.
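
A rough sketch of how user code might use such methods, assuming the hypothetical registerInputBuffer()/registerOutputBuffer() names from above (nothing here is specified; wasmMemory, wasmProcess, inputPtr, and outputPtr are stand-ins as before):

    class ByobProcessor extends AudioWorkletProcessor {
      constructor() {
        super();
        // Hypothetical: hand the engine views into the WASM heap, so it reads
        // input from and writes output to that memory directly, with no copies.
        this.registerInputBuffer(new Float32Array(wasmMemory.buffer, inputPtr, 128));
        this.registerOutputBuffer(new Float32Array(wasmMemory.buffer, outputPtr, 128));
      }
      process() {
        // The kernel operates in place on the registered heap regions.
        wasmProcess(inputPtr, outputPtr, 128);
        return true;
      }
    }
    registerProcessor('byob-processor', ByobProcessor);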

svgeesus commented 4 years ago

Related: AudioWorklets integration with WebAssembly

rtoy commented 4 years ago

Teleconf: Probably a duplicate of WebAudio/web-audio-api#2427, which has more detail on AudioWorklet and WASM.

hoch commented 3 years ago

The outcome from TPAC 2020:

This is not an exact duplicate of WebAudio/web-audio-api#2427: it applies to non-WASM use cases as well. With both improvements, Audio Worklet performance could improve significantly.

What's being proposed is to call something like registerBuffer(wasmHeap) or registerBuffer(typedArray) during the node/processor registration procedure.

A callback function like ontopologychange is also necessary, so that user code can reallocate memory when the incoming AudioNode configuration (e.g. channel count) changes.

padenot commented 3 years ago

Discussed during the AudioWG online F2F; some pseudo-code that shows how this could work (just a proposal, nothing concrete, drafted while talking during the call):

partial interface AudioWorkletProcessor {
    enum BufferType {
      INPUT = 1,
      OUTPUT = 2
    };
    // Indicates that this memory will be used by the AudioWorklet -- don't touch it yourself.
    // Variant: register just the output, or just the input.
    // This can be called in the ctor, or while process is running, to provide the output from something already computed.
    registerBuffers(ArrayBuffer memory, uint64_t offset, uint64_t maxLength, type = INPUT|OUTPUT);

    // Called when the input or output channel count changes for whatever reason.
    // You can accept or deny the request. In case of error, `process` isn't called anymore.
    channelTopologyChange = function(ArrayBuffer memory, uint64_t offset, uint64_t newMaxLength) {
        // return false; // reject the change request: audio will be down/up-mixed to the previous topology

        // find some memory in already allocated space / allocate new memory

        return {
            memory: memory,
            offset: new_offset,
            maxLength: new_max_length,
            type: INPUT
        };
    }
}

The memory backing the arrays in process would then be the memory registered above.

Being able to call registerBuffers from process would be very nice for developers using the pattern described by @hoch: buffers of the right size are prepared in advance (with a safety margin), and the worklet just uses one of them as output. The buffer can then be safely recycled in the next process call.
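
A sketch of that recycling pattern against the proposed (non-normative) registerBuffers() signature from the pseudo-code above; the pool layout and slot count are made up for illustration:

    // A small pool of pre-filled output slots in one ArrayBuffer.
    const SLOTS = 4;
    const SLOT_SAMPLES = 128 * 2;  // one stereo render quantum, with headroom
    const pool = new ArrayBuffer(SLOTS * SLOT_SAMPLES * 4);
    let slot = 0;

    class PoolProcessor extends AudioWorkletProcessor {
      process() {
        // Point the engine at the slot filled ahead of time; the slot just
        // vacated becomes free for refilling before its next turn.
        this.registerBuffers(pool, slot * SLOT_SAMPLES * 4, SLOT_SAMPLES, /* OUTPUT */ 2);
        slot = (slot + 1) % SLOTS;
        return true;
      }
    }
    registerProcessor('pool-processor', PoolProcessor);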

mdjp commented 3 years ago

This would be a great issue to discuss across groups at TPAC 2021.

juj commented 2 years ago

I did recently write an implementation that enables multithreaded Wasm code to define AudioWorkletProcessors and instantiate AudioWorkletNodes from Wasm. The branch can be found here:

https://github.com/juj/emscripten/compare/wasm_workers...juj:audio_worklets?expand=1

The intent is to bring the support into core Emscripten in the near future.

The benefit of that branch with respect to this GitHub issue is probably that it concretely shows what the JS<->Wasm marshalling copy overhead currently looks like; see the process function in src/library_webaudio.js.

In the implementation, explicit care was taken to make the marshalling as "tight" as possible, with minimized JS overhead (using Wasm thread stack space to avoid mallocs, and minimizing excess computation or abstractions in JS). Unfortunately, we don't yet have any data on how large that marshalling overhead is in general.
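
The actual code lives in src/library_webaudio.js on that branch, but the marshalling is roughly of this shape (HEAPF32, scratchPtr, outPtr, and wasmAudioCallback stand in for the real Emscripten internals):

    class MarshallingProcessor extends AudioWorkletProcessor {
      process(inputs, outputs) {
        // Copy every input channel into scratch space in the WASM heap.
        let p = scratchPtr;
        for (const input of inputs)
          for (const channel of input) {
            HEAPF32.set(channel, p >> 2);
            p += channel.length * 4;
          }
        // Call into the user's wasm audio callback.
        wasmAudioCallback(scratchPtr, outPtr);
        // Copy the results back into the engine-owned output arrays.
        let q = outPtr;
        for (const output of outputs)
          for (const channel of output) {
            channel.set(HEAPF32.subarray(q >> 2, (q >> 2) + channel.length));
            q += channel.length * 4;
          }
        return true;
      }
    }
    registerProcessor('marshalling-processor', MarshallingProcessor);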

For a small tone generator C program example of how one would use that API, see the end of the PR, tests/webaudio/tone_generator.c.

Jonathhhan commented 2 years ago

I use Open Frameworks with Emscripten, and am trying to replace ScriptProcessorNode with AudioWorklet. With SharedArrayBuffer I can access the wasm memory from the main thread in the AudioWorklet. My problem is that I cannot import a function (the audio callback) from the wasm Module in the main thread (it also needs to be in the main thread because it generates graphics too).

This is how I import the memory:

    var outbufferArray = Module.HEAPF32.subarray(outbuffer >> 2, (outbuffer >> 2) + bufferSize * outputChannels);
    AUDIO.contexts[context_id].audioWorklet.addModule("bypass-processor.js").then(() => {
        const bypasser = new AudioWorkletNode(AUDIO.contexts[context_id], "bypass-processor", [1, 1, 2]);
        var worker = new Worker("wasm_worker.js");
        bypasser.port.postMessage(outbufferArray);
        bypasser.port.onmessage = (e) => {
            dynCall("viiii", callback, [bufferSize, inputChannels, outputChannels, userData]);
        };
        bypasser.connect(AUDIO.contexts[context_id].destination);
    });

This is the AudioWorklet processor:

    var counter = 0;

    class BypassProcessor extends AudioWorkletProcessor {
        constructor() {
            super();
            this.port.onmessage = (e) => {
                console.log(e.data, sampleRate);
                this.data = e.data;
            };
        }

        process(inputs, outputs) {
            counter = currentFrame / 128 % 64;
            if (counter == 0) {
                this.port.postMessage('pong');
            }
            const input = inputs[0];
            const output = outputs[0];
            for (let channel = 0; channel < 1; ++channel) {
                const outputChannel = output[channel];
                outputChannel.set(this.data.slice(counter * 128, counter * 128 + 128));
            }
            return true;
        }
    }

    registerProcessor('bypass-processor', BypassProcessor);

I think my problem would be solved if I could call dynCall("viiii", callback, [bufferSize, inputChannels, outputChannels, userData]); from the AudioWorklet. Calling the callback with port.postMessage is less reliable than ScriptProcessorNode was (but the SharedArrayBuffer works well).

juj commented 2 years ago

@Jonathhhan check out the audio_worklets branch in my Emscripten Github repo.

In particular, the commit https://github.com/juj/emscripten/commit/4aa53f43af938d5d55f4ce54b0612387cbcffd21 . That achieves such a shared setup where the main wasm application can run on the main browser thread, and create wasm-based audioworklet nodes that call to a custom wasm function entry point for audio processing.

Jonathhhan commented 2 years ago

@juj thank you, I will try that soon. Yesterday I had success with the solution from @tklajnscek (https://github.com/emscripten-core/emscripten/pull/12502). It would be really great if AudioWorklets became a standard feature of Emscripten.

hoch commented 2 years ago

@juj Yes, that would be a huge improvement. Is a PR being reviewed somewhere? Is there something Audio WG can help/support?

Jonathhhan commented 2 years ago

@juj I tried your tone_generator.c example like this:

    emcc -s MINIMAL_RUNTIME=1 -s AUDIO_WORKLET=1 -s WASM_WORKERS=1 -s WEBAUDIO_DEBUG=1 -s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE=$emscriptenGetAudioObject -o tone_generator.html tone_generator.c

It creates the audio worklet and compiles, but when I run it I get this error message:

Uncaught ReferenceError: emscriptenGetAudioObject is not defined
    at 1056 (343b50ff-7dba-4c7e-9fa8-85dade93c8ff:490:46)
    at _emscripten_asm_const_int (343b50ff-7dba-4c7e-9fa8-85dade93c8ff:584:31)
    at ba64fc02:0x639
    at MessagePort._EmAudioDispatchProcessorCallback (343b50ff-7dba-4c7e-9fa8-85dade93c8ff:664:30)

Did I make a mistake with the flags?

juj commented 2 years ago

No, that does look good. Tried it now and I don't get any issue. I am on commit

commit 4aa53f43af938d5d55f4ce54b0612387cbcffd21 (HEAD -> audio_worklets, juj/audio_worklets)
Author: Jukka Jylänki <jujjyl@gmail.com>
Date:   Sun Oct 24 18:47:25 2021 +0300

   Add first implementation of Wasm Audio Worklets, based on Wasm Workers.

and build with

E:\code\emsdk\emscripten\main>emcc -s MINIMAL_RUNTIME=1 -s AUDIO_WORKLET=1 -s WASM_WORKERS=1 -s WEBAUDIO_DEBUG=1 -s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE=$emscriptenGetAudioObject -o tone_generator.html tests\webaudio\tone_generator.c

which generates

11/30/2021  02:00 PM             6,732 tone_generator.aw.js
11/30/2021  02:00 PM             1,393 tone_generator.html
11/30/2021  02:00 PM            77,520 tone_generator.js
11/30/2021  02:00 PM             3,182 tone_generator.wasm
11/30/2021  02:00 PM               756 tone_generator.ww.js

Then running on an ad hoc web server on localhost I do get the tone generator running.

The error you posted looks as if the command-line directive -s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE=$emscriptenGetAudioObject did not take effect for some reason. The generated file tone_generator.js should have the following in it:

    function emscriptenGetAudioObject(objectHandle) {
        return EmAudio[objectHandle];
    }

I zipped the build here, which does work for me on Windows in Chrome and Firefox: tone_generator.zip

I also pushed a commit to the branch that improves some assertions and error checks, e.g. for the scenario where the browser does not support AudioWorklets at all.

juj commented 2 years ago

> @juj Yes, that would be a huge improvement. Is a PR being reviewed somewhere? Is there something Audio WG can help/support?

There is no PR yet; I am working on the prerequisite Wasm Workers PR first, to build the necessary scaffolding to get Audio Worklets into the tree.

Jonathhhan commented 2 years ago

@juj thanks. It works if I paste

    function emscriptenGetAudioObject(objectHandle) {
        return EmAudio[objectHandle];
    }

in manually.

juj commented 2 years ago

Oh wait, this is probably a Windows vs. macOS/Linux build issue: you must be on a Unix-like OS with a bash-like shell?

There,

    -s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE=$emscriptenGetAudioObject

causes the shell to expand $emscriptenGetAudioObject as a variable lookup, so it becomes an empty string.

If you build by passing either

    -s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE=\$emscriptenGetAudioObject

or

    -s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE=$$emscriptenGetAudioObject

does that work? (I'm not sure which of those escapes a $ character on the command line; a quick Google search didn't find a hit either.)

Jonathhhan commented 2 years ago

@juj thanks a lot. -s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE=\$emscriptenGetAudioObject actually works. I also already tried to implement it into Open Frameworks, but then I realized that it is not yet possible to use WASM_WORKERS and USE_PTHREADS at the same time. And as far as I know, I need USE_PTHREADS for SharedArrayBuffer, to share the memory with the AudioWorklet? Edit: Now I think I understand better. WASM_WORKERS is an optional replacement for USE_PTHREADS, and it is not implemented in the master branch yet...

meditating-monkey commented 2 years ago

Any updates on this registerBuffer() method?

guest271314 commented 2 years ago

Note, on Chromium 101 when

    await WebAssembly.instantiate(buffer, {
      js: { mem: memory },
    })

is called before

    registerProcessor('audio-worklet-stream', class AudioWorkletStream extends AudioWorkletProcessor {
      // ...
    });

an uncaught error is thrown and the AudioWorkletNode is not created:

Uncaught (in promise) DOMException: Failed to construct 'AudioWorkletNode': AudioWorkletNode cannot be created: The node name 'audio-worklet-stream' is not defined in AudioWorkletGlobalScope.

That does not occur when await is not used before WebAssembly.instantiate().

I read https://webaudio.github.io/web-audio-api/#AudioWorkletNode, https://webaudio.github.io/web-audio-api/#node-name-to-parameter-descriptor-map, and https://webaudio.github.io/web-audio-api/#node-name-to-processor-constructor-map, yet I have not located the relevant specification language for this case.
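
For what it's worth, one way to avoid the race under the behavior observed above is to call registerProcessor() synchronously during module evaluation and let instantiation complete afterwards, guarding process() until the instance is ready (a sketch, reusing the buffer/memory names from the snippets above):

    let wasmExports = null;

    registerProcessor('audio-worklet-stream', class AudioWorkletStream extends AudioWorkletProcessor {
      process(inputs, outputs) {
        if (!wasmExports) return true;  // keep alive, output silence until ready
        // ... use wasmExports on the inputs/outputs here ...
        return true;
      }
    });

    // No top-level await, so the registration above is not delayed.
    WebAssembly.instantiate(buffer, { js: { mem: memory } })
      .then((result) => { wasmExports = result.instance.exports; });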

hoch commented 11 months ago

2023 TPAC Audio WG discussion: padenot@ will write a document that explains the proposal shown in this comment. The WG will contact developers who made significant contributions to the similar project (Emscripten) to gather feedback.

juj commented 11 months ago

I only now finally paused to think about @padenot's code snippet above (https://github.com/WebAudio/web-audio-api/issues/2442#issuecomment-842512229). Not sure if the thinking has evolved, since it has been a long time, but I was puzzled by the channelTopologyChange element there.

My understanding is that channelTopologyChange would be an event raised when the memory requirements of the inputs and outputs change, so that the previous memory area passed to registerBuffers would no longer work?

And the handler of that channelTopologyChange callback would then have the responsibility of providing a new subarray location in the ArrayBuffer where the interaction would take place?

It looks like the proposed handler there returns the new subarray location. This seems to duplicate the behavior that a call to registerBuffers also achieves, so I wonder if it would make sense to spec the handler of channelTopologyChange as needing to call registerBuffers again to configure the new subarray region. That way the code flow would be uniform, and there wouldn't be two different-looking mechanisms in the API for specifying the subarray information.

I.e. something like

partial interface AudioWorkletProcessor {
    enum BufferType {
      INPUT = 1,
      OUTPUT = 2
    };
    // Indicates that this memory will be used by the AudioWorklet -- don't touch it yourself.
    // Variant: register just the output, or just the input.
    // This can be called in the ctor, or while process is running, to provide the output from something already computed.
    registerBuffers(ArrayBuffer memory, uint64_t offset, uint64_t maxLength, type = INPUT|OUTPUT);

    // Called when the input or output channel count changes for whatever reason.
    // You can accept or deny the request. In case of error, `process` isn't called anymore.
    channelTopologyChange = function(ArrayBuffer memory, uint64_t offset, uint64_t newMaxLength) {
       // find some memory in already allocated space / allocate new memory, and update the previous registration
       registerBuffers(memory, new_offset, new_max_length, INPUT);
    }
}

This way the code would look similar for the first allocation and for subsequent reallocations.

One note is that there might be a big challenge for users with respect to thread-safe memory allocation inside the channelTopologyChange handler. In Emscripten that is no problem; we have well-crafted thread-safe allocators that are able to service allocations while other code is running. But I think it might make sense to explicitly word the spec so that it does not require the user to have such a capability.

That could be achieved by making sure that registerBuffers allows "over-allocation", i.e. that the user can pass buffer sizes well beyond what the node currently needs for input and output. Then, if the user knows the maximum size needed, they can statically allocate it at startup, pass it to registerBuffers, and not worry about resizes down the line. I am not sure whether the above proposal already had that idea in mind (what should/would happen if maxLength is larger than what the node uses), but that would be a useful guarantee to have.
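
A sketch of that static over-allocation strategy against the proposed API, assuming the user knows an upper bound (the 8-channel maximum and the buffer layout here are made up for illustration):

    const MAX_CHANNELS = 8;           // assumed worst-case topology
    const QUANTUM = 128;              // frames per render quantum
    const MAX_SAMPLES = MAX_CHANNELS * QUANTUM;
    // One allocation at startup, large enough for any topology the node can see.
    const backing = new ArrayBuffer(2 * MAX_SAMPLES * 4);  // input + output halves

    class OverAllocatingProcessor extends AudioWorkletProcessor {
      constructor() {
        super();
        // Register the maximum size once; a channelTopologyChange handler
        // could keep returning the same regions and never needs to allocate.
        this.registerBuffers(backing, 0, MAX_SAMPLES, /* INPUT */ 1);
        this.registerBuffers(backing, MAX_SAMPLES * 4, MAX_SAMPLES, /* OUTPUT */ 2);
      }
    }
    registerProcessor('over-allocating-processor', OverAllocatingProcessor);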

hoch commented 11 months ago

@juj Thanks so much for the detailed comment! We'll review and respond soon.

padenot commented 11 months ago

@juj, I agree with everything you said in your comment: the API proposal and the caveat about this potentially requiring a multi-thread aware allocator (if the heap is shared between multiple threads, that is).

I had over-allocation in mind when proposing the above, because this is typically how this is solved in native code. Sometimes you allocate a bit more upfront; in other cases, when going from stereo to mono, you don't free the second buffer.