Configurable AudioWorklet process block size (higher than 128 samples)?

josh83abc commented 6 years ago

Hello everybody!

As far as I see, the AudioWorkletProcessor process block size is 128 samples, like any AudioNodes.

I haven't really tested the robustness of the audio stream but this value seems to be pretty low to me if you compare it with what you see in desktop music softwares (process block size is usually more than 512 samples and can easily be 1024 or 2048 samples).

I don't know if it is related but I can hear frequent tiny audio glitches with the AudioWorklet sinus demo when I switch from tab to tab. https://googlechromelabs.github.io/web-audio-samples/audio-worklet/basic/hello-audio-worklet.html

Also in my app, the audio latency is not the most important aspect, so I prefer to have a higher latency allowing a better robust audio stream.

Would it be possible to change the AudioWorkletProcessor process block size in the future? Is there a workaround?

PS : this post also talks about it : https://github.com/WebAudio/web-audio-api/issues/1466

hoch commented 6 years ago

The internal buffering (e.g. FIFO) in the processor might be the way to resolve the buffer size difference. Even if we change the spec to accommodate the variable buffer size, the implementation will do the buffering internally anyway because the other parts of WebAudio use 128 frames.

Having the same render quantum size is the key to the lower latency, and it was the one of design goals of AudioWorklet. I doubt that WG will change something fundamental like this now, but it can be revisited later for the V2.

hoch commented 6 years ago

Also the glitch is an implementation issue. If you have a repro case, please file a bug and cc me (hongchan@).

sletz commented 6 years ago

@josh83abc: you can get more info on glitch issues on this bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=796330&can=2&start=0&num=100&q=component%3ABlink%3EWebAudio&colspec=ID%20Pri%20M%20Stars%20ReleaseBlock%20Component%20Status%20Owner%20Summary%20OS%20Modified&groupby=&sort=

josh83abc commented 6 years ago

Thanks @sletz for this link!! lot of very interesting stuff in it. Avoid audio glitches is the main priority of my music player, so I'm very interested in understanding all the details.

@hoch, thanks a lot for all these technical details. I'm aware that all the webaudio graph is processed with a render quantum of 128 samples and so does the AudioWorkletProcessor. For now I didn't really tested the audio robustness of the webaudio graph rendering, it is pretty good for sure but I will try to reach its limits. Also I will test on iOS and Android. Tell me if I'm wrong but I don't think a FIFO on top of the AudioWorkletProcessor will improve the robustness that much.

For the V.next, it would be interesting if the webaudio render quantum size could be ajusted by the programmer who doesn't need a very low latency. For instance, a render quantum of 1024 samples (23ms@44.1khz) would be way enough for my application. Actually Flash is using this value to process audio, the latency is ok and there is no audio glitch at all, even if Chrome crashes.

padenot commented 6 years ago

The internal block size of the Web Audio rendering graph provides an lower bound for the OS buffer size, not an upper bound (and even then, it's not strictly true).

The Web Audio API already has a mechanism to request a higher latency on an AudioContext, using AudioContextLatencyCategory.

Please open issues on the UA's bug tracker for issues about implementations.

josh83abc commented 6 years ago

Thanks a lot @padenot for letting me aware of the latencyHint option of the AudioContext!! I totally missed it when I read the spec, but this exactly what I was talking about : having an option to ajust the audio latency according the need of the app. Amazing it already exists!! webaudio is dope :)

rtoy commented 6 years ago

Based on https://github.com/WebAudio/web-audio-api/issues/1503#issuecomment-368667127, I think we can close out this issue. I don't think there's anything that needs to be done for the spec.

fr0m commented 6 years ago

Back then using ScriptProcessorOrNode, buffer size controls how frequently the audioprocess event is dispatched, well latencyHint doesn't affect that.

Since the AudioWorkletProcessor process buffer size is 128, the frequency of process is triggered is overwhelmingly higher than using ScriptProcessorOrNode with 1024 buffer size or higher.

This situation may cause high CPU occupation since expensive operation may be executed in the process, which is the situation in my project. So do we have another way to control the frequency of process triggered?

padenot commented 6 years ago

This situation may cause high CPU occupation since expensive operation may be executed in the process, which is the situation in my project. So do we have another way to control the frequency of process triggered?

No. If it's too expensive when computing 128 frames, why is it OK when computing 1024 frames? You exactly have the same amount of time per frame to compute the audio.

You probably simply need to optimize your code.

sletz commented 6 years ago

Because running the complete audio chain adds a fixed processing cost per buffer. When smaller buffers are used, the fixed cost is added more times, so finally takes more of the available CPU.

fr0m commented 6 years ago

Thanks for the reply.

We use audioWorklet to do audio live stream, encoded audioBuffer is sent via WebSocket in AudioWorkletProcessor. It's expensive because of the frequency, not frame.

I can cache the buffer, and send the buffer in a particular buffer size. But it will be much better if the unnecessary process will not be triggered at all. Will you take this situation into account?

positonic commented 5 years ago

@sletz that benchmark link for glitches is dead, has it moved somewhere?

sletz commented 5 years ago

Which link ?

thedracle commented 1 year ago

Was this closed with a resolution of some kind?

I have a situation where I have a trained model that works on buffers that are at a minimum 512 frames, or multiples of it.

The processing time isn't the issue, but the network was trained/optimized for this frame size, and doesn't seem to be an easy to way to match it with the demand that all processing be done on 128 sample chunks.

Is it acceptable to set the latencyHint to "playback" and to block the callback until I've collected 512 frames, processed them, and then feed my collected frames out one at a time?

padenot commented 1 year ago

Was this closed with a resolution of some kind?

This was in fact about latency, and was closed because of this: https://github.com/WebAudio/web-audio-api/issues/1503#issuecomment-368686686. We didn't update the issue's title, maybe we should have.

I have a situation where I have a trained model that works on buffers that are at a minimum 512 frames, or multiples of it.

The processing time isn't the issue, but the network was trained/optimized for this frame size, and doesn't seem to be an easy to way to match it with the demand that all processing be done on 128 sample chunks.

Is it acceptable to set the latencyHint to "playback" and to block the callback until I've collected 512 frames, processed them, and then feed my collected frames out one at a time?

There are three things you can do. Two you can do now, and one you'll be able to do later next year. It's not a huge effort to do the first two, but the best solution depends on the situation:

First, you can internally buffer 384 frames: you accumulate 384 frames of input, in the first 3 callbacks, and then on the 4th callback, when you've had 512 frames of input, you can run your model. This induces 3 blocks of latency, maybe this is acceptable for your use-case. This works well if your model runs in less than 128 / sampleRate seconds. You can measure this easily, https://blog.paul.cx/post/profiling-firefox-real-time-media-workloads/ has instructions.
Otherwise, if your model takes more than 128 / sampleRate seconds to execute, you can offload the computation to a Web Worker, by sending from the worklet to a Web Worker using a ring buffer. This is explained in https://blog.paul.cx/post/a-wait-free-spsc-ringbuffer-for-the-web/, the repo has two examples: efficiently sending audio to a Web Worker, and back to an AudioWorkletProcessor. If you don't need to play the audio out, then this is the solution I'd recommend (you can skip sending the audio back to the AudioWorkletProcessor in this case.
Finally, sometime next year, we'll merge and implement a feature to change the block size of an AudioContext, in which case you'll be able to specify 512, and you'll get buffers with 512 frames in your AudioWorkletProcessor. The specification part is done, but it's a big change in the implementations, so we've delayed merging the specification text for clarity.

In any case, you can never "block the callback", this will cause problems, such as demoting the real-time audio thread from real-time priority to regular priority, causing all sorts of glitches and problems. But if you're not playing the audio out anyways, you can set the latencyHint to "playback", and it might save some power (depending on the OS and implementation).

thedracle commented 1 year ago

@padenot Awesome, thanks for the very detailed response! It's extremely helpful.

I'm pleased to see the final resolution, and the latency introduction is unfortunately unavoidable given the constraints of the ML model we are using, and we are willing to accept it.

I will try the first suggestion for now, and I look forward to the new feature to be able to change the block size.

WebAudio / web-audio-api

Configurable AudioWorklet process block size (higher than 128 samples)? #1503