WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Allow user-selectable render quantum size #2450

Open · rtoy opened this issue 4 years ago

rtoy commented 4 years ago

Describe the feature: Allow an AudioContext and OfflineAudioContext to have a user-selectable render quantum size instead of the current fixed size of 128.

This allows better integration with AudioDeviceClient which can render in many different sizes.

Is there a prototype? No. Can't prototype this.

Describe the feature in more detail: Basically, add a new dictionary member to specify a render size for the constructors. The default is 128, of course.

rtoy commented 4 years ago

Updated AudioContextOptions:

dictionary AudioContextOptions {
  (AudioContextLatencyCategory or double) latencyHint = "interactive";
  float sampleRate;
  unsigned long renderSizeHint = 128;
};

It's a hint, to allow the browser to choose something else appropriate, such as a power of two.

Updated BaseAudioContext:

interface BaseAudioContext : ... {
...
  readonly attribute unsigned long renderQuantumSize;
}

renderQuantumSize is the actual size. It is named this because the spec uses "render quantum" everywhere and calls this quantity the render quantum size.
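
A minimal usage sketch of the proposal as it stands at this point in the thread; renderSizeHint and renderQuantumSize are proposed names, not a shipping API, so the options are left untyped and the attribute is read through a cast.

// Hypothetical usage of the proposed option and attribute.
const options: any = {
  latencyHint: "interactive",
  sampleRate: 48000,
  renderSizeHint: 256,   // proposed member: ask for 256-frame render quanta
};
const ctx = new AudioContext(options);

// The proposed read-only attribute reports the size the browser actually
// chose, which may differ from the hint.
const actualSize: number = (ctx as any).renderQuantumSize;
console.log(`render quantum size: ${actualSize}`);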

rtoy commented 4 years ago

What to do about the 3-arg OfflineAudioContext constructor? Should we allow a fourth arg to specify the render size? Or just say that if you want the new goodness you need to use the constructor with a dictionary?

rtoy commented 4 years ago

Closely related to WebAudio/web-audio-cg#9

rtoy commented 4 years ago

Another thing to consider with this issue is what happens with ScriptProcessorNode and its buffer size, which is required to be a power of two. Yes, that node is deprecated, but it is not yet removed, so we need to think about this at least.

jas-ableton commented 4 years ago

@rtoy From the uninformed perspective of a front-end developer, I'd be more than happy to just use the dictionary constructor.

Also marking this as related to: https://bugs.chromium.org/p/chromium/issues/detail?id=924426, where it seems we've identified that awkward buffer sizes lead to significant performance hits.

meshula commented 4 years ago

The AudioBus/Channel classes in the implementation don't really like changing from frame to frame; the Chromium issue describes buffers that are either 128 or 192 frames, and the glitching is probably caused by changes in buffer sizes being propagated through the graph, possibly a frame late. I think an easier and more reasonable fix is to keep the 128-frame quantum size, but run the streams through a ring buffer so that the engine is always fed 128-frame chunks to process, possibly running a chunk ahead of the callbacks.

I worked through supporting a variable quantum in LabSound (which was originally a fork of the WebKit WebAudio sources), and it seems like way more trouble than it's worth, especially in the face of simple alternative solutions like a ring buffer (e.g. https://github.com/dr-soft/miniaudio/blob/master/examples/fixed_size_callback.c). By way more trouble, I mean that most of the convolution filters, such as delays, HRTF, and reverb, all have a dependency on working on power-of-two chunks, and mismatching those against the render callback size is bothersome and, without careful thought, can introduce new pops and latency. Although I did the work on the variable quantum, I am going to abandon that branch...

rtoy commented 4 years ago

I believe all implementations do some kind of FIFO/ring buffer to manage the difference between WebAudio's 128-frame chunks and the underlying HW block sizes.

I have made some changes to Chrome to support this, and even in the simple cases, if the block size is changed to some other value, many of the current WPT tests fail because the generated numbers differ from the expected values. I don't know if that's because I messed up, because that's how things work, or because extra (or less?) round-off happens.

And, as you say, getting anything that uses FFTs in the underlying implementation to work is a ton of work, and it impacts performance. For small sizes, this probably hurts performance quite a bit. For large sizes, it probably helps, because we can use larger FFT sizes.

In all, this is a ton of work to get things all working and performing well.

padenot commented 4 years ago

What @meshula describes is what is being done today in implementations. It's however very inefficient in cases where the system buffer size is not a power of two (very common on anything but macOS, if we're talking about consumer setups), so we're changing it. There is no other way to fix this properly: the fundamental issue is that the load per callback cannot be stable if the rendering quantum size is not a divisor of the system callback size, and this means that theoretical maximum load is reduced.

Let's consider a practical example (my phone), which has a buffer size of 192 frames. With a rendering quantum of 128 frames, a native sample rate of 48kHz, and a system buffer size of 192 frames, rendering with a ring buffer to adapt the buffer sizes goes like this:

iteration #   number of frames to render   number of buffers to render   leftover frames
0             192                          2                             64
1             192                          1                             0
2             192                          2                             64
3             192                          1                             0
4             192                          2                             64
5             192                          1                             0

Because this is real-time, we have 192 * 1000 / 48000 = 4ms for each callback, but sometimes we need to render 256 frames in 4ms, and sometimes 128 frames in 4ms. The total maximum load is therefore 50% of what the phone can do.
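
A small sketch, not from the comment, that just reproduces this arithmetic: it simulates feeding 192-frame device callbacks from 128-frame render quanta through a FIFO and counts how many quanta each callback must compute.

const QUANTUM = 128;   // Web Audio render quantum size
const CALLBACK = 192;  // system buffer size from the example above

let leftover = 0;      // frames already rendered but not yet consumed
for (let iteration = 0; iteration < 6; iteration++) {
  let quanta = 0;
  let available = leftover;
  while (available < CALLBACK) {  // render quanta until the callback is covered
    available += QUANTUM;
    quanta++;
  }
  leftover = available - CALLBACK;
  console.log(`iteration ${iteration}: ${quanta} quanta, ${leftover} frames left over`);
}
// The output alternates between 2 quanta (256 frames) and 1 quantum (128
// frames) per 4ms callback, so the worst-case callback does twice the work
// of the best case.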

The render quantum size is not going to be variable. It's going to be user-selectable at construction, and will be based on the characteristics of the underlying audio stack.

This does mean we'll have to fix the FFT code; the simple fix is to introduce the ring buffer only there.

jas-ableton commented 4 years ago

Thanks for laying this out so clearly and succinctly @padenot!

The point about various nodes needing to operate on certain buffer sizes is a good one, but as @padenot points out, the common practice (in audio plugin development, for example) is to satisfy these constraints within the implementation of the node itself. If a node/plugin needs to operate on power of two buffers because it does FFTs, it's usually the case that the node/plugin is responsible for doing the necessary buffering internally. Similarly, if a node/plugin needs to do its processing at a certain sample rate, the node/plugin will do the required resampling internally, as opposed to forcing all other nodes in the graph to run at the same sample rate.
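
A sketch of that internal-buffering pattern, with purely illustrative names: a node that needs power-of-two FFT blocks accumulates whatever block size the host delivers and only runs its DSP once a full block is available.

class FixedBlockAdapter {
  private buffer: Float32Array;
  private filled = 0;

  constructor(private blockSize: number,
              private processBlock: (block: Float32Array) => void) {
    this.buffer = new Float32Array(blockSize);
  }

  // Accept input of any length (e.g. a 192-frame render quantum) and invoke
  // processBlock for every complete fixed-size block accumulated so far.
  push(input: Float32Array): void {
    let offset = 0;
    while (offset < input.length) {
      const n = Math.min(this.blockSize - this.filled, input.length - offset);
      this.buffer.set(input.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.blockSize) {
        this.processBlock(this.buffer);  // e.g. run a 512-point FFT here
        this.filled = 0;
      }
    }
  }
}

This keeps the constraint local to the node, at the cost of up to one block of extra latency inside that node.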

meshula commented 4 years ago

The part that made my brain hurt, in the WebKit implementation, was the optimization whereby a bus's channels' buffers can be passed up the chain in the case where a node wouldn't actually affect the buffer. Of course the right answer is that the buffer can't be shared when it hits a node that has a different internal need, and that life is much easier if the render quantum is fixed at initialization, rather than runtime-variable.

padenot commented 4 years ago

This issue is not about dynamically changing the buffer size during an AudioContext life time. It's about deciding on a buffer size at initialization, based on what the system can do.

The buffer sharing technique that you see in WebKit is also present in Gecko, and it's just an optimization.

padenot commented 4 years ago

Virtual F2F:

padenot commented 4 years ago

Exact shape of the API is TBD. There will have to be provisions to have ScriptProcessorNode still work. Having a FIFO before it and still sending buffers with power-of-two sizes seems like the way to go.

rtoy commented 3 years ago

From today's teleconf:

We want to allow selecting the desired size, but we also need a way to specify that we want to use whatever the HW-recommended size is.

We probably don't want to allow every possible size, but we're not sure about the constraints, except that requiring a power of two isn't going to work for many Android devices. Perhaps small multiples of powers of two? Maybe something like 2^p*3^q*5^r?
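
For illustration only (this is not a proposed constraint), a size of that 2^p*3^q*5^r form can be checked by trial division:

// Returns true if n factors as 2^p * 3^q * 5^r with p, q, r >= 0.
function isSmoothSize(n: number): boolean {
  if (n < 1) return false;
  for (const f of [2, 3, 5]) {
    while (n % f === 0) n /= f;
  }
  return n === 1;
}

console.log(isSmoothSize(192)); // true  (2^6 * 3)
console.log(isSmoothSize(240)); // true  (2^4 * 3 * 5)
console.log(isSmoothSize(441)); // false (3^2 * 7^2), the WASAPI case in the next comment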

padenot commented 3 years ago

I don't think we can have a strict formula here. Windows/WASAPI works in 10ms chunks at the stream's rate, that means that 441 is a very common buffer size.

rtoy commented 3 years ago

Strangely, when WASAPI support was added in Chrome, I remember the 10ms chunks, but the actual size was 440. Didn't quite understand how that worked, but maybe my memory is wrong.

rtoy commented 3 years ago

Teleconf: Do something like latencyHint: accept either an integer or an enum for the default and the HW size. Leave it up to the browser to choose the requested size or round to something close. And AudioWorklet can tell you what size was actually used.

@padenot notes that some systems (pulseaudio) don't have constant buffer sizes; they can change over time.

padenot commented 3 years ago

This ties into output device changes (either explicit or implicit, for example because the device the AudioContext was running on has been unplugged), where the new device frequently has a different buffer size (and sample rate).

Having an event that is fired on an AudioContext when the underlying device has changed would be very useful. Authors could decide to just ignore this event (and things would continue working, like they do now), but they could also decide to create another AudioContext running at a different sample rate (and rebuild their graph, keeping in mind that all AudioBuffers are shareable between AudioContexts), to ensure that the DSP load can be as high as possible on the new device.
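
A heavily hypothetical sketch of what that could look like for authors. The "renderingdevicechange" event name is invented here; only the reuse of AudioBuffers across contexts is possible today.

let ctx = new AudioContext();
const decoded: AudioBuffer[] = [];  // AudioBuffers are plain data and can be
                                    // reused with a different AudioContext.

// Hypothetical event; such an event is only being proposed in this comment.
ctx.addEventListener("renderingdevicechange", async () => {
  await ctx.close();
  // Recreate the context so it can match the new device's characteristics,
  // then rebuild the graph from the retained AudioBuffers.
  ctx = new AudioContext();
  for (const buffer of decoded) {
    const src = ctx.createBufferSource();
    src.buffer = buffer;
    src.connect(ctx.destination);
    src.start();
  }
});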

padenot commented 3 years ago

This is ready for a first cut. The things needed are:

  • Being able to specify a size in the ctor and have a way to have the engine use the best size for the platform, without specifying this size
  • Being able to know this size without having to instantiate an AudioWorkletProcessor and looking at output[0][0].length. This would allow authors to customize the (possibly generated at run time) code that will then go into the AudioWorkletProcessor, for example to take advantage that the buffer size is a power of two, etc.
  • Being able to know the right size (not strictly needed initially)
  • Have an event that authors can subscribe to, to know that the underlying device has changed, or maybe this already exists on MediaDevices

rtoy commented 3 years ago

TPAC 2020:

Basic proposal: The render size is a hint. The default is 128. There would be an enum to request the optimum HW size. Browsers are only required to support powers of two, but highly encouraged to support more. There will be an attribute on the AudioContext to let developers know what the actual size is.

This will be an additional member of AudioContextOptions.

rtoy commented 3 years ago

Rough proposal, actual names TBD

enum AudioContextRenderSizeCategory {
  "default",
  "hardware"
};
dictionary AudioContextOptionsAdditions : AudioContextOptions {
  (AudioContextRenderSizeCategory or unsigned long) renderSizeHint = "default";
};
partial interface AudioContext {
  readonly attribute unsigned long renderSize;
};

We may not want this derived dictionary. The minimum and maximum supported render sizes are up to the browser. Perhaps we can say values from 128 to 4096(?) are required to be supported? Lower and higher values are allowed.

The actual value used is up to the browser, except that powers of two are required to be supported and honored, provided they fall within the minimum and maximum sizes allowed by the browser. We don't specify how this value is chosen. It is highly recommended that browsers also support other sizes that are common for the OS.

The renderSize attribute is the actual size chosen by the browser.

Some additional implementation notes, not relevant to the spec but important for implementors. Most of the complication comes from supporting the convolver and other FFTs when the render size is not a power of two.

rtoy commented 3 years ago

For an OfflineAudioContext, let's update the options dictionary to:

dictionary OfflineAudioContextOptions {
  unsigned long numberOfChannels = 1;
  required unsigned long length;
  required float sampleRate;
  (AudioContextRenderSizeCategory or unsigned long) renderSizeHint = "default";
};

This also means we want the BaseAudioContext to have a renderSize attribute:

partial interface BaseAudioContext {
  readonly attribute unsigned long renderSize;
};

This replaces the proposal in https://github.com/WebAudio/web-audio-api-v2/issues/13#issuecomment-709614649 that added this to the AudioContext.

rtoy commented 3 years ago

Creating a ScriptProcessorNode takes a bufferSize argument. Currently, the allowed values are 0, 256, 512, 1024, 2048, 4096, 8192, and 16384. That is, the nonzero sizes are 128*2, 128*2^2, 128*2^3, 128*2^4, ..., 128*2^7.

For user-selectable sizes, I propose we change the allowed values to be 0 and s*2^n where s is the renderSize, and n = 1 to 7.

This preserves the current constraints on the sizes when the renderSize is 128, and extends them to other renderSizes in a way that requires no unusual buffering. This simplifies the changes needed to the deprecated ScriptProcessorNode.
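
As a sketch (illustrative code, not spec text), the allowed bufferSize set for a given renderSize would be:

// Allowed ScriptProcessorNode bufferSize values under this proposal:
// 0 (let the implementation choose) plus s * 2^n for n = 1..7,
// where s is the context's renderSize.
function allowedScriptProcessorBufferSizes(renderSize: number): number[] {
  const sizes = [0];
  for (let n = 1; n <= 7; n++) {
    sizes.push(renderSize * 2 ** n);
  }
  return sizes;
}

console.log(allowedScriptProcessorBufferSizes(128));
// [0, 256, 512, 1024, 2048, 4096, 8192, 16384]  (the current set)
console.log(allowedScriptProcessorBufferSizes(192));
// [0, 384, 768, 1536, 3072, 6144, 12288, 24576]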

rtoy commented 3 years ago

From the teleconf, the API should be updated:

enum AudioContextRenderSizeCategory {
  "default",
  "hardware"
};

partial interface BaseAudioContext {
  readonly attribute unsigned long renderSize;
};

The dictionaries for the AudioContext and OfflineAudioContext are updated as follows by adding renderSizeHint:

dictionary AudioContextOptions {
  (AudioContextLatencyCategory or double) latencyHint = "interactive";
  float sampleRate;
  (AudioContextRenderSizeCategory or unsigned long) renderSizeHint = "default";
};

dictionary OfflineAudioContextOptions {
  unsigned long numberOfChannels = 1;
  required unsigned long length;  
  required float sampleRate;
  (AudioContextRenderSizeCategory or unsigned long) renderSizeHint = "default";
};

rtoy commented 3 years ago

Some additional notes.

"default" means 128. "hardware" means the appropriate size for the hardware. Browsers are free to a different value.

renderSizeHint can be a numerical value in which case it is the number of frames with which to render the graph. Browsers are only required to support render sizes that are powers of two from 64 to 2048. (I think @jas-ableton wanted 2048). It is recommended that browsers also support other sizes, but this is not required.

Finally, as mentioned in https://github.com/WebAudio/web-audio-api-v2/issues/13#issuecomment-776805580, ScriptProcessorNode bufferSize is now 0 or s*2^n for n = 1 to 7, where s is value of the renderSize attribute.

The interaction between renderSizeHint and latencyHint still needs to be worked out, but in general, this is pretty much up to the browser.
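
Putting the updated proposal together, usage could look roughly like this. renderSizeHint and renderSize are still proposals, so the sketch passes untyped options and reads the attribute through a cast.

// Real-time context: ask for whatever quantum size best matches the hardware.
const rtOptions: any = { latencyHint: "interactive", renderSizeHint: "hardware" };
const ctx = new AudioContext(rtOptions);
console.log((ctx as any).renderSize); // e.g. 192 on the Android phone above

// Offline context: request a specific quantum size, e.g. 192 frames.
const offlineOptions: any = {
  numberOfChannels: 2,
  length: 48000 * 10,  // 10 seconds
  sampleRate: 48000,
  renderSizeHint: 192,
};
const offlineCtx = new OfflineAudioContext(offlineOptions);
console.log((offlineCtx as any).renderSize); // the size actually chosen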

rtoy commented 3 years ago

See proposed explainer at https://github.com/rtoy/web-audio-api/blob/13-user-render-size-explainer/explainer/user-selectable-render-size.md.

Comments welcome.

svgeesus commented 3 years ago

Overall the explainer looks great. A few minor suggestions:

> This was probably a trade-off

"probably" is over cautious, drop

> This does increase latency a bit, but since Android is already using a size of 192, there is no actual additional latency.

Reword: "This would increase latency a bit compared to a native size of 128, but since Android is already using a size of 192, there is no actual additional latency in practice."

> For example, selecting "hardware" may result in a size of 192 frames for the Android phone used in the example.

That seems mysterious. Does it mean it might still pick 128? Or that it picks 256, the next largest power of two? Or what?

> If the requested value is not supported by the UA, the UA MUST round the value up to the next smallest value that is supported. If this exceeds the maximum supported value, it is clamped to the max.

Aha ok, it might pick 256; say so above.

> The the problem isn't limited to Android.

"The problem isn't limited to Android."

> In particular, UAs that don't double buffer WebAudio's output, the latencyHint value can be 0, independent of the renderSize

Maybe explicitly say that for UAs that do double buffer, the latency will increase.

rtoy commented 3 years ago

> Overall the explainer looks great. A few minor suggestions:
>
> > This was probably a trade-off
>
> "probably" is over cautious, drop
>
> > This does increase latency a bit, but since Android is already using a size of 192, there is no actual additional latency.
>
> Reword: "This would increase latency a bit compared to a native size of 128, but since Android is already using a size of 192, there is no actual additional latency in practice."

Updated as suggested.

> > For example, selecting "hardware" may result in a size of 192 frames for the Android phone used in the example.
>
> That seems mysterious. Does it mean it might still pick 128? Or that it picks 256, the next largest power of two? Or what?

I'll rephrase this or link to the section on supported sizes.

> > If the requested value is not supported by the UA, the UA MUST round the value up to the next smallest value that is supported. If this exceeds the maximum supported value, it is clamped to the max.
>
> Aha ok, it might pick 256; say so above.
>
> > The the problem isn't limited to Android.
>
> "The problem isn't limited to Android."

Fixed.

> > In particular, UAs that don't double buffer WebAudio's output, the latencyHint value can be 0, independent of the renderSize
>
> Maybe explicitly say that for UAs that do double buffer, the latency will increase.

I think that's in the next paragraph.

Thanks for the review.


JohnWeisz commented 2 years ago

To expand on this comment from @padenot

This is ready for a first cut. The things needed are:

  • Being able to specify a size in the ctor and have a way to have the engine use the best size for the platform, without specifying this size
  • Being able to know this size without having to instantiate an AudioWorkletProcessor and looking at output[0][0].length. This would allow authors to customize the (possibly generated at run time) code that will then go into the AudioWorkletProcessor, for example to take advantage that the buffer size is a power of two, etc.
  • Being able to know the right size (not strictly needed initially)
  • Have an event that authors can subscribe to, to know that the underlying device has changed, or maybe this already exists on MediaDevices

Should this, by any chance, also be extended with a new property in the AudioWorkletNodeOptions interface definition, renderSize, so that it can be known in the AudioWorkletProcessor constructor for setup purposes in advance? Or maybe it could be made available as a global property, similar to the global sampleRate and currentTime properties in the AudioWorkletGlobalScope.

That said, of course it should be possible to just take it from the AudioContext in use and pass it manually in the custom processor options object, but I feel like this is now an integral part of the AudioWorkletProcessor, so maybe it should be natively available in its options initializer as well.

Thoughts?
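
For reference, the "pass it manually" approach mentioned above might look like the following sketch; renderSize is still only a proposed attribute, and the processor and module names are made up.

// Main thread: forward the (proposed) render size via processorOptions.
async function createNode(context: AudioContext): Promise<AudioWorkletNode> {
  await context.audioWorklet.addModule("my-processor.js"); // made-up module name
  const renderSize = (context as any).renderSize ?? 128;   // proposed attribute
  return new AudioWorkletNode(context, "my-processor", {
    processorOptions: { renderSize },
  });
}

// my-processor.js, in the AudioWorkletGlobalScope: the value is then available
// in the constructor, before the first process() call.
class MyProcessor extends AudioWorkletProcessor {
  private readonly blockSize: number;
  constructor(options?: AudioWorkletNodeOptions) {
    super();
    this.blockSize = options?.processorOptions?.renderSize ?? 128;
  }
  process(_inputs: Float32Array[][], _outputs: Float32Array[][]): boolean {
    // Pre-sized scratch buffers of length this.blockSize can be used here.
    return true;
  }
}
registerProcessor("my-processor", MyProcessor);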

juj commented 1 year ago

How likely is it that this change will land?

Also, if this improvement lands, will the array length in AudioWorklets remain at fixed 128 samples, or will it reflect the quantum size of the context? Reading the PR suggests that it would, though I see the BitCrusher example was not changed: https://github.com/WebAudio/web-audio-api/pull/2469/files#r1117951767

Documentation like https://developer.chrome.com/blog/audio-worklet/#custom-audioparam would then benefit from updating.

padenot commented 1 year ago

This change will land for sure; implementors simply haven't had enough resources to implement it just yet. It's a very large change underneath: lots of code assumes a 128-frame buffer for performance reasons, e.g. for SIMD, buffer pooling and such, and a very large amount of code has to be modified and tested. But on the other hand it's so beneficial for performance that we can't not land it. It was just postponed in the roadmap; some other work items were smaller in size and also very high priority (e.g. audio output device selection).

And also yes, this will change AudioWorkletProcessor buffer sizes. But after an AudioContext is constructed, the render quantum size is always constant, even if the underlying audio device changes, which means that performance can be lowered when the audio device changes (e.g. because the previous audio device has been unplugged, the AudioContext was following the default device of the host, and the two audio devices don't have the same preferred sample rate / buffer size).

All this audio device change stuff has either already landed or is resolved or almost resolved (https://github.com/WebAudio/web-audio-api/issues/2532), so developers can decide to recreate their audio graph, or not, depending on what's best for the application.

juj commented 1 year ago

Perfect, thanks for the update!

rtoy commented 1 year ago

FWIW, I had a bunch of CLs for Chrome that implemented this. IIRC everything was working, except I had not yet handled anything having to do with FFTs. Complicated stuff, but I think I could make it work since the FFT library supported non-powers of 2 FFTs. I was going to restrict the set of buffer sizes to be lengths that were supported by the FFT. Fortunately, this included buffer sizes like 160, 192, 240, etc. that are common on Android devices.

ElizabethHudnott commented 1 year ago

Is this thread only about choosing a single render quantum size globally for the whole AudioContext? Or would it also be possible to have a subgraph that operated at some submultiple of the main quantum size, for example so that you could implement feedback FM outside of an audio worklet using a quantum size of just a couple of samples, as is possible in MaxMSP?

padenot commented 1 year ago

It's for the entire AudioContext. Feedback FM or other DSP algorithms that use very short feedback loops are better implemented using AudioWorkletProcessor.

The goal of this is to align the render quantum to what the OS uses to maximize performance and also to potentially lower latency.

haywirez commented 10 months ago

Has there been any discussion about supporting changes to this setting after the context is set up and running? It's a lot of added complexity, but I imagine there's a close relation to the Render Capacity API. It could be useful to change this without having to re-initialize a web app's entire audio graph, maybe while the context is suspended.

hoch commented 10 months ago

2023 TPAC Audio WG Discussion: The PR will be verified against the current head and merged.

Re: @haywirez The Working Group believes that changing the render quantum size on the fly is not practical. (As far as I can remember, no platform audio APIs support this.) Also, merging this PR does not prevent us from adding more features on top of it. Allowing a change only while suspended makes sense, but one can imagine it creating some interesting edge cases.