WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Add AudioRenderCapacity interface and related classes #2462

Closed hoch closed 2 years ago

hoch commented 2 years ago

Fixes #2444.

The PR is ready for rebase/merge.


Preview | Diff

samuelweiler commented 2 years ago

Please add an analysis of how much fingerprinting surface this adds to the spec's privacy considerations section. If you think there's none, justify that claim.

hoch commented 2 years ago

Thanks for chiming in, @samuelweiler.

hoch commented 2 years ago

My response to https://github.com/WebAudio/web-audio-api/pull/2462#pullrequestreview-861034880:

How does this work when the buffer size of the device is not aligned with the render quantum size? In its current shape, this proposal only measures the time it took to render a block, and the duration of this block.

It seems more useful to measure the time it took to render a system-level audio callback, divided by the duration of that callback's buffer (its size in frames divided by the sample rate of the system-level audio stream). Or maybe I'm reading too much into it where it says "render quanta", and it's not the same as our "render quanta" in the spec, in which case we probably want to use another term. That may be what "raw capacity value" means.

I also noticed this problem. My approach was to avoid mentioning the underlying callback buffer size and how it is connected to the rendering algorithm, because that concept is not formally specified. (We only have two instances of "system audio callback" in the spec, and we don't explain or define how it works.)

Instead, the current language only cares about the render quantum and the user-specified time interval. We could perhaps clarify by introducing a new section that explains how the Web Audio renderer works with the underlying audio system, but 1) that alone could be a significant amount of spec work, and 2) there might be UA-specific details involved. I'm not sure we want that.
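
(For illustration only: a sketch of how an author might consume the capacity metric with an author-chosen reporting interval, as this PR proposes. The attribute and event names follow the shape proposed here and are assumptions, not final spec text; TypeScript DOM typings may not include them yet.)

```ts
const context = new AudioContext();

// `renderCapacity` / the "update" event are the names proposed in this PR;
// cast through `any` since they may not exist in lib.dom typings.
const capacity = (context as any).renderCapacity;

capacity.addEventListener("update", (e: any) => {
  // Load metrics aggregated over the author-chosen interval,
  // independent of the system-level callback size.
  console.log(e.timestamp, e.averageLoad, e.peakLoad, e.underrunRatio);
});

capacity.start({ updateInterval: 1 }); // seconds
```
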

An example to make it clear what I mean:

In the (common) case of a buffer size that is 192 frames at 48000 Hz (precisely 4 ms), if we have a graph that has a render quantum of 128 frames / 2.66ms (the default in all implementations for now) and renders in an average of 2ms, it's ceil(2 / 2.66 * 100) = 76.

Yes, but do the 192 frames matter? The render quantum and a load value are enough information.

In reality, on every other callback, 256 frames of audio need to be rendered, that is, about 5.32ms worth of audio within the 4ms budget (excluding non-linearity effects; I think the point still stands), which underruns. So we're in a situation where the reported load is lower than 100, but underruns are clearly happening.

I am not following this point: why are 256 frames needed every other callback?

In general, I think we agree on the design, but perhaps we are not sure how to resolve the uncertainty around the underlying callback, its interval, and how the "batch" rendering mechanism works.
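
(For reference, a sketch of the arithmetic in the quoted example, under its stated assumptions of 128-frame quanta, 192-frame system callbacks at 48 kHz, and a 2 ms average render time per quantum:)

```ts
const sampleRate = 48000;
const quantumFrames = 128;
const callbackFrames = 192;
const renderMsPerQuantum = 2; // measured average from the example

const quantumMs = (quantumFrames / sampleRate) * 1000;   // ~2.67 ms
const callbackMs = (callbackFrames / sampleRate) * 1000; // 4 ms

// Per-quantum load, as the current text would report it: ~75%
// (the quoted ceil(2 / 2.66 * 100) = 76 uses the rounded 2.66 ms).
const quantumLoadPct = (renderMsPerQuantum / quantumMs) * 100;

// On the heavy callbacks, two quanta (256 frames) must be rendered within
// the same 4 ms budget: ~4 ms of work, i.e. effectively ~100% on that callback.
const heavyCallbackLoadPct = ((2 * renderMsPerQuantum) / callbackMs) * 100;

console.log(quantumLoadPct, heavyCallbackLoadPct);
```
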

padenot commented 2 years ago

I am not following this point: why are 256 frames needed every other callback?

I've tried to make it clear in the link, via a table: since 128 is not a divisor of 192, when a callback arrives with a time budget of 4ms (at 48kHz), 256 frames need to be rendered (to have 192 frames to give to the system callback), which means 64 frames are left in a buffer in the browser. Then the next callback comes and 192 frames are needed again, with the same time budget of 4ms. There are 64 frames left in the buffer, so the web browser only has to render 128 frames (128 + 64 = 192) to service the system audio callback. It repeats like this for the lifetime of the AudioContext.

So on every callback, with a 4ms budget, we have either 256 or 128 frames to render. This means that an effective load histogram should be bimodal, but it isn't with the currently proposed spec text. This is important because it signals to authors that there is a problem, and that the maximum load they can have on this configuration is half the maximum theoretical load.
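
(A minimal simulation of this accounting, purely illustrative and not tied to any particular browser's implementation:)

```ts
// A 192-frame system callback served from 128-frame render quanta,
// with leftover frames buffered for the next callback.
const quantum = 128;
const callbackSize = 192;

let buffered = 0;
for (let callback = 0; callback < 6; callback++) {
  let rendered = 0;
  while (buffered + rendered < callbackSize) {
    rendered += quantum; // render one more quantum
  }
  buffered = buffered + rendered - callbackSize; // leftover frames kept for next time
  console.log(`callback ${callback}: rendered ${rendered} frames, ${buffered} left over`);
}
// Output alternates: 256 rendered / 64 left over, then 128 rendered / 0 left over, ...
```
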

I think it's perfectly workable with the current API shape, though. But in general, I'm afraid that not taking into account the realities of the underlying system will yield an API that is less useful than it could be.

If we shift the API to collect data on the system-level audio callbacks instead of render quanta, it immediately becomes a lot more useful for the user. There is in fact no need for render quanta (in the Web Audio API sense of a block of, for now, 128 frames) to be mentioned.
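
(A sketch of the callback-based metric described here; the names are hypothetical and only illustrate the computation, not proposed IDL:)

```ts
// Hypothetical per-callback load sample; all names are illustrative assumptions.
interface SystemCallbackSample {
  renderTimeMs: number;      // wall-clock time spent rendering this callback
  bufferSizeFrames: number;  // frames requested by the system callback, e.g. 192
  sampleRateHz: number;      // sample rate of the system-level audio stream, e.g. 48000
}

// load = time spent rendering the callback / duration of audio it produces
function callbackLoad(s: SystemCallbackSample): number {
  const budgetMs = (s.bufferSizeFrames / s.sampleRateHz) * 1000; // e.g. 192/48000 = 4 ms
  return s.renderTimeMs / budgetMs; // > 1.0 means the callback underran
}

// Example: a 192-frame callback at 48 kHz rendered in 2 ms -> load = 0.5
console.log(callbackLoad({ renderTimeMs: 2, bufferSizeFrames: 192, sampleRateHz: 48000 }));
```
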

hoch commented 2 years ago

Conclusion from WG teleconf on 1/27/2022:

hoch commented 2 years ago

@padenot @svgeesus Do we need a TAG review for this?

svgeesus commented 2 years ago

Do we need a TAG review for this?

Need, no.

However, especially if there is an explainer for this, requesting review would be helpful at this early stage. Or we could wait until the Recommendation is republished with the next batch of new features.

hoch commented 2 years ago

@samuelweiler Please take a look at https://github.com/WebAudio/web-audio-api/pull/2462#issuecomment-1011333801 before we merge this change.