WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

API proposal to preemptively determine audio thread overload (Render Capacity) #2444

Closed JohnWeisz closed 2 years ago

JohnWeisz commented 6 years ago

The Web Audio API defines audio production apps as a supported use-case, such as wave editors, digital audio workstations, and the like. And it's indeed quite adequate at the task!

One common trait of these applications is the capability of accepting virtually unlimited user content, which means they will essentially always hit the limit of audio processing capability on a given machine at some point -- i.e. there is so much audio content that the audio thread simply cannot keep up with processing everything and becomes overloaded, no matter how well it is optimized.

I believe this is widely and well understood in the audio production industry, and the usual way to help the user avoid overload is to display a warning indicator of some kind, letting the user know the audio processing thread is about to be overloaded and audio glitching will occur (so the user knows to go easy on adding more content).

Note: in native apps (mostly C++ based), this is most commonly implemented as a CPU load meter (for the audio thread's core), which you can keep an eye on to know roughly how far you are from the limit.

Currently, the Web Audio API does not expose a comparable API to facilitate monitoring audio processing load, or overload.

It's possible to figure this out, mainly in special cases with above-web-standard privileges (such as an Electron app). However, it is quite difficult to get right (even from the native C++ side) without implementations taking a spec-defined standard into consideration.

I'd like to propose a small set of light, straightforward, low-privacy-implication API additions to enable this:

audioContext.isOverloaded(); // 'true' or 'false'
audioContext.addEventListener("overloadchange", function (e) {
    e.isOverloaded; // 'true' or 'false'
});

For obvious reasons, this is an extension to the AudioContext interface, not BaseAudioContext, as overload detection is not applicable to OfflineAudioContext processing. Having a dedicated event for the same purpose avoids the need for polling.

It is up to implementations to decide exactly how to determine whether the audio thread is considered overloaded, optionally taking the AudioContext latencyHint setting into consideration.

This would enable Web Audio API-based apps to let the user know about high audio thread load, and to display possible options or hints at steps the user can take to avoid audio glitching.

Privacy implications

Exposing this information should have little to no privacy implications, as (1) it is rarely clear why exactly the audio thread is overloaded (it could be due to low device capabilities, or high CPU use by other processes), and (2) it does not provide a more accurate way to determine device capabilities than what is already possible with a simple scripted benchmark.

rtoy commented 3 years ago

The currentFrame and currentTime attributes reflect the audio thread's internal count of frames. This is not directly correlated with real time. So, if something delayed processing for, say, 1 sec, currentFrame will still be incremented by just 128 for the next render quantum, even though 1 sec of real time has elapsed. If performance.now were available in the AudioWorkletGlobalScope, you could keep track yourself. It's not currently available, but it is under consideration in WebAudio/web-audio-api#2413.

Plus, we want to be able to report this without having developers use an AudioWorkletNode if they otherwise wouldn't need one.
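rtoy's observation suggests a workaround that is possible today: compare the audio clock (context.currentTime) against the wall clock from the main thread, since the audio clock only advances by rendered frames and falls behind when the audio thread stalls. A minimal sketch, assuming main-thread polling (monitorDrift and onDrift are made-up names for this sketch, not part of any API):

```javascript
// Pure helper: how many seconds the audio clock fell behind the wall
// clock during one polling window. Both deltas are in seconds.
function audioClockDrift(wallDeltaSec, audioDeltaSec) {
  return wallDeltaSec - audioDeltaSec;
}

// Browser-only wiring (illustrative): poll once per second and report
// the drift accumulated in that window.
function monitorDrift(context, onDrift) {
  let lastWall = performance.now() / 1000;
  let lastAudio = context.currentTime;
  setInterval(() => {
    const wall = performance.now() / 1000;
    const audio = context.currentTime;
    onDrift(audioClockDrift(wall - lastWall, audio - lastAudio));
    lastWall = wall;
    lastAudio = audio;
  }, 1000);
}
```

Note that this only detects stalls after they have happened and gives no headroom information, which is exactly why a spec-level render capacity metric is being discussed here.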

jacksongoode commented 3 years ago

Hi,

Just following this thread. I just discovered that render capacity is a fairly good metric for audio performance, and I was wondering if there is any way to log or record this statistic?

hoch commented 3 years ago

@jacksongoode

For Chrome: https://web.dev/profiling-web-audio-apps-in-chrome/

For Firefox: https://blog.paul.cx/post/profiling-firefox-real-time-media-workloads/

jacksongoode commented 3 years ago

@jacksongoode

For Chrome: https://web.dev/profiling-web-audio-apps-in-chrome/

For Firefox: https://blog.paul.cx/post/profiling-firefox-real-time-media-workloads/

Hmm, I've checked those out, but it doesn't look like there's a way to get logs of the render capacity as a percentage of CPU use, is there?

I was also wondering, seeing as it follows this thread's theme, whether this method of calculating CPU load can be corrected for the web audio demo?

hoch commented 3 years ago

Sorry that I misunderstood your question. The idea of render capacity originated here: https://web.dev/profiling-web-audio-apps-in-chrome/#use-the-webaudio-tab-in-chrome-devtools (look for "Render Capacity" in the description)

Currently there's no way to use this value programmatically - and this thread is about exposing it on the platform.

I was also wondering, seeing as this follows this thread's theme, if this method of calculating CPU can be corrected for the web audio demo?

Are you asking if this API can be used as a CPU benchmark? There might be a correlation, but I believe the render capacity is much narrower than the generic CPU info APIs. It only cares about your audio rendering thread. (At least that's my intention so far.)

Also, the example from Resonance Audio uses OfflineAudioContext. The render capacity is specifically for the real-time use case.

padenot commented 3 years ago

https://github.com/oyiptong/compute-pressure/blob/main/README.md https://oyiptong.github.io/compute-pressure/

These describe a related proposal, not on any standards track, that does not replace what we're doing here, which is very much needed. I thought, however, that it would be useful to mention it, and maybe also to use it in conjunction with the API being designed here. Machine load can certainly be interesting for real-time audio apps, even though audio quite frequently runs on real-time threads, which are scheduled ahead of others.

hoch commented 3 years ago

Glad you brought up the Compute Pressure API - I vaguely remember mentioning it in a teleconference in the past. I see a couple of major differences, but the biggest one is:

The CP API measures the "overall" machine load over a fixed (and relatively coarse, 1 sec) sampling interval. The Render Capacity, on the other hand, specifically focuses on the workload of the audio rendering thread. One can be used as a rough proxy for the other, but they do not exactly match. So yes, I agree that the Render Capacity serves its own purpose.

I can see developers taking advantage of both APIs: Compute Pressure for the main thread load, and Render Capacity for the audio rendering load.

hoch commented 3 years ago

Current WIP proposal for F2F discussion:

Polling

dictionary AudioRenderPerformanceOptions {
  double measuringInterval = 1;
  double smoothingCoefficient = 0.5;
};

[Exposed=Window]
interface AudioRenderPerformance {
  undefined start(optional AudioRenderPerformanceOptions options = {});
  undefined stop();
  readonly attribute double capacity;
};

partial interface AudioContext {
  [SecureContext] readonly attribute AudioRenderPerformance renderPerformance;
};

Example

context.renderPerformance.start();

const pollCapacity = (timestamp) => {
  const cap = context.renderPerformance.capacity;
  doSomethingWithRenderCapacity(cap, timestamp);
  requestAnimationFrame(pollCapacity); // re-schedule: rAF only fires once
};

requestAnimationFrame(pollCapacity);

Event-based

dictionary AudioRenderPerformanceOptions {
  double openThreshold = 0.95;
  double closeThreshold = 0.75;
};

[Exposed=Window]
interface AudioRenderPerformanceEvent : Event {
  constructor (DOMString type, boolean aboveThreshold);
  readonly attribute boolean aboveThreshold;
};

[Exposed=Window]
interface AudioRenderPerformance {
  undefined start(optional AudioRenderPerformanceOptions options = {});
  undefined stop();
  attribute EventHandler onthresholdcrossing;
};

partial interface AudioContext {
  [SecureContext] readonly attribute AudioRenderPerformance renderPerformance;
};

Example

context.renderPerformance.addEventListener('thresholdcrossing', (event) => {
  if (event.aboveThreshold) {
    reduceWork();
  } else {
    addMoreWork();
  }
});

context.renderPerformance.start({openThreshold: 0.9, closeThreshold: 0.8});
hoch commented 3 years ago

F2F: The majority was in favor of the polling approach, but with more meaningful properties.

dictionary AudioRenderPerformanceOptions {
  double measuringInterval; // in seconds
  double smoothingCoefficient;
};

[Exposed=Window]
interface AudioRenderPerformance {
  undefined start(optional AudioRenderPerformanceOptions options = {});
  undefined stop();
  readonly attribute double averageCapacity;
  readonly attribute double maxCapacity;
  readonly attribute double underruns;
};

partial interface AudioContext {
  [SecureContext] readonly attribute AudioRenderPerformance renderPerformance;
};

Also note that the renderPerformance object is protected by [SecureContext], and without a user gesture (due to the autoplay policy) these values will all be zero.
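A usage sketch against this polling proposal might look like the following. To be clear, this API is a proposal and is not implemented anywhere; the thresholds and the capacityStatus/startLoadMeter names are illustrative choices for the sketch, not part of the proposal:

```javascript
// Pure helper: classify a capacity reading (0.0-1.0) into a UI state.
// The 0.8 warning threshold is an arbitrary choice for this sketch.
function capacityStatus(averageCapacity, maxCapacity) {
  if (maxCapacity >= 1.0) return "overloaded";
  if (averageCapacity > 0.8) return "warning";
  return "ok";
}

// Browser-only wiring against the *proposed* API:
function startLoadMeter(context, updateUi) {
  context.renderPerformance.start({ measuringInterval: 1, smoothingCoefficient: 0.5 });
  setInterval(() => {
    const rp = context.renderPerformance;
    updateUi(capacityStatus(rp.averageCapacity, rp.maxCapacity), rp.underruns);
  }, 1000);
}
```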

svgeesus commented 3 years ago

This Compute Pressure API proposal seems related (see TAG Review)

hoch commented 3 years ago

Updated the proposal above to reflect the discussion on 6/3.

rtoy commented 3 years ago

Teleconf: @hoch mentioned that the API may change a bit due to input from Google's privacy team. Instead of polling, it may be event-based like rAF. Details still need to be worked out.

hoch commented 3 years ago

Here's the event-based API proposal:

dictionary AudioRenderPerformanceOptions {
  double updateInterval = 1;
};

[Exposed=Window]
interface AudioRenderPerformanceEvent : Event {
  constructor (DOMString type, double timestamp,
               double averageCapacity, double maxCapacity, double underrunRatio);
  readonly attribute double timestamp;
  readonly attribute double averageCapacity;
  readonly attribute double maxCapacity;
  readonly attribute double underrunRatio;
};

[Exposed=Window]
interface AudioRenderPerformance {
  undefined start(optional AudioRenderPerformanceOptions options = {});
  undefined stop();
  attribute EventHandler onupdate;
};

partial interface AudioContext {
  [SecureContext] readonly attribute AudioRenderPerformance renderPerformance;
};


AudioContext's playback is controlled by the autoplay policy and cannot start automatically (that is, it remains suspended) without a user interaction (e.g. an explicit click on the DOM).
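A usage sketch for this event-based proposal could look like the following. The API is not implemented anywhere; the "update" event name is read off the onupdate handler in the IDL above, and the shedding thresholds are illustrative assumptions:

```javascript
// Pure helper: decide whether the app should shed work, based on the
// fields the proposed AudioRenderPerformanceEvent would carry.
// Thresholds here are illustrative, not part of the proposal.
function shouldShedLoad(averageCapacity, underrunRatio) {
  return underrunRatio > 0 || averageCapacity > 0.9;
}

// Browser-only wiring against the proposed (unimplemented) API:
function watchRenderPerformance(context, shedWork) {
  context.renderPerformance.addEventListener("update", (e) => {
    if (shouldShedLoad(e.averageCapacity, e.underrunRatio)) shedWork();
  });
  context.renderPerformance.start({ updateInterval: 1 });
}
```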

hoch commented 3 years ago

@pmlt @jackschaedler @andrewmacp What do you think about the latest API proposal? Any feedback would be greatly appreciated!

pmlt commented 3 years ago

This feature isn't necessary for game development; however, it seems to me that if I were writing a web-based DAW, this API would give me a sufficient approximation to build a CPU usage meter. Looks good to me!

hoch commented 3 years ago

I thought a game audio engine would want to monitor the current render capacity so it can dynamically control the application load. Still, thanks for the feedback!

pmlt commented 3 years ago

We definitely do, but since we use a WebWorker/SAB/AudioWorklet architecture, we immediately detect underruns at the start of the AudioWorklet callback simply by inspecting the number of audio frames produced by the WebWorker since the last callback. No need for an event-based approach in this case.
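pmlt's detection scheme can be sketched as follows. The SharedArrayBuffer layout and the class/function names here are assumptions for illustration, not taken from their codebase: the WebWorker producer publishes how many frames it has written, and the worklet-side consumer checks at the start of each callback that a full render quantum is available.

```javascript
// Pure helper: an underrun means fewer frames are available than one
// render quantum (128 frames) needs.
function isUnderrun(framesAvailable, quantumSize = 128) {
  return framesAvailable < quantumSize;
}

// Consumer-side bookkeeping. In a real app this logic would live in a
// class extending AudioWorkletProcessor, inside the worklet.
class UnderrunMeter {
  constructor(sharedCounter) {
    this.counter = sharedCounter; // Int32Array over a SharedArrayBuffer
    this.consumed = 0;            // frames this side has consumed so far
    this.underruns = 0;
  }
  onRenderQuantum() {
    const produced = Atomics.load(this.counter, 0);
    if (isUnderrun(produced - this.consumed)) {
      this.underruns++;           // producer fell behind since last callback
    } else {
      this.consumed += 128;
    }
  }
}
```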

hoch commented 3 years ago

Ah, I see. That's actually a good point. I understand that you can implement glitch detection within an AudioWorkletProcessor - but would it be useful to have a built-in detection/monitoring feature on the AudioWorkletGlobalScope (AWGS)?

ulph commented 3 years ago

@hoch perhaps you can clarify a detail here.

Given

AudioRenderPerformanceOptions

  • updateInterval: For example, 375 values will be collected per 1-second interval at a 48 kHz sample rate and 128 frames per render quantum.

AudioRenderPerformanceEvent

  • Note that for averageCapacity, maxCapacity, and underruns, the range is between 0.0 and 1.0 and the precision is limited to 1/100th (21 bits of entropy total).

What is the rounding strategy?

Consider using the default parameters: as you outline, that'd be 375 values per interval. If there is just a single underrun, the ratio would be 1/375. What would the resulting reported ratio float be? I would assume 0, i.e. that single underrun would go undetected.

Assuming small ratios get rounded to zero, the counter to that would be to calculate the update interval from the sample rate (and, soon, the render quantum size).

As an alternative, why not just report the total number of frames with underruns, as well as the total number of frames in the given interval?

EDIT: rewrote a sentence

hoch commented 3 years ago

What would the resulting reported ratio float be? I would assume 0, i.e. that single underrun would go undetected.

Good point! We need a rounding strategy that avoids this situation. A clear distinction between zero and non-zero would solve the issue: for example, 0 would stay 0, but 1/375 would round up to 0.01. Some details need to be fleshed out, so thanks for raising this question!
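The scheme hoch describes (zero stays zero, any non-zero ratio rounds up so it never collapses to zero) can be expressed as a small function. This is a sketch of the discussed rounding, not spec text; the function name is made up:

```javascript
// Quantize an underrun ratio u/N to 1/100 precision, never letting a
// non-zero ratio collapse to 0.
function quantizeUnderrunRatio(u, N) {
  if (u === 0) return 0;
  // Ceiling to the nearest hundredth also handles 0 < u/N <= 0.01,
  // which maps to 0.01.
  return Math.ceil((u / N) * 100) / 100;
}
```

For example, a single underrun out of 375 render quanta reports 0.01 rather than disappearing into 0.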

As an alternative to this, why not just report the total number of frames with underruns as well as the total number of frames for the given interval?

The bucketed approach helps us lower the privacy entropy; reporting the exact number of underruns is something we want to avoid. My goal is to keep the privacy entropy as low as possible while keeping the value useful to developers.

as well as the total number of frames for the given interval?

Unfortunately this is sort of already exposed via AudioContext.baseLatency. Since this number is very platform-specific, it adds more bits of fingerprinting information. I don't think we want to duplicate that info through this new feature.

jackschaedler commented 3 years ago

@hoch

This looks pretty good to me!

Like @ulph, I was also wondering about the way the underruns value is reported... It would be great to avoid a situation where underruns reports 0.0 but maxCapacity reports 1.0. That seems like the only case where using this API would be a bit confusing.

This makes me wonder if there's simply a better term than underruns for this concept. Maybe if the field were named something like underrunRatio or capacityExceededRatio, the 0-1 value range and precision choices would feel more natural.

A few nitpicks if this exact language ends up being used:

hoch commented 3 years ago

Fixed nits. Thanks!

underrunRatio is (where N is the number of render quanta per interval period, u is the number of underruns per interval period):

  • 0 if u is 0.
  • 0.01 if the u / N value is greater than 0.0 and less than or equal to 0.01.
  • Otherwise it's u / N rounded up to the nearest 100th.

I will think about whether we need the same treatment for the average and max capacity; feel free to chime in if you have any suggestions.

andrewmacp commented 3 years ago

@hoch This looks great!

Just out of curiosity, is the lower bound on the updateInterval 1s, or can it be set even lower?

hoch commented 3 years ago

@andrewmacp 1 is a placeholder value, but I think it's a reasonable default. How much lower do we want to go, and why?

A nearby example is the ComputePressure API, which also has a rate-limited event:

"The callback is intentionally rate-limited, with the current implementation maxing out at one invocation per second for a page in the foreground, and once per ten seconds for background pages."

andrewmacp commented 3 years ago

@hoch I was mostly just curious here. I think the underrunRatio should give us enough of what we need, but I was wondering if one could simply set the updateInterval to a value that results in <=100 render quanta per callback, in order to recover the exact number of underruns. But then that would defeat the purpose of using a ratio to begin with, so I was wondering if the plan was to limit the updateInterval to a 1s minimum like in the ComputePressure API.

ulph commented 3 years ago

@hoch noted, your suggested rounding scheme would work.

  • 0.01 if the u / N value is greater than 0.0 and less than or equal to 0.01.
  • Otherwise it's u / N rounded up to the nearest 100th.

It doesn't hurt to be extra clear, I suppose - but wouldn't the "ceil to hundredth" rule cover the 0 < x < 0.01 case as well?

I will think about whether we need the same treatment for the average and max capacity; feel free to chime in if you have any suggestions.

I don't think the rounding is as problematic for the capacity values - the issue with the underrun ratio was that (repeated) single underruns can be quite detrimental, especially in the case of occasional single glitches occurring every few seconds.

Some ideas for capacity, though: does it make sense to throw some more statistical numbers in there, like min and stdev? That would give some more indication of the distribution of the capacity measurements during the measurement window. (Personally I would have pondered a histogram, but I suspect that's not everyone's cup of tea.)

hoch commented 2 years ago

Re: @andrewmacp

But then that would defeat the purpose of using a ratio to begin with so was wondering if the plan was to limit the updateInterval to a 1s minimum like in the ComputePressure API.

I believe 1s is a sensible default. I don't have a strong opinion on the lower boundary as long as it's reasonably coarse (e.g. 0.5 seconds or higher). This is up for discussion.

Re: @ulph

"ceil to hundredth" would cover the 0 < x < 0.01 case as well?

Thanks! That's better.

Some ideas for capacity though; does it make sense to throw some more statistical numbers in there? Like min and stdev? That would give some more indication of distribution of the capacity measurements during the measurement window. (Personally I would have pondered a histogram but I suspect that's not everyones cup of tea)

IMO those (min, stddev, histogram) are in the scope of "good to have". I suggest we deliver the first cut with the most essential data, and extend it later if it's absolutely needed.

alvestrand commented 2 years ago

The actual problem shouldn't be CPU overload; it should be processing not completing within the cycle time - no matter what the reason is, such overruns are a problem. In the WebRTC context, we typically solved this with counters: one counting how many samples have been processed, and one counting how many samples did not finish processing within their cycle. If the latter rises above zero, you know you're in trouble.

hoch commented 2 years ago

it should be processing not completing within the cycle time

Yes. This is basically how "averageCapacity" is calculated. The capacity might be related to CPU usage to a certain degree, but they clearly have different meanings.

TheSlowGrowth commented 2 years ago

a counter of how many samples have been processed and of how many samples did not finish processing in its cycle should be sufficient

This is something we can already do right now, using audio worklets. The problem is that you only find out once you're already overloading - there's no way to see how much headroom you still have before overload would occur. For us this is critical: we want a solid understanding of how much headroom for additional processing we have on our clients' machines.

I suppose the better way is to measure the raw execution time for rendering one audio block. Since the sample rate and block size are known, this gives a direct "load percentage".
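The measurement TheSlowGrowth describes can be sketched as follows. Note the caveat raised earlier in this thread: performance.now() is not exposed in the AudioWorkletGlobalScope (see #2413), so this sketch falls back to Date.now(), which is coarse; loadFraction and renderAudio are made-up names:

```javascript
// Pure helper: fraction of the available time budget one block took.
// A 128-frame block at 48 kHz has a budget of 128/48000 s ≈ 2.67 ms,
// so a block that took 1 ms to render yields a load of ~0.375.
function loadFraction(elapsedMs, blockFrames, sampleRate) {
  const budgetMs = (blockFrames / sampleRate) * 1000;
  return elapsedMs / budgetMs;
}

// Inside an AudioWorkletProcessor this could be used like so
// (commented out because it only runs in a worklet):
//
// process(inputs, outputs) {
//   const t0 = Date.now();               // coarse clock; see #2413
//   renderAudio(outputs);                // hypothetical DSP work
//   this.load = loadFraction(Date.now() - t0, 128, sampleRate);
//   return true;
// }
```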

padenot commented 2 years ago

I suppose the better way is to measure the raw execution time for the rendering of one audio block. Since the sample rate and block size is known, this should give a direct "load percentage".

Just a note here that, in addition to @hoch's proposal here, https://github.com/WebAudio/web-audio-api/issues/2413 may be of interest if you want to record timing yourself. The current status is that it's going to be added (see my last message there).

The feature in this issue is still nice to have as a more global load metric, including native nodes, which can be significant in terms of load (HRTF panning, Convolver, but also an accumulation of cheaper nodes in a complex processing graph, etc.).

During development (meaning it doesn't eliminate the need for the two things just mentioned), both Firefox and Chrome have profiling capabilities that allow drilling down and getting the execution time of each process call, and, depending on the browser, other metrics such as a histogram of callback times.

https://web.dev/profiling-web-audio-apps-in-chrome/ and https://blog.paul.cx/post/profiling-firefox-real-time-media-workloads/ for Chrome and Firefox, respectively.