WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Clarification: getOutputTimestamp() vs outputLatency #2461

Closed chcunningham closed 2 years ago

chcunningham commented 2 years ago

Reading the spec, I expect the following to always evaluate to true. Am I correct? For cases where the values are expected to change (e.g. connecting Bluetooth headphones), can I expect the condition to remain true?

context.outputLatency * 1000 == performance.now() - context.getOutputTimestamp().performanceTime
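For concreteness, a check along these lines is what I have in mind (reading both clocks back-to-back; context is an AudioContext):

// Read everything back-to-back so the comparison is apples-to-apples.
const ts = context.getOutputTimestamp();
const reportedLatencyMs = context.outputLatency * 1000;
const derivedLatencyMs = performance.now() - ts.performanceTime;
console.log(reportedLatencyMs, derivedLatencyMs, reportedLatencyMs === derivedLatencyMs);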

Chrome hasn't yet shipped outputLatency, but I tested this in Firefox (using this page) and the condition is true.

If I have this right, the conclusion is that these fields are redundant. Totally fine with me, but a little surprising. Is there some history that explains why we have both?

Aside: it's likely that context.getOutputTimestamp() has a bug in Chrome. I get a totally different value than Firefox on the same system. Also, on Chrome the value appears to bounce around (no change to my setup) while the Firefox value remains steady.

@hoch @padenot

padenot commented 2 years ago

It's only true if the performance.now() call and the context.getOutputTimestamp().performanceTime are made in quick succession, faster than the clock resolution of the system it's running on.

Those calls are more or less redundant. outputLatency is the most useful piece of information here for anything non-trivial.

fwiw, here's Firefox's implementation, which looks very similar to what you wrote: https://searchfox.org/mozilla-central/source/dom/media/webaudio/AudioContext.cpp#584-591.

Note that depending on the Firefox version you're running, and depending on various factors including the status of the process isolation feature, some timestamps are quantized, and so the condition you've written will always be true regardless of the machine. This won't necessarily be the case when those restrictions are lifted because other mitigations are in place (COEP/COOP, process isolation, etc.), but (iirc) those can remain in place when Firefox is running in "reduce fingerprinting" mode.

chcunningham commented 2 years ago

Thanks. Totally answered my question :)

chcunningham commented 2 years ago

I think I may have read the spec wrong (and I think Firefox might have a bug). @padenot @hoch PLS LMK.

Re-reading here with some added emphasis: https://webaudio.github.io/web-audio-api/#dom-audiocontext-getoutputtimestamp

getOutputTimestamp()

Returns a new AudioTimestamp instance containing two related audio stream position values for the context: the contextTime member contains the time of the sample frame which is currently being rendered by the audio output device (i.e., output audio stream position), in the same units and origin as context’s currentTime; the performanceTime member contains the time estimating the moment when the sample frame corresponding to the stored contextTime value was rendered by the audio output device, in the same units and origin as performance.now() (described in [hr-time-3]).

In other words

After the context’s rendering graph has started processing of blocks of audio, its currentTime attribute value always exceeds the contextTime value obtained from getOutputTimestamp method call.

Makes sense, context.currentTime > getOutputTimestamp().contextTime because of output latency. The phrasing "always" might be too strong: if we suspend the AudioContext, I would expect getOutputTimestamp().contextTime to eventually match context.currentTime (currentTime ceases to advance, so the contextTime catches up to it as the OS plays through its buffer).
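(A rough, untested way to watch for that, assuming the output timestamp keeps updating while suspended:)

// After suspend(), currentTime stops advancing, so the gap below should
// shrink toward 0 as the OS plays through its remaining buffer.
await context.suspend();
const id = setInterval(() => {
  const { contextTime } = context.getOutputTimestamp();
  console.log('currentTime - contextTime (s):', context.currentTime - contextTime);
}, 25);
setTimeout(() => clearInterval(id), 1000);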

Note: The difference between the values of the context’s currentTime and the contextTime obtained from getOutputTimestamp method call cannot be considered as a reliable output latency estimation because currentTime may be incremented at non-uniform time intervals, so outputLatency attribute should be used instead.

Reliability aside, I think this aligns conceptually with my interpretation above. We should generally (but not always) expect:

context.outputLatency == context.currentTime - getOutputTimestamp().contextTime

chcunningham commented 2 years ago

Re Firefox,

The condition context.outputLatency == context.currentTime - getOutputTimestamp().contextTime does hold when I test locally (0.04493750000000318 ~= 0.0449375).
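A console check along these lines is enough to see it (values from my setup):

const ts = context.getOutputTimestamp();
console.log(context.currentTime - ts.contextTime); // 0.04493750000000318
console.log(context.outputLatency);                // 0.0449375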

I think the only bug is just how getOutputTimestamp().performanceTime is calculated.

hoch commented 2 years ago

Thanks - this thread actually helps me understand getOutputTimestamp() method better.

I think the only bug is just how getOutputTimestamp().performanceTime is calculated.

Is it a bug in the current spec language? Or something else?

chcunningham commented 2 years ago

Is it a bug in the current spec language? Or something else?

At this point I think the spec is probably right and the bug is in Firefox's implementation of performanceTime. But I definitely welcome folks to check my reasoning :)

padenot commented 2 years ago

All this is essentially to be able to work with clock drift, and maybe to do client-side clock interpolation, but I'm not really sure what the intent was at the time.

First, performance.now() doesn't increase at the same rate as AudioContext.currentTime (it can, but it doesn't have to). Having both can be useful to understand the drift.
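A sketch of how one could watch that drift (assuming context is a running AudioContext):

// Sample the pair of clocks periodically; if the offset between them
// changes over time, the two clocks are drifting relative to each other.
let firstOffsetMs = null;
setInterval(() => {
  const ts = context.getOutputTimestamp();
  const offsetMs = ts.performanceTime - ts.contextTime * 1000;
  if (firstOffsetMs === null) firstOffsetMs = offsetMs;
  console.log('drift since first sample (ms):', offsetMs - firstOffsetMs);
}, 1000);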

performance.now() also typically has a resolution that's fairly fine. AudioContext.currentTime is increased block by block. getOutputTimestamp().contextTime might have been intended to be AudioContext.currentTime plus the time elapsed since AudioContext.currentTime was last incremented, using a finer-resolution clock, offset by the latency. This is useful to know more or less what is being output right now, e.g. for A/V sync.

This is often called client-side audio stream clock interpolation, and is useful in particular if the block size is big, i.e. if AudioContext.currentTime progresses by large increments. It's roughly 2-3 ms in the default Web Audio API case (128 frames at 44.1 kHz is more or less the worst case, roughly 2.9 ms). It's common for OS audio stacks to do this (OpenSL ES, PulseAudio and WASAPI do this, at least, and e.g. Firefox implements something like this on macOS and AAudio).

getOutputTimestamp().performanceTime is the time, in the performance.now(), at which the sample at getOutputTimestamp().contextTime is being rendered. This can be useful to match with requestAnimationFrame or something like that.
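For instance, a sketch of that kind of use (drawFrameFor is a hypothetical app-side rendering function):

// Each animation frame, estimate the audio stream position that is
// physically playing around the time this frame is presented.
function onFrame(rafTimeMs) {
  const ts = context.getOutputTimestamp();
  const aheadSec = (rafTimeMs - ts.performanceTime) / 1000;
  const audioPositionSec = ts.contextTime + aheadSec;
  drawFrameFor(audioPositionSec); // hypothetical: draw visuals for this audio position
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);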

If the context’s rendering graph has not yet processed a block of audio, then getOutputTimestamp call returns an AudioTimestamp instance with both members containing zero.

This is weird, because a large increase will happen during the initial "suspended" -> "running" state transition, but why not: after all, those values don't make much sense if the underlying audio stream isn't running.

After the context’s rendering graph has started processing of blocks of audio, its currentTime attribute value always exceeds the contextTime value obtained from getOutputTimestamp method call.

This is consistent with the above. AudioContext.currentTime is incremented before handing the audio to the OS. It means that it's always slightly in the future (i.e. greater) compared to other clocks. max(0, AudioContext.currentTime - AudioContext.outputLatency) is (with a coarse resolution) the time of the sample that is being output physically. AudioContext.getOutputTimestamp().contextTime will be the same with a finer resolution, and AudioContext.getOutputTimestamp().performanceTime will be the same in the clock domain of performance.now(). Firefox doesn't implement this, but certainly could.
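To make that concrete, a sketch of the comparison (reads done back-to-back on a running context):

const ts = context.getOutputTimestamp();
const coarseSec = Math.max(0, context.currentTime - context.outputLatency);
console.log('coarse estimate (s):', coarseSec);
console.log('contextTime (s):', ts.contextTime);       // same position, potentially finer resolution
console.log('performanceTime (ms):', ts.performanceTime); // same instant, in the performance.now() domain
console.log('performance.now() (ms):', performance.now());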

We can decide what is the most useful now and fix implementations. Generally, we can make this more useful than just being redundant with outputLatency. Because we implement this in Gecko, I think we'd agree with implementing it, because it's useful (it's even more useful when syncing with high-framerate animations, which are becoming more common).

hoch commented 2 years ago

Thanks for the detailed explanation, @padenot!

max(0, AudioContext.currentTime - AudioContext.outputLatency) is (with a coarse resolution), the time of the sample that is being output physically. AudioContext.getOutputTimestamp().contextTime will be the same with a finer resolution

Just curious - why one is coarse whereas the other is finer?

We can decide what is the most useful now and fix implementations.

Agreed. We can open a new spec issue for revising getOutputTimestamp() API.

padenot commented 2 years ago

Just curious - why one is coarse whereas the other is finer?

AudioContext.currentTime is defined, per spec, to be incremented by the render quantum size (for now always 128 frames) each time a block is rendered. In practice (depending on the implementation), it's common that it's incremented in multiples of 128, e.g. Windows/WASAPI frequently renders 440 audio frames in one system-level audio callback, so there are going to be 3 or 4 render quanta (384 or 512 frames, on odd/even calls) in very short succession (assuming the load itself is reasonable), and then no change for some time until the next audio callback, and so on.

We can make AudioContext.getOutputTimestamp().contextTime progress smoothly by interpolating between those jumps with a precise system clock -- this won't affect the overall slope of the audio clock, it's just interpolated with a system clock in between system-level audio callbacks. We don't have to, but we find it useful.
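Roughly what that interpolation looks like (a sketch, not Gecko's actual code; the parameters are values an implementation would record at each system-level callback):

// Client-side clock interpolation: the audio clock only advances when a
// system-level callback renders blocks, so smooth it in between with a
// finer system clock, without changing its overall slope.
function interpolatedContextTime(lastBlockContextTime,  // currentTime when the last callback rendered
                                 lastBlockSystemTimeMs, // performance.now() at that same moment
                                 callbackPeriodSec) {   // e.g. ~0.01 for a 10 ms system callback
  const elapsedSec = (performance.now() - lastBlockSystemTimeMs) / 1000;
  // Clamp so we never report more audio than the last callback produced.
  return lastBlockContextTime + Math.min(elapsedSec, callbackPeriodSec);
}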

hoch commented 2 years ago

Do we have any lingering questions here? Otherwise I'll close this soon.