WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Audio nodes should expose their intrinsic latency. #469

Open cristiano-belloni opened 9 years ago

cristiano-belloni commented 9 years ago

Stackoverflow thread here: http://stackoverflow.com/questions/25807887/using-an-offline-context-in-the-web-audio-api-shift-the-signal-of-264-samples

Compression, panning and waveshaping add latency in the current Chrome and Firefox implementations. Filters and gains do not.

You can see the behaviour here: https://wav.hya.io/#/fx/node_compressor - If you add a wave and then press the "Apply" button, the Compressor effect (based solely on a DynamicsCompressorNode) adds a shift, which is clearly visible if you zoom in on your waveform. I can provide a plnkr example, if needed. If you use a Filter (example: lowpass - https://wav.hya.io/#/fx/lowpass_filter), this does not happen.

@cwilso says:

> The behavior of the nodes isn't strictly defined by the spec (yet), so it's not well-defined. I'd expect an HRTF panner to add some latency, though. Not positive about compression.

so I thought it was worth posting here.

I believe the spec should require nodes to compensate for the shift automatically when in an offline context (while it's inevitable in a real-time context). Alternatively, there should be a way to determine the amount of shift each node introduces, and counteract it application-side.

rtoy commented 9 years ago

> Stackoverflow thread here: http://stackoverflow.com/questions/25807887/using-an-offline-context-in-the-web-audio-api-shift-the-signal-of-264-samples
>
> Compression, panning and waveshaping add latency in the current Chrome and Firefox implementations. Filters and gains do not.

I think waveshapers should only add latency when you oversample. Otherwise, there shouldn't be any latency.

Filters also add latency, typically pretty small.

I think compressors add latency. @hongchan can say more about that for Chrome's implementation.

The HRTF panner (now no longer the default) also adds a small latency of about 64 samples, I think. This is what Chrome does. I think it also depends on the sample rate of the context.

> You can see the behaviour here: https://wav.hya.io/#/fx/node_compressor - If you add a wave and then press the "Apply" button, the Compressor effect (based solely on a DynamicsCompressorNode) adds a shift, which is clearly visible if you zoom in on your waveform. I can provide a plnkr example, if needed. If you use a Filter (example: lowpass - https://wav.hya.io/#/fx/lowpass_filter), this does not happen.
>
> @cwilso (https://github.com/cwilso) says:
>
>> The behavior of the nodes isn't strictly defined by the spec (yet), so it's not well-defined. I'd expect an HRTF panner to add some latency, though. Not positive about compression.
>
> so I thought it was worth posting here.
>
> I believe the spec should require nodes to compensate for the shift automatically when in an offline context (while it's inevitable in a real-time context). Alternatively, there should be a way to determine the amount of shift each node introduces, and counteract it application-side.

I am opposed to this. It makes OfflineAudioContexts much more difficult to use for testing nodes, and makes the output differ from what an online context would produce.

Also, consider the latency from a filter. This latency can dynamically change via an AudioParam for the cutoff frequency and/or the Q parameter. How would you handle that?


cristiano-belloni commented 9 years ago

> I am opposed to this. It makes OfflineAudioContexts much more difficult to use for testing nodes, and makes the output differ from what an online context would produce.

I understand. But the other way makes offline-context applications (e.g. in-place editing of a sound wave, or applying an effect to a sub-portion of a signal) completely unreliable, with no way to counteract the effect application-side. Is the choice really between real-world applications and node testability?

> Also, consider the latency from a filter. This latency can dynamically change via an AudioParam for the cutoff frequency and/or the Q parameter. How would you handle that?

I honestly never tried to set automations on offline audio contexts. Could you please explain this a bit more? I thought filters were implemented like this: http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt, so we can always map a y(n) to an x(n).

rtoy commented 9 years ago

>> I am opposed to this. It makes OfflineAudioContexts much more difficult to use for testing nodes, and makes the output differ from what an online context would produce.
>
> I understand. But the other way makes offline-context applications (e.g. in-place editing of a sound wave, or applying an effect to a sub-portion of a signal) completely unreliable, with no way to counteract the effect application-side. Is the choice really between real-world applications and node testability?

Of course you can counteract this yourself. Create your graph and replace your source with an impulse. Run it and look at the output for the impulse. The impulse is probably no longer a simple impulse, so you'll have to decide where the output impulse is. Now you know what the latency of your graph is, so you can do the appropriate thing with your actual source.

Consider also what happens if you have a complicated cycle in your graph. Computing the latency of that graph could be quite messy.

>> Also, consider the latency from a filter. This latency can dynamically change via an AudioParam for the cutoff frequency and/or the Q parameter. How would you handle that?
>
> I honestly never tried to set automations on offline audio contexts. Could you please explain this a bit more? I thought filters were implemented like this: http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt, so we can always map a y(n) to an x(n).

Automation works the same in an offline context as in an online context. If f is your filter node, you can do something like f.Q.linearRampToValueAtTime(value, endTime).

The filters are implemented almost as in your link, but we are going to change them slightly because no implementation actually uses those formulas. But yes, you can map a y(n) to an x(n). If you feed an impulse into a filter you'll get a somewhat delayed fuzzy impulse out. It's kind of hard to know what you use as the latency, but the impulse is definitely delayed.
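For reference, the cookbook filters linked above boil down to a per-sample difference equation: each y(n) is produced as soon as x(n) arrives, so there is no buffered sample offset, only the frequency-dependent group delay behind that "delayed fuzzy impulse". A minimal sketch, with placeholder coefficients rather than any browser's actual values:

```js
// Direct Form I biquad in the style of the Audio EQ Cookbook.
// b0..b2, a1, a2 are normalized coefficients (a0 already divided out).
function createBiquad(b0, b1, b2, a1, a2) {
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0; // previous inputs and outputs
  return function process(x0) {
    const y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
    x2 = x1; x1 = x0;
    y2 = y1; y1 = y0;
    // y(n) comes out on the same sample as x(n): no sample offset,
    // only group delay baked into the coefficients.
    return y0;
  };
}

const filter = createBiquad(0.2, 0.4, 0.2, -0.3, 0.1); // placeholder coefficients
console.log(filter(1), filter(0), filter(0)); // impulse response, sample by sample
```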


hoch commented 9 years ago

rtoy just made almost the same comment, but I'll leave mine here as well.

@janesconference If we had a linear, single-stream structure like a channel strip in a DAW, we could certainly calculate the delay and try to compensate for it across multiple channels, but that is impossible in general because the topology of an audio graph can be arbitrary.

> I think compressors add latency.

Chrome's DynamicsCompressor has a variable pre-delay buffer (256–1024 samples) for lookahead and adaptive release.
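Lookahead of this kind implies latency by construction: the audio path is read from an older position in a delay buffer while the gain computer looks at the newer input. A rough sketch of the idea only (names and buffer length are made up; this is not Chrome's implementation):

```js
// Illustrative lookahead delay line: output lags input by `lookahead` samples
// so gain reduction can react to a peak before that peak reaches the output.
function createLookaheadDelay(lookahead) {
  const buffer = new Float32Array(lookahead);
  let writeIndex = 0;
  return function process(inputSample, gain) {
    const delayed = buffer[writeIndex];       // sample written `lookahead` frames ago
    buffer[writeIndex] = inputSample;
    writeIndex = (writeIndex + 1) % lookahead;
    return delayed * gain;                    // gain was derived from the newer input
  };
}
```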

rtoy commented 9 years ago

While I'm opposed to having an OfflineAudioContext do the latency computation, I am amenable to adding a latency attribute to nodes so that the user can compute the total graph latency. I think all nodes have a reasonably well-defined and computable latency value, but convolver nodes are particularly troublesome. I'm not exactly sure how to compute the latency when you can specify completely arbitrary responses.

And if the latency of every node is not completely specifiable, the utility of a latency attribute is greatly diminished.


adelespinasse commented 9 years ago

As a programmer who uses the Web Audio API, I would hate it if an OfflineAudioContext with a certain configuration of nodes didn't give the same results as a regular AudioContext with the same configuration (modulo maybe a different number of silent samples at the beginning). Not just for testing purposes, though that's important. It would also be really confusing, and it would break use cases such as "Render my complete composition, which I've previously been listening to in real time, to a file for distribution".

(I could see having a way to do higher quality, more computationally intense rendering offline, in ways other than just a higher sample rate, but I'd want that to be optional, and preferably an option that could be applied to offline or online contexts.)

It would be nice to be able to find out the latency of nodes, but the concept doesn't really apply in general for all nodes. Biquad filters combine many input samples to calculate each output sample. They have a group delay, but that's frequency dependent. Latency can also change over time, either suddenly or continuously. So it's not a simple matter of each node having a function to return its latency in samples.

cristiano-belloni commented 9 years ago

> As a programmer who uses the Web Audio API, I would hate it if an OfflineAudioContext with a certain configuration of nodes didn't give the same results as a regular AudioContext with the same configuration (modulo maybe a different number of silent samples at the beginning). Not just for testing purposes, though that's important. It would also be really confusing, and it would break use cases such as "Render my complete composition, which I've previously been listening to in real time, to a file for distribution".

Actually, real DAWs render latency-free, and well-behaved plugins (at least VST ones) report their intrinsic latency to help the DAW perform the delay calculations (I think it's called Delay Compensation, but my VST days are long past). If we compare the browser to a DAW/host and the nodes to plugins (which is implied by the performance + render flow you mentioned), it's natural for nodes to signal the lag they introduce.

> Latency can also change over time, either suddenly or continuously. So it's not a simple matter of each node having a function to return its latency in samples.

I remember that VST plugins solve this problem by reporting a change in latency whenever it changes. Maybe an event would help in this case?

cwilso commented 9 years ago

If you're asking for each node to expose its intrinsic reported latency, that's a rational request (I think we might have that somewhere in the issues list). Just saying we should redo all the nodes when rendering offline to compensate for that latency is a different story, though, and I have to agree that I do not think that is a good thing to do at all.


cristiano-belloni commented 9 years ago

@cwilso: yes, I understand your point. I'm not familiar with the internals, and posting on this thread helps me understand them a bit more. The "redo the node output" idea was more an uninformed hypothesis than a proposal. Nodes exposing their latency, and possibly signaling a change in it, would be much better. Hey, it could also be a feature of Audio Workers (if they buffer and introduce latency, the user has to expose it).

cristiano-belloni commented 9 years ago

> I'm not exactly sure how to compute the latency when you can specify completely arbitrary responses.

In this case, it's probably the responsibility of whoever sets the impulse response (the application / the user) to calculate its latency. I think nodes should only expose their intrinsic latency, and not try to calculate the latency of an external impulse response.

rtoy commented 9 years ago

I think if a latency (delay) attribute is added, it should be done for all native nodes without exception. AudioWorkers, of course, have to provide their own values.


rtoy commented 9 years ago

See also issue #322.

rtoy commented 9 years ago

This issue was discussed in today's Audio Working Group teleconference. It was agreed that it can be done by each implementation. The main issue was how to handle the latency for nodes where the latency can change. For example, the latency of a biquad filter can change dynamically if the filter parameters are dynamically changing. Hongchan also mentioned that the DynamicsCompressorNode has a variable delay depending on the release parameter.

Paul suggested that if it really matters, the developer should just go and measure it using an OfflineAudioContext.

Because of these issues (and lack of time), no decision was made.

rtoy commented 9 years ago

TL;DR computed latency may not be what you want anyway.

In a past life, I did radio simulations. On one project we had a model of a real filter. I computed the group delay of the filter to compute the latency. (This is probably what a biquad filter would do for its latency). The output of the filter was fed to a synchronizing block to determine where the signal started so it could be decoded. It turned out that the synchronizer almost never computed a latency that matched the computed filter latency. I don't remember how far off, but far enough that the computed latency could not be used as the "true" latency.

Another example. Consider two biquad filters with one in a feedback loop. The delay of the loop is not easily related to the delay in either filter.

joeberkovitz commented 9 years ago

Resolution: for now, document best practice informatively as using OfflineAudioContext to measure de facto latency of an impulse or step response. Revisit later.
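A minimal sketch of that measurement approach, for illustration only: the DynamicsCompressorNode, buffer length, and detection threshold below are arbitrary choices, and deciding where the smeared output impulse "starts" is a judgment call, as noted earlier in this thread.

```js
// Estimate the latency a node introduces by rendering an impulse through it
// in an OfflineAudioContext and locating the onset of the output.
async function measureLatencySamples(sampleRate = 48000, length = 4096) {
  const ctx = new OfflineAudioContext(1, length, sampleRate);

  // One-sample impulse as the source.
  const impulse = ctx.createBuffer(1, length, sampleRate);
  impulse.getChannelData(0)[0] = 1;

  const source = new AudioBufferSourceNode(ctx, { buffer: impulse });
  const nodeUnderTest = new DynamicsCompressorNode(ctx); // node being measured
  source.connect(nodeUnderTest).connect(ctx.destination);
  source.start();

  const rendered = await ctx.startRendering();
  const data = rendered.getChannelData(0);

  // The output "impulse" is smeared, so pick the first sample whose magnitude
  // crosses an arbitrary threshold as the onset.
  const threshold = 1e-4;
  return data.findIndex((sample) => Math.abs(sample) > threshold);
}
```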

joeberkovitz commented 9 years ago

Note: open new linked issue in V.next when this is closed

cwilso commented 9 years ago

Related: #414.

joeberkovitz commented 9 years ago

F2F Resolution: we're going to mention the existence of latency for nodes that have it (DelayNode, WaveShaperNode with oversampling, etc.) but leave it unquantified and unexposed via the API for V1.

joeberkovitz commented 7 years ago

Do not close based on #1058; instead, roll over to v.next milestone.

rtoy commented 7 years ago

Based on https://github.com/WebAudio/web-audio-api/issues/469#issuecomment-257011210, remove the PR review label and V1 and add v.next.

joeberkovitz commented 7 years ago

Thank you @rtoy

padenot commented 6 years ago

F2F Resolution: We decided to not expose the latency of nodes because:

cvanwinkle commented 2 years ago

Please reconsider this request. There's a lot of precedent which answers the questions above.

> Discussion: This property reflects the delay between when an impulse arrives in the input stream vs. output stream. This should reflect the delay due to signal processing (e.g. FFTs), not as an effect (e.g. reverberation). Note that a latency that varies with parameter settings, including bypass, is generally not useful to hosts. A host is usually only prepared to add delays before starting to render and those delays need to be fixed. A variable delay would introduce artifacts even if the host could track it. If an algorithm has a variable latency, it should be adjusted upwards to some fixed latency within the audio unit. If for some reason this is not possible, then latency could be regarded as an unavoidable consequence of the algorithm and left unreported (i.e. a value of 0).

> The returned value defines the group delay or the latency of the plug-in. For example, if the plug-in internally needs to look in advance (like compressors) 512 samples then this plug-in should report 512 as latency. If during the use of the plug-in this latency change, the plug-in has to inform the host by using IComponentHandler::restartComponent (kLatencyChanged), this could lead to audio playback interruption because the host has to recompute its internal mixer delay compensation. Note that for player live recording this latency should be zero or small.

/*!
 *  \brief CALL: Returns the most recent signal (algorithmic) latency that has been
 *  published by the plug-in
 *  
 *  This method provides the most recently published signal latency. The host may not
 *  have updated its delay compensation to match this signal latency yet, so plug-ins
 *  that dynamically change their latency using
 *  \ref AAX_IController::SetSignalLatency() "SetSignalLatency()" should always wait for
 *  an \ref AAX_eNotificationEvent_SignalLatencyChanged notification before updating its
 *  algorithm to incur this latency.
 *
 *  \sa \ref AAX_IController::SetSignalLatency() "SetSignalLatency()"
 *
 *  \param[out] outSamples
 *      The number of samples of signal delay published by the plug-in
 */
virtual 
AAX_Result  
GetSignalLatency(
    int32_t* outSamples) const = 0;

> Feature: Automatic Delay Compensation
>
> Pro Tools 6.4 TDM introduces Automatic Delay Compensation for maintaining time-alignment between tracks that have plug-ins with differing DSP delays, tracks with different mixing paths, tracks that are split off and recombined within the mixer, and tracks with hardware inserts. To maintain time alignment, Pro Tools adds the exact amount of delay to each track necessary to make that particular track’s delay equal to the delay of the track that has the longest delay. This feature is only available in TDM versions of Pro Tools, and not in Pro Tools LE. Because of this new feature, it is now imperative that plug-in sample delays get reported correctly to DAE. Plug-ins should indicate the amount of delay by overriding the CProcess function GetDelaySamplesLong(). How many samples of delay to report differs depending on whether the plug-in is TDM, MuSh, or RTAS:
>
> An RTAS plug-in should report the number of samples of delay caused by its algorithm. Most RTAS plug-ins simply need to report a zero-sample delay; RTAS plug-ins should only report delay samples if they perform their own sample buffering within RenderAudio(). For example, if an RTAS plug-in delays its output by the number of samples delivered to it within the RenderAudio() callback, it should report this number of samples as its delay. An RTAS plug-in should not report any samples of delay if it can process all of the samples delivered to it within the RenderAudio() function.

hoch commented 2 years ago

I am reopening this issue so we can discuss in the next teleconference.

@cvanwinkle You're welcome to join the call if you would like! See https://www.w3.org/community/audio-comgp/ for details.

hoch commented 4 months ago

Teleconference 5/30/2024:

The WG agrees on the idea of exposing this property on the AudioNode interface:

```js
audioNode.intrinsicLatency  // in seconds
```
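Assuming the attribute lands roughly as proposed (nothing below is specced yet), an application could sum the reported values along a serial "wet" chain and compensate a parallel dry path with a DelayNode. A hypothetical sketch, where `context` and `source` are an existing AudioContext and source node:

```js
// Hypothetical use of the proposed attribute: delay the dry path to line up
// with a wet path whose nodes report intrinsic latency (in seconds).
const wetChain = [
  new DynamicsCompressorNode(context),
  new WaveShaperNode(context, { oversample: '4x' }),
];
const wetLatency = wetChain.reduce(
  (sum, node) => sum + (node.intrinsicLatency ?? 0), // proposed attribute, not yet in the spec
  0
);

const dryDelay = new DelayNode(context, { delayTime: wetLatency });
source.connect(dryDelay).connect(context.destination);      // dry path, delayed to match
wetChain.reduce((prev, node) => prev.connect(node), source)  // wet path
        .connect(context.destination);
```
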
orottier commented 4 months ago

A few questions:

I could take a stab at writing the spec if that would be helpful?

padenot commented 4 months ago

> We agreed that the DelayNode has zero intrinsic latency because there is no additional delay other than what the user actually requests. Do we agree that for the ConvolverNode we also don't count leading zeroes in the impulse response towards the intrinsic latency? In the same way we would not count leading zeroes in an AudioBufferSourceNode as delay.

That's right, yes. However, if an implementer writes a convolver that has latency (e.g. because it needs to buffer, since its FFT code requires a power of two and the buffer size isn't a power of two after merging the PR that makes it selectable), the buffering would count towards latency.

> Do the implementors want to commit to a guarantee that the intrinsicLatency value is constant for the audio node's lifetime, barring changes in the audio node's settings? I think that would be very helpful for end users. E.g. the intrinsicLatency of the PannerNode is constant unless the panning model is changed.

This is a useful guarantee indeed. If we need to relax it for some reason, we'll simply add an event to tell the user, but it is unlikely that the latency would change on its own; it must always be paired with something else that changes (but see below for the AudioWorkletNode case).

> What would be a proper mechanism for the AudioWorkletProcessor to report its latency? I'm thinking of an internal slot set to 0 by default, with a method reportIntrinsicLatency on the processor to update the value and propagate it to the AudioWorkletNode. This has the drawback that the value may not be available on the control thread directly after construction. It could dispatch an event additionally.

From within the worklet scope, set an attribute. It is then readable as normal. When set, it is exposed on the AudioWorkletNode. The AudioWorkletNode fires an event that alerts the user that they should re-fetch this property (e.g. if they had cached it somewhere, set a delay line somewhere else based on this value, etc.).

> Do we want to signal the difference between "zero latency" and "latency has not been reported"?

undefined vs. 0, maybe? But that's mostly for AudioWorkletProcessor, in which case we can also spec that the default is 0.
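A sketch of how the AudioWorkletProcessor side of this could look. This is purely hypothetical: the attribute name, the way it is mirrored onto the node, and the event name are placeholders, since none of this has been specced yet.

```js
// processor.js — hypothetical latency reporting from inside the worklet scope.
class LookaheadProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    // Placeholder for the proposed "set an attribute in the scope" mechanism.
    // (Today this value would have to be sent via this.port.postMessage().)
    this.intrinsicLatency = 128 / sampleRate; // e.g. a 128-sample internal buffer
  }
  process(inputs, outputs) {
    // ... buffering / processing that actually causes the 128-sample delay ...
    return true;
  }
}
registerProcessor('lookahead-processor', LookaheadProcessor);

// main.js — hypothetical mirrored attribute and change event on the node.
const node = new AudioWorkletNode(context, 'lookahead-processor');
node.addEventListener('intrinsiclatencychange', () => {
  console.log('re-fetch latency (s):', node.intrinsicLatency);
});
```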