WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Need a way to determine AudioContext time of currently audible signal #12

Closed olivierthereaux closed 8 years ago

olivierthereaux commented 11 years ago

Originally reported on W3C Bugzilla ISSUE-20698 Thu, 17 Jan 2013 14:15:09 GMT Reported by Joe Berkovitz / NF

Use case:

If one needs to display a visual cursor in relationship to some onscreen representation of an audio timeline (e.g. a cursor on top of music notation or DAW clips) then knowing the real time coordinates for what is coming out of the speakers is essential.

However on any given implementation an AudioContext's currentTime may report a time that is somewhat ahead of the time of the actual audio signal emerging from the device, by a fixed amount. If a sound is scheduled (even very far in advance) to be played at time T, the sound will actually be played when AudioContext.currentTime = T + L where L is a fixed number.
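A minimal sketch of the mismatch, using only standard API calls:

var ctx = new AudioContext();
var osc = ctx.createOscillator();
osc.connect(ctx.destination);
var T = ctx.currentTime + 2; // schedule two seconds ahead
osc.start(T);
osc.stop(T + 0.5);
// The tone only becomes audible once ctx.currentTime reaches T + L, where L is the
// implementation- and platform-specific output latency. A visual cursor driven
// directly by ctx.currentTime therefore leads the audible signal by L.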

On Jan 16, 2013, at 2:05 PM cwilso@google.com wrote:

It's problematic to incorporate scheduling other real-time events (even knowing precisely "what time it is" from the drawing function) without a better understanding of the latency.

The idea we reached (I think Chris proposed it, but I can't honestly remember) was to have a performance.now()-reference clock time on AudioContext that would tell you when the AudioContext.currentTime was taken (or when that time will occur, if it's in the future); that would allow you to synchronize the two clocks. The more I've thought about it, the more I quite like this approach - having something like AudioContext.currentSystemTime in window.performance.now()-reference.
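A rough sketch of how such a hypothetical currentSystemTime attribute could be used to convert between the two clocks (the attribute is the proposal above, not shipped API):

// Assumption: ctx.currentSystemTime is the performance.now()-referenced timestamp (ms)
// at which ctx.currentTime was taken (or will occur), as proposed above.
function audioTimeToPerformanceTime(ctx, audioTime) {
  return ctx.currentSystemTime + (audioTime - ctx.currentTime) * 1000;
}
function performanceTimeToAudioTime(ctx, perfTime) {
  return ctx.currentTime + (perfTime - ctx.currentSystemTime) / 1000;
}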

On Jan 16, 2013, at 3:18 PM, Chris Rogers crogers@google.com wrote:

the general idea is that the underlying different platforms/OSs can have very different latency characteristics, so I think you're looking for a way to query the system to know what it is. I think that something like AudioContext.presentationLatency is what we're looking for. Presentation latency is the time difference between when you tell an event to happen and the actual time when you hear it. So, for example, with source.start(0), you would hope to hear the sound right now, but in reality will hear it with some (hopefully) small delay. One example where this could be useful is if you're trying to synchronize a visual "playhead" to the actual audio being scheduled...
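A sketch of the playhead use case under the hypothetical presentationLatency attribute proposed here:

// Context time assumed to be coming out of the speakers right now:
var audibleTime = ctx.currentTime - ctx.presentationLatency; // hypothetical attribute
drawPlayheadAt(audibleTime); // hypothetical drawing helper for the score/DAW view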

I believe the goal for any implementation should be to achieve as low a latency as possible, one which is on-par with desktop/native audio software on the same OS/hardware that the browser is run on. That said, as with other aspects of the web platform (page rendering speed, cache behavior, etc.) performance is something which is tuned (and hopefully improved) over time for each browser implementation and OS.

olivierthereaux commented 11 years ago

Original comment by Olivier Thereaux on W3C Bugzilla. Tue, 02 Apr 2013 12:40:38 GMT

Note (Per discussion at Audio WG f2f 2013-03-26):

We need to differentiate the latency-discovery issue (already filed) from follow-on questions of audio clock drift and granularity which may not affect user experience to same degree

olivierthereaux commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Tue, 02 Apr 2013 14:30:39 GMT

Can we clearly delineate? I'm not positive I understand what "latency discovery" is - there's one bit of information (the average processing block size) that might be interesting, but I intended this issue to cover the explicit need to "synchronize between the audio time clock and the performance clock at a reasonably high precision" - that is, for example:

1) I want to be playing a looped sequence through Web Audio; when I get a timestamped MIDI message (or keypress, for that matter), I want to be able to record it and play that sequence back at the right time.

2) I want to be able to play back a sequence of combined MIDI messages and Web Audio, and have them synchronized to a sub-latency level (given the latency today on Linux and even Windows, this is a requirement). Even if my latency of Web Audio playback is 20ms, I should be able to pre-schedule MIDI and audio events to occur within a millisecond or so of each other.

Now, there's a level of planning for which knowing the "average latency" - related to processing block size, I imagine - would be interesting (I could use that to pick a latency in my scheduler, for example); but that's not the same thing. Perhaps these should be solved together, but I don't want the former to be dropped in favor of the latter.
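A sketch of use case 1 above, assuming some means (such as the currentSystemTime idea quoted at the top of this issue) of knowing the offset between the performance clock and the context clock; the recordedSequence array and scheduleNote helper are hypothetical:

var recordedSequence = [];

// perfTimeAtContextZero: performance.now() value corresponding to context time 0 (assumed known).
function onMIDIMessage(midiEvent, perfTimeAtContextZero) {
  var noteAudioTime = (midiEvent.receivedTime - perfTimeAtContextZero) / 1000;
  recordedSequence.push({ time: noteAudioTime, data: midiEvent.data });
}

function playBack(ctx, startAudioTime) {
  recordedSequence.forEach(function (note) {
    scheduleNote(ctx, note.data, startAudioTime + note.time); // hypothetical synth helper
  });
}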

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 02 Apr 2013 16:10:16 GMT

This bug is intended to cover both the MIDI-synchronization cases that you proposed and also the original case I raised, which involved the placement of a visual cursor that is synchronized with audio that's being heard at the same time.

In the original visual case, the main thread needs to be able to determine the original "audio scheduling" time (in the context time frame used by start(), setValueAtTime(), etc.) for the audio signal presently emerging from the speaker. AudioContext.currentTime does not supply this time, as I explained in my original bug description.

I am not interested in the average latency or processing block size and agree that would be a different bug.

olivierthereaux commented 11 years ago

Original comment by Ehsan Akhgari [:ehsan] on W3C Bugzilla. Tue, 02 Apr 2013 18:40:22 GMT

I believe we're talking about two sources of latency here. One is the clock drift between what we measure on the main thread through AudioContext.currentTime and the actual clock on the audio thread; the other is the latency between the "play" call from the audio thread and the point where the OS actually starts to hand off the buffer to the sound card (plus, potentially, a further delay until your speakers play out what the sound card received). On top of that, if the implementation also uses system-level APIs which do not provide enough resolution (as is the case on Windows XP, for example), there is another artificial latency introduced in the calculations because of the inability to measure time precisely enough.

The use case of syncing the display of something on the screen with sound coming out of speakers is very hard to satisfy, since browsers generally do not provide any guarantee on when the updates resulting from a change in the DOM tree or a Web API call will be reflected on the screen. On an implementation which strives to provide a 60fps rendering, this delay can be as high as 16ms in the best case, and much more than that if the implementation is suffering from frame misses. So, no matter what API we provide here, there will always be a delay involved in getting stuff on the screen.

For the MIDI use case, I imagine knowing the latest measured drift from the audio thread clock and what AudioContext.currentTime returns should be enough, right?

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 02 Apr 2013 19:35:17 GMT

Ehsan, let me clarify the needs here with respect to the latency between the context's currentTime and the signal coming out of the sound card.

High accuracy for this use case is not needed. It's OK for screen updates to be slightly delayed for the purposes of seeing a cursor or pointer whose position over some sort of waveform or notated music reflects what one is hearing. These visual delays will not become really bothersome until they are consistently over 75ms or so. And typically the DOM-to-screen display delay is much, much lower (more like the 16ms number you gave).

On the other hand these delays can be dwarfed by the currentTime-to-sound-card latency on some platforms, which can be as high as 200 or 300 ms. Having the cursor be misplaced by that amount is an experience-killer. That's why it's so important for an application to be able to acquire this number from the API: it's potentially much larger.

olivierthereaux commented 11 years ago

Original comment by Ehsan Akhgari [:ehsan] on W3C Bugzilla. Tue, 02 Apr 2013 19:47:19 GMT

(In reply to comment #5)

Ehsan, let me clarify the needs here with respect to the latency between the context's currentTime and the signal coming out of the sound card.

High accuracy for this use case is not needed. It's OK for screen updates to be slightly delayed for the purposes of seeing a cursor or pointer whose position over some sort of waveform or notated music reflects what one is hearing. These visual delays will not become really bothersome until they are consistently over 75ms or so. And typically the DOM-to-screen display delay is much, much lower (more like the 16ms number you gave).

Right.

On the other hand these delays can be dwarfed by the currentTime-to-sound-card latency on some platforms, which can be as high as 200 or 300 ms. Having the cursor be misplaced by that amount is an experience-killer. That's why it's so important for an application to be able to acquire this number from the API: it's potentially much larger.

Yeah, I totally agree. But I'm not sure how that relates to exposing the potentially huge latency to web content. Ideally an implementation should minimize the latency as much as it can, and bring it well under the ranges that can be perceived by humans. Once such a latency is achieved, do you agree that exposing the latency information to web content would not be useful any more?

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 02 Apr 2013 19:56:30 GMT

Based on my knowledge of various audio platforms I don't know if it is likely that implementations can always succeed in getting the latency down to the point where it doesn't matter.

I agree that in principle if it was always quite small, it wouldn't matter much, but I am concerned that this is not a realistic goal to sign up for.

olivierthereaux commented 11 years ago

Original comment by Ehsan Akhgari [:ehsan] on W3C Bugzilla. Tue, 02 Apr 2013 20:05:02 GMT

(In reply to comment #7)

Based on my knowledge of various audio platforms I don't know if it is likely that implementations can always succeed in getting the latency down to the point where it doesn't matter.

I agree that in principle if it was always quite small, it wouldn't matter much, but I am concerned that this is not a realistic goal to sign up for.

Fair enough, but I think we should have examples of cases where this latency is unavoidably above the human perception range and the implementation can do nothing about it. I believe that if implementations avoid using imprecise OS clock facilities, there should not be a case where this can happen.

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 02 Apr 2013 20:50:58 GMT

As one example, Android audio latency is high, and in the perceptible range that I have described. This is not due to imprecise clocks -- my understanding is that it is fixed delay inside the OS that is a consequence of internal handoffs of sample frame buffers. Far from being imprecise, this delay is rock-solid consistent (if it varied, there would be output glitches).

olivierthereaux commented 11 years ago

Original comment by Ehsan Akhgari [:ehsan] on W3C Bugzilla. Tue, 02 Apr 2013 21:19:16 GMT

OK, that is a good example, but that is an example of the second class of latencies I gave in comment 4. Not sure how much can be done in order to report those latencies.

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 02 Apr 2013 21:28:14 GMT

The only apparent alternative is to do what our app has to do now, namely check the user-agent string to see if the OS is Android, and impose a fixed hardcoded time correction to AudioContext.currentTime for the purposes of understanding what the user is currently hearing.

I think the idea of the application being able to know what is currently playing is pretty fundamental. But I don't want to belabor the point, knowing that so many other fundamentals also need to be implemented -- I just want to clarify why this matters.
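Roughly, the workaround being described (the correction value is illustrative, not a measured constant):

// Hard-coded correction applied only on Android, as described above.
var ANDROID_LATENCY_GUESS = 0.3; // seconds; illustrative placeholder only
var latencyCorrection = /Android/.test(navigator.userAgent) ? ANDROID_LATENCY_GUESS : 0;
// Context time assumed to be what the user is currently hearing:
var audibleTime = ctx.currentTime - latencyCorrection;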

olivierthereaux commented 11 years ago

Original comment by Wei James on W3C Bugzilla. Tue, 09 Apr 2013 01:08:52 GMT

I think it is better to split this issue into three separate issues:

1) latency issue
2) time drift issue (currentTime and currentSystemTime)
3) granularity issue.

Although all three issues are time related, they are quite different.

It will bring some confusion if we combine them.

olivierthereaux commented 11 years ago

Original comment by Ehsan Akhgari [:ehsan] on W3C Bugzilla. Tue, 09 Apr 2013 01:41:39 GMT

Yeah that would probably make sense.

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 09 Apr 2013 07:35:58 GMT

I would like to preserve this bug as capturing the latency issue, since that is why I originally filed it.

The concerns about granularity or clock drift do not seem serious enough to me to be worth capturing in additional bugs myself.

olivierthereaux commented 11 years ago

Original comment by Wei James on W3C Bugzilla. Wed, 10 Apr 2013 01:42:50 GMT

(In reply to comment #14)

I would like to preserve this bug as capturing the latency issue, since that is why I originally filed it.

The concerns about granularity or clock drift do not seem serious enough to me for me to be effective at capturing those concerns in additional bugs.

I understand. As time drift is critical for some use scenarios we care about, I would like to file another bug to track it if there is no objection.

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Thu, 11 Apr 2013 06:05:35 GMT

Yes, please file another bug, no objections at all!

olivierthereaux commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Thu, 11 Apr 2013 18:10:46 GMT

I'm thoroughly confused, as this bug is (based on its title) currently targeting the currentTime/currentSystemTime area.

Joe: latency of Android, etc may be quite consistent, but are you just concerned about average latency, or are you trying to synchronize live audio with something in the system time space? I thought it was the latter.

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 16 Apr 2013 15:39:09 GMT

I've retitled the bug to try to more effectively communicate the nature of the issue. Yes, it is about average latency: the difference between AudioContext.currentTime and the original as-scheduled playback time for the signal that is currently being emitted from the audio hardware. Please see the very first comment in the bug for a description of the use case that I am trying to address.

If someone else wants to file a bug about how to correlate AudioContext time with other timebases in the browser I'm fine with that, but that isn't the problem that I'm concerned about.

olivierthereaux commented 11 years ago

Original comment by Ehsan Akhgari [:ehsan] on W3C Bugzilla. Tue, 16 Apr 2013 15:42:13 GMT

Would a simple readonly constant which gives the UA's best approximation of the average latency on the given platform/hardware suffice for your needs?

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 16 Apr 2013 15:47:39 GMT

Yes, Ehsan, that is exactly what I am asking for.

olivierthereaux commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Tue, 16 Apr 2013 16:23:11 GMT

Gah. That's not what the current title asks for - average latency is a fine thing to want to know, but it doesn't address the precise synchronization need I'd mentioned in the email at the top of this message, the one that would let authors synchronize MIDI and audio, or on-screen visuals and audio. I think the errors would quite possibly be audible for MIDI (because you can hear a < 16ms error, even if you can't see it), depending on how frequently JS code could be called with the same currentTime (related to block size?). And I'm not sure, given what Ehsan said about their processing mechanism, that you wouldn't be able to see visual sync errors with only average latency, if the block processing is >16.7ms on a slow system.

I'd suggest a title of "Need to expose average latency of system", and then I'll go file the "Need to expose time stamp of currentTime" issue that is necessary for synchronization with MIDI. I'd actually rather have this bug represent that issue, given the long background thread, but I can link them.

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Tue, 16 Apr 2013 16:50:03 GMT

Please feel free to change the title to something that will make you comfortable. :) I have nothing invested in the title, I'm just trying to do my best to interpret your feedback.

olivierthereaux commented 11 years ago

Original comment by Joe Berkovitz / NF on W3C Bugzilla. Thu, 25 Apr 2013 17:18:30 GMT

Capturing discussion from the WG conference call 4/25/2013:

An additional note that was not discussed:

olivierthereaux commented 11 years ago

Original comment by Pierre Bossart on W3C Bugzilla. Fri, 26 Apr 2013 20:11:57 GMT

I would like to suggest a different approach, which would solve both the latency and drift issues by adding 4 methods:

triggerTime()         // TSC when audio transfers started, in ns
currentSystemTime()   // current system time (TSC), in ns
currentRendererTime() // time reported by audio hardware (in ns), reset to zero when transfer starts
currentTime()         // audio written or read to/from audio stack (in ns) -> same as today

With these 4 methods, an application can find the latency by looking at currentTime()-currentRendererTime(). If a specific implementation doesn't actually query the hardware time, then it can implement a fixed OS/platform offset.

Now if you want to synchronize audio with another event, you have to monitor the audio/system time drift, which can be done by looking at (currentSystemTime()-triggerTime())/currentRendererTime()

bjornm commented 10 years ago

Sorry if I'm joining the conversation a bit late :) I'm currently struggling with mapping MIDI messages to audio context time for tight and accurate playback and recording. If I may add some comments on the proposed solutions:

Comment #22 from the WG conference call 4/25/2013 sounds good - I'm assuming the AC converter methods take an argument, so raciness is avoided. Example mapping MIDI message time to audio time:

var audioTime = audioContext.performanceTimeToAudioTime(midiMessageEvent.receivedTime);

Regarding comment #23 by Pierre, I can see a couple of potential problems. Suppose I do:

var driftRatio = (ctx.currentSystemTime - ctx.triggerTime) / ctx.currentRendererTime;
var audioTime = (midiMessageEvent.receivedTime - ctx.triggerTime) / driftRatio;

  1. The calculation of driftRatio above is subject to raciness (gc pauses, rendering time quantum steps). This error will decrease over time, but is it negligible?
  2. I'm assuming currentRendererTime is continuous - unlike currentTime. If not, the driftRatio will be slightly incorrect.
  3. Getting renderer latency using (ctx.currentTime - ctx.currentRendererTime) is also subject to raciness. Additionally, if currentRendererTime advances not by block but continuously, there will always be an error due to the rendering time quantum advancement.

Finally, the assumption in the #22 corollary that currentTime advances in steps seems correct, at least in blink's AudioDestinationNode.cpp.

joeberkovitz commented 10 years ago

Related issue #340 documents the issue of DOM time / AudioContext time mapping. This issue solely concerns the determination of physical sound output times to AudioContext scheduling times. It may be possible to resolve this issue as a corollary of the mapping issue but for now separate bugs seem appropriate.

srikumarks commented 10 years ago

I'd like to revisit this in the context of the new audio worker node proposal in #113 (proposal comment) since these two issues, imho, are the only major gaps remaining before the spec becomes v1.0 worthy.

The latency issue is complicated by changes in audio routing while an audio context is running. This is a very real problem - ex: I'm running a visual metronome on my laptop and then connect to AirPlay to get a bigger sound - currently, there is a sudden 0.3 sec lag between the visual and the sound. (The lag exists even if I start with AirPlay on.) We cannot work around this in the current scenario. Synchronizing MIDI in such cases is even worse since it would need greater precision than a visual needs.

I'll try to collect in this post all the information that would be needed to solve this problem.

  1. Expose currentTime and currentSystemTime - which is a DOMHiResTimeStamp akin to requestAnimationFrame's time stamp - on the AudioContext. Both should refer to the time at which the next computed audio sample will be sent to the audio subsystem (note: not play out of the speakers).
  2. Expose a currentLatency value (in secs) on the RealtimeAudioDestinationNode, so that playback time (time at which the sample hits the speakers) can be computed as currentTime + currentLatency. This is not relevant to OfflineAudioDestinationNode. The API ought to split those two destination node types.
  3. Add an event (to RealtimeAudioDestinationNode) that will notify when the latency has changed, such as what happens when the audio route changes. For systems where the audio route change's impact on latency cannot be known, an app will not receive this callback and would have to poll currentLatency .. which can be a fixed number for the worst kinds of systems ;). If the destination is a MediaStreamAudioDestinationNode, such an event would be irrelevant.

Using these, I believe we'll have enough information to follow any clock drift over time. Furthermore, implementations can smooth out the time-local mapping between currentTime and currentSystemTime to accommodate clock drift so that locally, the property delta(currentTime) = delta(currentSystemTime) / 1000 is most of the time satisfied to a reasonably high precision like 1 sample error in 10 seconds (barring system clock jumps).
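A sketch of how an application might consume the three pieces of information proposed above (the currentLatency attribute and the "latencychange" event name are hypothetical):

var dest = ctx.destination;

// Context time currently leaving the speakers: the next computed sample enters the
// subsystem at currentTime and is heard currentLatency seconds later.
function audibleAudioTime() {
  return ctx.currentTime - dest.currentLatency; // hypothetical attribute
}

// React to route changes (e.g. switching to AirPlay or Bluetooth):
dest.addEventListener('latencychange', function () { // hypothetical event name
  console.log('output latency is now', dest.currentLatency, 's');
});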

joeberkovitz commented 9 years ago

TPAC: Note that currentTime can increase in a jumpy way due to variable processing chunk size.

We'll add two things: new AudioContext attribute exposing UA's best guess at real context time being heard now on output device (this will normally be a bit behind currentTime, and not quantized). Also new attribute expressing DOM timestamp corresponding to currentTime.

joeberkovitz commented 9 years ago

Correction: We only need a new AudioContext readonly attribute currentPlaybackTime that returns the DOMHiResTimestamp at which the start of the next rendering quantum will actually be heard by the user. This will not advance in real time (since it is a function of currentTime, which doesn't advance uniformly either): both are render-quantized. Developers can consult both this time and the current DOM performance time to determine the immediate relationship between the two timebases for audio and DOM.

It's understood that it's possible this could be inaccurate due to delays not observable from the OS, e.g. Bluetooth transmission delays.

srikumarks commented 9 years ago

Below, I've tried to express my expectation of the characteristics of such a currentPlaybackTime.

readonly attribute double currentPlaybackTime

This is the time at which the audio sample scheduled at currentTime will exit the audio hardware represented by the context's destination. The value is in the time coordinate system of DOMHiResTimeStamp, in units of milliseconds. This time permits scheduled audio events to be tightly coordinated with visual, MIDI and other timing critical activities that may constitute an application.

The latency imposed by the current audio route at any moment is calculated as context.currentPlaybackTime - performance.now(). Since some latency is always expected to be present, however small, the inequality context.currentPlaybackTime ≥ performance.now() MUST always be satisfied.

When the audio hardware route associated with the context's destination does not change, events scheduled at context.currentTime + t are expected to exit the audio hardware when performance.now() is approximately context.currentPlaybackTime + t * 1000 for t ≥ 0. It is possible that the audio output clock is different from the system clock, in which case the two may drift apart over a long enough period. The value reported by context.currentPlaybackTime MUST continuously compensate for such a drift if it exists, so that the scheduling correspondence with currentTime holds for sufficiently large values of t. (TODO: What is "sufficiently large" and how accurately should the relation hold?)

Note that unlike currentTime, this time is not guaranteed to monotonically increase. The underlying hardware route associated with a context may change over time, in which case this play time may jump forwards or backwards depending on the route's latency. When the latency increases, for example when connecting Bluetooth speakers, more frames will end up being computed before the already computed frames exit the new audio route. The context.currentPlaybackTime will therefore jump forward to reflect the latency change. When latency decreases, such as when disconnecting Bluetooth speakers and connecting a pair of headphones, already computed frames will likely be dropped by the audio subsystem, and context.currentPlaybackTime will jump backward.
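The relationships above, restated as a sketch (currentPlaybackTime is the attribute proposed in this comment, not shipped API):

// Output latency of the current route, in milliseconds (always >= 0 per the text above):
var latencyMs = ctx.currentPlaybackTime - performance.now();

// An event scheduled at context time ctx.currentTime + t is expected to be heard when
// performance.now() is approximately:
function expectedHearingTime(ctx, t) {
  return ctx.currentPlaybackTime + t * 1000;
}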

srikumarks commented 9 years ago

It's understood that it's possible this could be inaccurate due to delays not observable from the OS, e.g. Bluetooth transmission delays.

These are observable from some OSes. AVAudioSession on iOS provides this latency in its outputLatency property. The AudioTimeStamp received as an argument in the output audio callbacks provides both sample time and host time (in addition to other times if available), but the latency needs to be added to the host time to get the playback time.

pozdnyakov commented 9 years ago

I have looked through the WebAudio implementation in Blink and I'm afraid that "context.currentPlaybackTime" with the described features will be hard to implement. You never know the real latency beforehand: it may depend on CPU load, the complexity of the involved AudioNode graph, and so on.

Would it suffice if the context provided the latency of the latest played rendering quantum?

An app could use this value (or an average of several obtained values) to estimate the playback time of the next rendering quantum (performance.now() + estimated latency).
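A sketch of that estimation, assuming a hypothetical attribute (called lastQuantumLatency here) that reports the latency of the latest played rendering quantum in seconds:

var latencySamples = [];

function sampleLatency(ctx) {
  latencySamples.push(ctx.lastQuantumLatency); // hypothetical attribute
  if (latencySamples.length > 20) latencySamples.shift(); // keep a short moving window
}

function estimatedPlaybackTime() {
  var avg = latencySamples.reduce(function (a, b) { return a + b; }, 0) / latencySamples.length;
  return performance.now() + avg * 1000; // when the next rendering quantum should be heard
}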

joeberkovitz commented 9 years ago

If it is a rough or heavily time-averaged estimate that is fine from my POV, speaking as a developer. The problem right now is that one has to hack in hard-coded constants as estimated latencies for different UAs and OSs. An estimated overall "recent" latency would be far, far better than this.

padenot commented 9 years ago

Audio output latency should not change when the audio callback is not overloaded. Also it should be independent of the topology of the graph.

Of course, you're always going to have the latency a little in the past because of cross-thread communication, but it should be useful information regardless.

pozdnyakov commented 9 years ago

Audio output latency should not change when the audio callback is not overloaded. Also it should be independent of the topology of the graph.

Sorry for being unclear, I meant cumulative latency (as described at http://webaudio.github.io/web-audio-api/#latency ). Do you think the notion of just the audio output latency is sufficient? Can't latency caused by other factors also affect the user's experience?

padenot commented 9 years ago

If you're adding this to the AudioContext (and in fact I think it should be the AudioDestinationNode), then yes, it should be only the audio output latency, i.e., the latency from the moment the implementation hands off a buffer to the system to the moment the sound comes out of the speakers.

There have been thoughts about having a "latency" member on AudioNodes, but that's a separate discussion.

joeberkovitz commented 9 years ago

I think the description of latency in the spec is currently a bit vague and seems to leave room for other factors, like the latency for some input gesture by the user to be received by the UA. Let's exclude that. We are specifically talking about audio output latency, as @padenot said.

pozdnyakov commented 9 years ago

@padenot, @joeberkovitz thanks for the clarification. I agree that a "latency" property representing the audio output latency is better placed as an AudioDestinationNode interface member, so that it is clearer what exactly this property contains (since the AudioDestinationNode can often be considered as the audio output device connected to the speakers).

joeberkovitz commented 9 years ago

SGTM

pozdnyakov commented 8 years ago

Having re-read the comments in this issue I see a little problem with the AudioDestinationNode.latency solution: it does not actually provide a mapping between AudioContext.currentTime and DOM-related time bases (which was expected in https://github.com/WebAudio/web-audio-api/issues/340#issuecomment-107713714).

In order to solve it I would propose to add a context.performanceTimeStamp property as follows:

1) context.performanceTimeStamp contains values in the same units and starting from the same origin as performance.now()

2) context.performanceTimeStamp contains the timestamp taken when context.currentTime was last updated. You can imagine it as if we called performance.now() each time a new quantum is sent to the audio subsystem (and the number of quanta sent, converted to seconds, is what context.currentTime actually represents).

Besides the clock mapping issue, this property would allow an app to obtain a slightly more accurate device position:

The actual device position = context.currentTime + (performance.now() - context.performanceTimeStamp) - destNode.latency()

If people support this proposal I could start prototyping it in Chromium.

joeberkovitz commented 8 years ago

I think this makes sense. The performance time corresponding to currentTime would then be context.performanceTimeStamp + destNode.latency(), correct?

pozdnyakov commented 8 years ago

I think context.performanceTimeStamp + destNode.latency() will give a performance timestamp estimating when the context.currentTime position is actually played from speaker.
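A sketch combining the two proposed values (performanceTimeStamp and the destination's latency() are the hypothetical additions discussed here; the latency is assumed to be reported in milliseconds for this sketch, since the thread does not pin down its units):

// Performance timestamp at which the audio at context time currentTime is actually heard:
function playbackPerformanceTime(ctx) {
  return ctx.performanceTimeStamp + ctx.destination.latency(); // both hypothetical
}

// Context time currently playing from the speakers (the "device position" above):
function devicePosition(ctx) {
  var msSinceUpdate = performance.now() - ctx.performanceTimeStamp;
  return ctx.currentTime + (msSinceUpdate - ctx.destination.latency()) / 1000;
}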

srikumarks commented 8 years ago

With regard to the original problem, the important thing is to be able to answer the question "if I start() a source node directly wired to the destination node of a context at a time current time + t, at what system time can I expect the first source sample to hit the speakers represented by the destination node of the context".

We do not need the latency for the sake of knowing the latency. So introducing two concepts (performance time and latency) just so we can answer the above question does not seem simpler than answering that question directly (which is only one concept to be spec'd), especially since the latency can be calculated from the answer to this question whenever we need it.

Defining such a performance time in relation to the mechanism that updates currentTime seems like unnecessarily committing to an implementation.

Furthermore, since the audio context's destination node cannot be modified after creation, it seems alright to have the context track this property of the destination node. I think that is simpler to explain to non-audio folks.

joeberkovitz commented 8 years ago

I think there do have to be two concepts supported here, if we are trying to both 1) map between context time and performance time and 2) account for latency. In any given use case a developer might want to know...

We will need at least one number to represent the context time/performance time mapping, and another number to represent the rendering time/playback time mapping. There's no way to avoid two numbers here that I can see, if we want to serve all the use cases. But we can choose which two numbers they are.

@pozdnyakov's proposal chooses to provide 1) a conversion of current context time to performance time, and 2) a latency. There are some nice things about this choice. Even though 1) is expressed in terms of the update mechanism, context.performanceTimeStamp really has a simple meaning: "the DOM performance time that corresponds to currentTime". It runs forward monotonically and keeps in step with currentTime. And 2), the latency, will tend to stay the same. It is actually useful to know the latency alone, as it affects various other buffering and pre-roll strategies that the app might have to undertake.

You are asking for the performance time at which the audio at currentTime will be actually heard (i.e. @pozdnyakov's first number plus the latency). This is definitely a useful number, and possibly the most useful one for most applications -- you can use it to, say, schedule an animation to occur in synchrony with a sound. But to serve the other use cases, we'll still need to either know the latency, or get hold of another number. That other number could be either the context time at which the next block of audio will be actually heard (which is useful too) or the performance time at which the next block of audio will be rendered (which is just performanceTimeStamp again).

Personally I think @pozdnyakov's proposal is cleanest: it lets us have a single current rendering time that advances in step, either in context units or in DOM units, and it gives us the latency as a separable component that will tend to remain stable. I'm curious what others think.

About the destination node, I think it's advisable to put it there because we do actually allow multiple destination nodes in a graph via MediaStreamAudioDestinationNode, and I suppose it's possible that multiple device-oriented destinations in one graph could exist in the future.

padenot commented 8 years ago

Also please keep in mind that there are often clock skews with audio systems (and output streams in particular), so we might want to have some sort of mapping function that takes the slope of the drift into account.

pozdnyakov commented 8 years ago

Also please keep in mind that there are often clock skews with audio systems (and output streams in particular), so we might want to have some sort of mapping function that takes the slope of the drift into account.

I would leave it to the developer: they could estimate the skew by watching how both the audio and performance timestamps change during playback.

padenot commented 8 years ago

This is often too slow to be practical. IRCAM people are doing a proof of concept of this and see convergence times on the order of a minute or so on high-latency devices.

@jipodine you've done some work in the area, care to elaborate?

jipodine commented 8 years ago

I tried to synchronise the audio output of several browsers, and the main caveat at the moment is the low precision of the audio context's currentTime, especially for devices with a big audio block. (Of course, the added latency before the sound is actually audible is an important problem too, but exposing the value, with a callback when it changes, will solve it.) This is why a mapping to performance.now is important.

The uncertainty of currentTime introduces jitter for any event that needs to be scheduled relative to another timebase, like performance.now. That uncertainty also means that time is needed to accurately estimate performance.now as a function of currentTime. This is obvious when estimating the relative skew:

(figure: linear regression over paired currentTime / performance.now samples, used to estimate the relative skew)

And to get an estimate of the time in between the currentTime steps, we also need enough samples.

When trying to synchronise the audio rendering across browsers, I observed that the required observation time-span could adapt (to start quickly, to compensate for a large audio block, etc.). Locally, within seconds, using the estimated time is more accurate than the jitter introduced by currentTime. (@padenot: over a network, directly synchronising the browsers' currentTime requires 1 or 2 minutes, mainly because of the low precision of currentTime, but also because of network transmission and JavaScript user code.)
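A minimal sketch of this kind of estimation, using only standard API (the window size and sampling strategy are illustrative):

var pairs = []; // [audioTime (s), perfTime (ms)]

function sampleClocks(ctx) {
  pairs.push([ctx.currentTime, performance.now()]);
  if (pairs.length > 600) pairs.shift(); // keep a sliding window of samples
}

// Ordinary least squares: perfTime ≈ slope * audioTime + intercept.
// slope/1000 deviating from 1 reveals the relative skew between the two clocks.
function estimateMapping() {
  var n = pairs.length, sx = 0, sy = 0, sxy = 0, sxx = 0;
  pairs.forEach(function (p) {
    sx += p[0]; sy += p[1]; sxy += p[0] * p[1]; sxx += p[0] * p[0];
  });
  var slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  var intercept = (sy - slope * sx) / n;
  return { slope: slope, intercept: intercept };
}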

And, as any clock varies over time, it is important that the synchronisation process keeps running.

A browser could run the audio as soon as it is started, and keep an estimation of its time, shared between all audio processes. There is an energy drawback, as the audio needs to run continuously. The estimation could also start on the first audio requirement, with a less accurate estimation at first. Either way, it would be more accurate than user code run within the JavaScript main loop, which has less timing accuracy and less access to the OS facilities.

About the use-cases, the 4 categories of https://github.com/WebAudio/web-audio-api/issues/12#issuecomment-148946233 seem to cover them all. (Low-latency vs synchronisation, block processing vs continuous timing)

pozdnyakov commented 8 years ago

@jipodine thanks for the detailed description. Do you think the proposal from https://github.com/WebAudio/web-audio-api/issues/12#issuecomment-148426058 would sufficiently simplify the mapping of currentTime to performance.now?

srikumarks commented 8 years ago

@joeberkovitz - in an earlier comment, you mentioned the adequacy of a "currentPlaybackTime" field on the audio context. I'm unclear what the requirement for a "performance time being rendered" actually means. What time does this actually correspond to? If it is referring to the sample that has been computed by the engine and is about to enter the audio route, is it necessary to split the latency into two parts - the context's block size + the route imposed latency? What would be a use case that could make use of such a "performance time being rendered"?

Now that AudioWorkers are going to step in and will be running in sync with the native nodes, there is no longer a need for the "playbackTime" field in the audio event. It is just the currentTime anyway and now we don't expect currentTime to change on the audio thread while a script is running. I think this makes such a "performance time being rendered" redundant.

If we did have a "currentPlaybackTime" as expressed in the recent summary of my understanding, what use cases would not be covered by it?