WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

start(currentTime + baseLatency) behavior isn't strictly defined #2467

Closed markdascher closed 2 years ago

markdascher commented 2 years ago

Describe the issue

Browser differences with baseLatency, currentTime, start, and stop prevent sounds from being scheduled precisely.

https://github.com/WebAudio/web-audio-api/issues/2397#issuecomment-257100626 suggests that start(currentTime + baseLatency) should reliably play a sound at the indicated time, but that's not true in all browsers. If it were, playing a click sound effect would be easy:

const ctx = new AudioContext();
const osc = new OscillatorNode(ctx, { frequency: 1000 });
osc.connect(ctx.destination);
// Schedule a 2 ms click as soon as the output should be able to start it.
const t = ctx.currentTime + ctx.baseLatency;
osc.start(t);
osc.stop(t + 0.002);

This works in Chrome but not in Firefox. There are a few relevant bug reports (956574, 966585, 1228207, 1248731), but they've been open for nearly a decade, and it's unclear if they're even considered spec violations or just optional enhancements.

I consider this a spec bug because everyone seems to have a different idea of what these values are for, making it tougher to develop a reliable app. The spec describes what the fields are without clarifying what they're useful for. (Especially baseLatency.)

If we agree that start(currentTime + baseLatency) should be predictable under normal circumstances, then explicit language or examples could be added to the spec. This may also make issues like #2410 less visible. I considered adding a more focused comment under that issue, as the "more global scheduling latency value" mentioned in https://github.com/WebAudio/web-audio-api/issues/2410#issuecomment-846125762 sounds like it would help. But it wouldn't actually help without tightening up the existing fields first. And once that's done, baseLatency might already serve that purpose.

Where Is It

https://webaudio.github.io/web-audio-api/#dom-baseaudiocontext-currenttime
https://webaudio.github.io/web-audio-api/#dom-audiocontext-baselatency
https://webaudio.github.io/web-audio-api/#dom-audioscheduledsourcenode-start
https://webaudio.github.io/web-audio-api/#dom-audioscheduledsourcenode-stop

Additional Information

baseLatency

currentTime

start(currentTime + baseLatency)

stop(currentTime + baseLatency + 0.002)

Adding a constant delay (at least 20ms) helps, but isn't reliable since the actual delay depends on how long the current task has taken, how much more time it will take, and even how many other tasks are queued. The only reliable fix is to prerender sound effects using an OfflineAudioContext, then use AudioBufferSourceNode (whose start method accepts a duration) instead of OscillatorNode.
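
For what it's worth, here's a minimal sketch of that workaround (the 1 kHz / 2 ms click is just illustrative):

const ctx = new AudioContext();

async function renderClick() {
  // Prerender 2 ms of a 1 kHz tone into an AudioBuffer.
  const length = Math.ceil(0.002 * ctx.sampleRate);
  const offline = new OfflineAudioContext(1, length, ctx.sampleRate);
  const osc = new OscillatorNode(offline, { frequency: 1000 });
  osc.connect(offline.destination);
  osc.start(0);
  return offline.startRendering(); // resolves to an AudioBuffer
}

let clickBuffer;
renderClick().then((buffer) => { clickBuffer = buffer; });

function playClick() {
  const src = new AudioBufferSourceNode(ctx, { buffer: clickBuffer });
  src.connect(ctx.destination);
  src.start(); // start(when, offset, duration) also accepts a duration, so no separate stop() is needed
}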

Is there a downside to Chrome's behavior? Does the structure of Firefox (or other implementations) make it particularly difficult to standardize this behavior?

guest271314 commented 2 years ago

Note: on Chromium, OscillatorNode starts rendering output at construction, without start() being called.

guest271314 commented 2 years ago

I would not rely on implementation timers for precision or consistency between implementations using the same code. You can connect() and disconnect() nodes using your own timing implementation.
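
A rough sketch of that approach, assuming ordinary main-thread timers (so precision is limited by timer granularity):

const ctx = new AudioContext();
const osc = new OscillatorNode(ctx, { frequency: 1000 });
osc.start(); // runs continuously, but is inaudible until connected

// Gate the sound by connecting/disconnecting instead of start()/stop().
function click(durationMs = 2) {
  osc.connect(ctx.destination);
  setTimeout(() => osc.disconnect(ctx.destination), durationMs);
}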

padenot commented 2 years ago

#2397 (comment) suggests that start(currentTime + baseLatency) should reliably play a sound at the indicated time, but that's not true in all browsers. If it were, playing a click sound effect would be easy:

The linked message is not correct. baseLatency is useful for knowing whether the Web Audio API implementation buffers internally. outputLatency is useful for understanding the latency induced by the operating system / hardware. Firefox doesn't buffer audio (ever), so its baseLatency is zero; the graph processing is serviced directly from the real-time audio callback the OS calls. Summing the two numbers gives the total latency (for example, for syncing visuals).
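
For example, a rough sketch of using that sum for audio/visual sync (rather than feeding either value back into start()):

const ctx = new AudioContext();

function draw() {
  // Estimate which context time is currently coming out of the speakers:
  // internal buffering (baseLatency) plus OS/hardware latency (outputLatency).
  const audibleNow = ctx.currentTime - (ctx.baseLatency + ctx.outputLatency);
  // ...render visuals corresponding to audibleNow...
  requestAnimationFrame(draw);
}
requestAnimationFrame(draw);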

I consider this a spec bug because everyone seems to have a different idea of what these values are for, making it tougher to develop a reliable app. The spec describes what the fields are without clarifying what they're useful for. (Especially baseLatency.)

The spec accurately says what should happen, not why or how; it's not a user manual. Sometimes we add non-normative notes, but not here.

In this particular case, the Processing model section makes it clear what happens when start() is called, when the messages are processed, and when the clock needs to be updated.

Firefox's behaviour was implemented a while back and hasn't been updated since this was specced. It lags behind the spec, so this is a Firefox bug that I intend to fix.

Is there a downside to Chrome's behavior? Does the structure of Firefox (or other implementations) make it particularly difficult to standardize this behavior?

The downside to the way the spec says things should be done is that there is no guarantee that:

var ac = new AudioContext();
var osc1 = new OscillatorNode(ac);
var osc2 = new OscillatorNode(ac);
osc1.connect(ac.destination);
osc2.connect(ac.destination);
osc1.start();
osc2.start();

results in two oscillators starting in phase. It's entirely possible that the two start calls happen in different render quanta (the same applies when passing an explicit start time; see the Description column in the spec).
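
A common mitigation (not something the spec guarantees) is to replace the two parameterless start() calls above with a single shared time comfortably in the future:

// Pick one shared start time far enough ahead that both start() messages reach
// the rendering thread before that time is rendered, so neither gets clamped.
var t = ac.currentTime + 0.1;
osc1.start(t);
osc2.start(t);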

guest271314 commented 2 years ago

To demonstrate that Chrome's implementation of OscillatorNode starts rendering immediately, without start() being called at all, run this in the console:

const ac = new AudioContext();
const osc = new OscillatorNode(ac);
const msd = new MediaStreamAudioDestinationNode(ac);
const processor = new MediaStreamTrackProcessor({track: msd.stream.getAudioTracks()[0]});
osc.connect(msd);
processor.readable.pipeTo(new WritableStream({
  write(value) {console.log(value)}
}));
padenot commented 2 years ago

No, this just shows that a MediaStreamAudioDestinationNode outputs silence continuously, like an AudioDestinationNode, as it should.

const ac = new AudioContext();
const osc = new OscillatorNode(ac);
const msd = new MediaStreamAudioDestinationNode(ac);
const processor = new MediaStreamTrackProcessor({track: msd.stream.getAudioTracks()[0]});
osc.connect(msd);
processor.readable.pipeTo(new WritableStream({
  write(value) {
    var f = new Float32Array(1024);
    value.copyTo(f, { planeIndex: 0 });
    console.log(f);
  }
}));

continuously logs silent buffers.

markdascher commented 2 years ago

Thanks for clarifying! So if I'm understanding correctly, baseLatency and outputLatency are informational (internal buffering and OS/hardware latency, respectively), not values meant to be fed back into start(), and Firefox's current start() behavior is simply a bug that will be fixed.

So that only leaves currentTime. The definition seems to describe Chrome's behavior of an always up-to-date value. So it sounds like we're all set then. Nothing to see here. 🙂

If the solution to #2410 ends up being another latency value that gets added to currentTime, I suppose that would help keep multiple oscillators in phase as well. Will be interesting to see how that plays out.

guest271314 commented 2 years ago

@padenot

No, this just shows that a MediaStreamAudioDestinationNode outputs silence continuously, like an AudioDestinationNode, as it should.

No, it does not. It shows that Chrome's implementation is exactly backwards: OscillatorNode starts without start() being called, and MediaStreamTrack of kind "audio" does not produce silence per the specification (Issue 1262796: MediaStreamTrack does not render silence, https://bugs.chromium.org/p/chromium/issues/detail?id=1262796).

Simply comment out the OscillatorNode to demonstrate that there is no output, specifically no silence produced by MediaStreamAudioDestinationNode on Chrome.

const ac = new AudioContext();
//const osc = new OscillatorNode(ac);
const msd = new MediaStreamAudioDestinationNode(ac);
const processor = new MediaStreamTrackProcessor({track: msd.stream.getAudioTracks()[0]});
//osc.connect(msd);
processor.readable.pipeTo(new WritableStream({
  write(value) {console.log(value)}
}));

Chrome banned me, but didn't fix their bugs.

guest271314 commented 2 years ago

@padenot

Given that OscillatorNode starts rendering output on Chrome without start() being called, it is impossible to rely on baseLatency or any other timing mechanism for precision. start() is useless on Chrome when piping the output through a MediaStreamAudioDestinationNode. I encountered that issue while experimenting with WebCodecs, when AudioFrame was initially specified and shipped on Chrome and getting or creating a timestamp was (is) not clear, given Chrome's internal restrictions on sample rate for 'opus' encoding/decoding. I found that I could "piggy-back" on an OscillatorNode producing silence connected to a MediaStreamAudioDestinationNode to get a timestamp for input to MediaStreamTrackGenerator.
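
Roughly, the piggy-back looks like this (Chrome-only APIs, relying on the behaviour described above; the timestamp handling is only illustrative):

const ac = new AudioContext();
const osc = new OscillatorNode(ac); // never start()ed; on Chrome its presence makes (silent) frames flow
const msd = new MediaStreamAudioDestinationNode(ac);
osc.connect(msd);
const [track] = msd.stream.getAudioTracks();
const processor = new MediaStreamTrackProcessor({ track });
processor.readable.pipeTo(new WritableStream({
  write(audioData) {
    // audioData.timestamp (microseconds) can be reused when constructing
    // AudioData frames for a MediaStreamTrackGenerator.
    console.log(audioData.timestamp);
    audioData.close();
  }
}));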

MediaStreamAudioDestinationNode alone does not produce silence. OscillatorNode produces output immediately; I don't need to call start() at all. Again, an exactly backwards implementation. They marked the bug as WontFix. If you want consistency and conformance with the specifications, you will need to file your own Chrome bugs.

Note also that the Web Audio API is not the only specification that suffers from Chrome's implementation of MediaStreamTrack of kind "audio" not rendering silence; WebRTC is also affected.

On Firefox 95 both <audio> elements play, on Chromium 99 neither <audio> element plays https://plnkr.co/edit/XNwNwANBuMzaBKxj?preview.

<!DOCTYPE html>

<html>
  <head>
    <title>MediaStreamTrack does not render silence on Chromium</title>
    <!-- https://bugs.chromium.org/p/chromium/issues/detail?id=1262796 -->
    <!-- https://www.w3.org/TR/mediacapture-streams/#life-cycle-and-media-flow -->
  </head>

  <body>
    <script>
      var webrtc = new RTCPeerConnection();
      var transceiver = webrtc.addTransceiver('audio');
      var { track: webrtc_track } = transceiver.receiver;
      var webrtc_audio_element = new Audio();
      webrtc_audio_element.controls = webrtc_audio_element.autoplay = true;
      document.body.appendChild(webrtc_audio_element);
      webrtc_audio_element.srcObject = new MediaStream([webrtc_track]);
      webrtc_audio_element.ontimeupdate = webrtc_audio_element.onplaying = (
        e
      ) =>
        console.assert(e.target.currentTime > 0, [
          e.target.currentTime,
          e.type,
        ]);

      var ac = new AudioContext();
      var msd = new MediaStreamAudioDestinationNode(ac);
      var { stream } = msd;
      var [webaudio_track] = stream.getAudioTracks();
      var webaudio_element = new Audio();
      webaudio_element.controls = webaudio_element.autoplay = true;
      document.body.appendChild(webaudio_element);
      webaudio_element.srcObject = new MediaStream([webaudio_track]);
      webaudio_element.ontimeupdate = webaudio_element.onplaying = (e) =>
        console.assert(e.target.currentTime > 0, [
          e.target.currentTime,
          e.type,
        ]);
    </script>
  </body>
</html>
hoch commented 2 years ago

Closing per https://github.com/WebAudio/web-audio-api/issues/2467#issuecomment-1009012693.