**Closed** — markdascher closed this issue 2 years ago.
Note: on Chromium, `OscillatorNode` starts rendering output at construction, without `start()` being called.
I would not rely on implementation timers for precision or consistency between implementations running the same code. You can `connect()` and `disconnect()` nodes using your own timing implementation.
#2397 (comment) suggests that `start(currentTime + baseLatency)` should reliably play a sound at the indicated time, but that's not true in all browsers. If it were, playing a click sound effect would be easy.
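For concreteness, here is a minimal sketch of that click approach (the helper name is mine; it assumes `start()`/`stop()` honor the requested times exactly, which is the very point in dispute):

```js
// Compute when a click should start and stop, assuming audio scheduled at
// currentTime + baseLatency begins exactly on time (the disputed premise).
function clickWindow(currentTime, baseLatency, duration = 0.002) {
  const start = currentTime + baseLatency;
  return { start, stop: start + duration };
}

// Browser usage (not runnable outside a document):
// const ac = new AudioContext();
// const osc = new OscillatorNode(ac);
// osc.connect(ac.destination);
// const { start, stop } = clickWindow(ac.currentTime, ac.baseLatency);
// osc.start(start);
// osc.stop(stop);
```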
The linked message is not correct. `baseLatency` is useful to know whether the Web Audio API implementation buffers internally. `outputLatency` is useful to understand the latency induced by the operating system / hardware. Firefox doesn't buffer audio (ever), so its `baseLatency` is zero: the graph processing is serviced directly from the real-time audio callback the OS calls. Summing the two numbers gives the total latency (useful, for example, for syncing visuals).
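That summing can be sketched as a tiny helper (hypothetical name; it assumes both reported latencies are accurate):

```js
// Estimate when audio scheduled at a given context time becomes audible:
// context time + internal buffering + OS/hardware output latency.
function estimatedPhysicalOutputTime(contextTime, baseLatency, outputLatency) {
  return contextTime + baseLatency + outputLatency;
}
```

On Firefox, `baseLatency` is zero, so the estimate reduces to `contextTime + outputLatency`.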
I consider this a spec bug because everyone seems to have a different idea of what these values are for, making it tougher to develop a reliable app. The spec describes what the fields are without clarifying what they're useful for. (Especially baseLatency.)
The spec accurately says what should happen, not why or how; it's not a user manual. Sometimes we add non-normative notes, but not here.
In this particular case, the Processing model section makes clear what happens when `start()` is called, when the messages are processed, and when the clock needs to be updated.
Firefox's behaviour was implemented a while back and hasn't been updated since this was specced. It lags behind the spec, so this is a Firefox bug that I intend to fix.
Is there a downside to Chrome's behavior? Does the structure of Firefox (or other implementations) make it particularly difficult to standardize this behavior?
The downside to the way the spec says things should be done is that there is no guarantee that:

```js
var ac = new AudioContext();
var osc1 = new OscillatorNode(ac);
var osc2 = new OscillatorNode(ac);
osc1.connect(ac.destination);
osc2.connect(ac.destination);
osc1.start();
osc2.start();
```

results in the two oscillators starting in phase. It's entirely possible that the two `start()` calls happen in different render quanta (the same applies if/when passing an explicit start time; see the *Description* column of the spec).
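A common mitigation is to schedule both oscillators at one shared explicit future time instead of calling `start()` with no argument. This is only a sketch (the helper and the 100 ms margin are mine), and per the comment above even an explicit time may not be airtight if the calls are processed in different render quanta:

```js
// Pick one explicit start time far enough in the future that both start()
// messages are processed before it arrives; the 100 ms margin is arbitrary.
function sharedStartTime(currentTime, margin = 0.1) {
  return currentTime + margin;
}

// const t = sharedStartTime(ac.currentTime);
// osc1.start(t);
// osc2.start(t);
```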
To demonstrate that Chrome's implementation of `OscillatorNode` starts immediately, without `start()` being called at all, run this in the console:

```js
const ac = new AudioContext();
const osc = new OscillatorNode(ac);
const msd = new MediaStreamAudioDestinationNode(ac);
const processor = new MediaStreamTrackProcessor({ track: msd.stream.getAudioTracks()[0] });
osc.connect(msd);
processor.readable.pipeTo(new WritableStream({
  write(value) { console.log(value); }
}));
```
No, this just shows that a `MediaStreamAudioDestinationNode` outputs silence continuously, like an `AudioDestinationNode`, as it should.
```js
const ac = new AudioContext();
const osc = new OscillatorNode(ac);
const msd = new MediaStreamAudioDestinationNode(ac);
const processor = new MediaStreamTrackProcessor({ track: msd.stream.getAudioTracks()[0] });
osc.connect(msd);
processor.readable.pipeTo(new WritableStream({
  write(value) {
    const f = new Float32Array(1024);
    value.copyTo(f, { planeIndex: 0 });
    console.log(f);
  }
}));
```

continuously logs silent buffers.
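Rather than eyeballing the logged buffers, a small hypothetical helper can confirm that a frame's samples really are all zero:

```js
// True if every sample in the buffer is exactly zero (digital silence).
function isSilent(samples) {
  return samples.every((s) => s === 0);
}

// In the write() callback above: console.log(isSilent(f));
```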
Thanks for clarifying! So if I'm understanding correctly:

- `start` and `stop` in my example should take effect even if JS code is still running. Firefox will eventually match Chrome in this aspect.
- `baseLatency + outputLatency` is used when syncing audio to video.
- `baseLatency + currentTime` doesn't represent anything useful, which explains why one of the ideas mentioned here is to add another new latency value to serve this purpose.

So developers should use `outputLatency` or `baseLatency + outputLatency` when syncing audio to video. I still think it would be helpful to spell that out explicitly, but that could just be me. And I suppose `baseLatency + outputLatency` isn't necessarily the whole story either, given that latency can be introduced within individual nodes as well.

So that only leaves `currentTime`. The definition seems to describe Chrome's behavior of an always up-to-date value. So it sounds like we're all set then. Nothing to see here. 🙂

If the solution to #2410 ends up being another latency value that gets added to `currentTime`, I suppose that would help keep multiple oscillators in phase as well. Will be interesting to see how that plays out.
@padenot
> No, this just shows that a `MediaStreamAudioDestinationNode` outputs silence continuously, like an `AudioDestinationNode`, as it should.
No, it does not. It shows that Chrome's implementation is exactly backwards: `OscillatorNode` starts without `start()` being called, and the implementation of `MediaStreamTrack` of kind `"audio"` does not produce silence per the specification (Issue 1262796: MediaStreamTrack does not render silence, https://bugs.chromium.org/p/chromium/issues/detail?id=1262796).

Simply comment out the `OscillatorNode` to demonstrate there is no output, specifically no silence produced by `MediaStreamAudioDestinationNode` on Chrome:
```js
const ac = new AudioContext();
// const osc = new OscillatorNode(ac);
const msd = new MediaStreamAudioDestinationNode(ac);
const processor = new MediaStreamTrackProcessor({ track: msd.stream.getAudioTracks()[0] });
// osc.connect(msd);
processor.readable.pipeTo(new WritableStream({
  write(value) { console.log(value); }
}));
```
Chrome banned me, but didn't fix their bugs.
@padenot
Given that `OscillatorNode` starts rendering output on Chrome without `start()` being called, it is impossible to rely on `baseLatency` or any other timing mechanism for precision. `start()` is useless on Chrome when piping the output through a `MediaStreamAudioDestinationNode`. I encountered that issue while experimenting with WebCodecs when `AudioFrame` was initially specified and shipped on Chrome: getting or creating a `timestamp` was (is) not clear, given Chrome's internal restrictions on sample rate for `'opus'` encoding/decoding. I found that I could "piggy-back" on an `OscillatorNode` producing silence connected to a `MediaStreamAudioDestinationNode` to get a `timestamp` for input to `MediaStreamTrackGenerator`.

`MediaStreamAudioDestinationNode` alone does not produce silence. `OscillatorNode` produces output immediately; I don't need to call `start()` at all. Again, an exactly backwards implementation. They marked the bug as `WontFix`. If you want consistency and conformance with the specifications, you will need to file your own Chrome bugs.
Note also that Web Audio API is not the only specification that suffers from Chrome's implementation of `MediaStreamTrack` of kind `"audio"` not rendering silence; WebRTC is also affected.

On Firefox 95 both `<audio>` elements play; on Chromium 99 neither `<audio>` element plays: https://plnkr.co/edit/XNwNwANBuMzaBKxj?preview
```html
<!DOCTYPE html>
<html>
  <head>
    <title>MediaStreamTrack does not render silence on Chromium</title>
    <!-- https://bugs.chromium.org/p/chromium/issues/detail?id=1262796 -->
    <!-- https://www.w3.org/TR/mediacapture-streams/#life-cycle-and-media-flow -->
  </head>
  <body>
    <script>
      var webrtc = new RTCPeerConnection();
      var transceiver = webrtc.addTransceiver('audio');
      var { track: webrtc_track } = transceiver.receiver;
      var webrtc_audio_element = new Audio();
      webrtc_audio_element.controls = webrtc_audio_element.autoplay = true;
      document.body.appendChild(webrtc_audio_element);
      webrtc_audio_element.srcObject = new MediaStream([webrtc_track]);
      webrtc_audio_element.ontimeupdate = webrtc_audio_element.onplaying = (e) =>
        console.assert(e.target.currentTime > 0, [e.target.currentTime, e.type]);
      var ac = new AudioContext();
      var msd = new MediaStreamAudioDestinationNode(ac);
      var { stream } = msd;
      var [webaudio_track] = stream.getAudioTracks();
      var webaudio_element = new Audio();
      webaudio_element.controls = webaudio_element.autoplay = true;
      document.body.appendChild(webaudio_element);
      webaudio_element.srcObject = new MediaStream([webaudio_track]);
      webaudio_element.ontimeupdate = webaudio_element.onplaying = (e) =>
        console.assert(e.target.currentTime > 0, [e.target.currentTime, e.type]);
    </script>
  </body>
</html>
```
**Describe the issue**

Browser differences with `baseLatency`, `currentTime`, `start`, and `stop` prevent sounds from being scheduled precisely.

https://github.com/WebAudio/web-audio-api/issues/2397#issuecomment-257100626 suggests that `start(currentTime + baseLatency)` should reliably play a sound at the indicated time, but that's not true in all browsers. If it were, playing a click sound effect would be easy: that approach works in Chrome but not in Firefox. There are a few relevant bug reports (956574, 966585, 1228207, 1248731), but they've been open for nearly a decade, and it's unclear if they're even considered spec violations or just optional enhancements.
I consider this a spec bug because everyone seems to have a different idea of what these values are for, making it tougher to develop a reliable app. The spec describes what the fields are without clarifying what they're useful for. (Especially `baseLatency`.)
If we agree that `start(currentTime + baseLatency)` should be predictable under normal circumstances, then explicit language or examples could be added to the spec. This may also make issues like #2410 less visible. I considered adding a more focused comment under that issue, as the "more global scheduling latency value" mentioned in https://github.com/WebAudio/web-audio-api/issues/2410#issuecomment-846125762 sounds like it would help. But it wouldn't actually help without tightening up the existing fields first. And once that's done, `baseLatency` might already serve that purpose.

**Where Is It**
- https://webaudio.github.io/web-audio-api/#dom-baseaudiocontext-currenttime
- https://webaudio.github.io/web-audio-api/#dom-audiocontext-baselatency
- https://webaudio.github.io/web-audio-api/#dom-audioscheduledsourcenode-start
- https://webaudio.github.io/web-audio-api/#dom-audioscheduledsourcenode-stop
**Additional Information**

The example relies on `baseLatency`, `currentTime`, `start(currentTime + baseLatency)`, and `stop(currentTime + baseLatency + 0.002)`. By the time those calls take effect in Firefox, `currentTime + baseLatency + 0.002` is already in the past. Adding a constant delay (at least 20 ms) helps, but isn't reliable, since the actual delay depends on how long the current task has taken, how much more time it will take, and even how many other tasks are queued. The only reliable fix is to prerender sound effects using an `OfflineAudioContext`, then use `AudioBufferSourceNode` (whose `start` method accepts a duration) instead of `OscillatorNode`.
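A sketch of that prerender workaround (browser-only apart from the first helper; the function names are mine, and `OscillatorNode`'s default 440 Hz tone stands in for a real click sample):

```js
// Frames needed for a clip of the given duration, rounded to the nearest frame.
function framesFor(duration, sampleRate) {
  return Math.round(duration * sampleRate);
}

// Render a short click once, off the real-time thread (browser-only).
async function prerenderClick(sampleRate = 48000, duration = 0.002) {
  const oac = new OfflineAudioContext(1, framesFor(duration, sampleRate), sampleRate);
  const osc = new OscillatorNode(oac);
  osc.connect(oac.destination);
  osc.start(); // offline rendering, so timing here is deterministic
  return oac.startRendering(); // resolves with an AudioBuffer
}

// Replay on demand; the 2 ms duration is baked into the buffer,
// so no stop() call needs to be scheduled precisely.
function playClick(ac, buffer) {
  const src = new AudioBufferSourceNode(ac, { buffer });
  src.connect(ac.destination);
  src.start();
}
```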
Is there a downside to Chrome's behavior? Does the structure of Firefox (or other implementations) make it particularly difficult to standardize this behavior?