Closed chcunningham closed 10 months ago
I'll take a stab at a PR for this shortly.
The `AudioBuffer` is slightly more complex in that it has a way to skip copies and allocations in the majority of scenarios, and this has implications in terms of where the memory is and what owns it.
Say you're setting the same `AudioBuffer` to two distinct `AudioBufferSourceNode`s, and `start()` those `AudioNode`s. This doesn't copy. You can also set the same buffer to a convolver, etc. The copy only happens if one calls `getChannelData(n)` to actually see the audio frames and the buffer has been sent to the rendering thread. https://webaudio.github.io/web-audio-api/#dom-audiobuffer-getchanneldata has some info and background.
Here, transferring the `AudioBuffer` will work, but it will first allocate storage, copy to this new storage, and then transfer, because the memory is being used by the audio rendering thread.
I'll also note that, in addition to or instead of doing this, we can also allow the creation of an `AudioBuffer` from already-allocated storage (but not from a `SharedArrayBuffer`, only a regular `ArrayBuffer`).
This would allow transferring the memory owned by the `AudioBuffer` (which is the expensive bit), and then the rate and channel count could be communicated "manually". I believe this would also be useful for other scenarios.
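A rough sketch of this "manual" approach, assuming no new spec API. `packAudioForTransfer` is a hypothetical helper name, and the mock object merely mimics `AudioBuffer`'s shape so the sketch can run outside a browser:

```javascript
// Hypothetical sketch: transfer only the channel data (the expensive part)
// and send sampleRate / length / numberOfChannels as plain values alongside it.
function packAudioForTransfer(audioBufferLike) {
  const channels = [];
  for (let c = 0; c < audioBufferLike.numberOfChannels; c++) {
    // getChannelData(c) is the Web Audio accessor; .buffer is the underlying
    // ArrayBuffer that would go in the postMessage() transfer list.
    channels.push(audioBufferLike.getChannelData(c).buffer);
  }
  return {
    message: {
      sampleRate: audioBufferLike.sampleRate,
      length: audioBufferLike.length,
      numberOfChannels: audioBufferLike.numberOfChannels,
      channels,
    },
    transfer: channels, // e.g. worker.postMessage(message, transfer)
  };
}

// Stand-in object with AudioBuffer's shape, for illustration outside a browser:
const mockBuffer = {
  numberOfChannels: 2,
  sampleRate: 48000,
  length: 4,
  getChannelData: (c) => new Float32Array(4),
};
console.log(packAudioForTransfer(mockBuffer).transfer.length); // 2
```

The receiving side would rebuild an `AudioBuffer` (or equivalent) from the transferred `ArrayBuffer`s plus the plain metadata fields.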
> Here, transferring the AudioBuffer will work, but it will first allocate storage, copy to this new storage, and then transfer, because the memory is being used by the audio rendering thread.
Concept SGTM. I'm having trouble connecting the dots from AudioBuffer's "acquire the content" to how we should implement the transfer steps. Say we've sent an AudioBuffer into an AudioBufferSourceNode and its data is now being sent to the rendering thread. Is there some state set on the AudioBuffer when this occurs such that getChannelData() will now always copy? For now I'll assume we have some state, [[must copy]] = true/false.
Related: if the [[internal data]] is being used on the rendering thread as in that example, does the spec indicate that the rendering thread takes a strong reference such that it would be safe for us to detach the [[internal data]] from the AudioBuffer? For now I'll assume yes, it's always safe to detach.
Given my assumptions above, here's how I imagine the transfer steps (loosely modeled on those for ImageBitmap):
Their transfer steps, given value and dataHolder, are:
1. If [[must copy]] is true, assign a copy of value's [[internal data]] to dataHolder.[[internal data]].
2. Otherwise, assign a reference of value's [[internal data]] to dataHolder.[[internal data]].
3. Release value's reference to [[internal data]].
4. Assign true to value's [[detached]].
5. Assign 0 to value's [[number of channels]].
6. Assign 0 to value's [[length]].
7. Assign 0 to value's [[sample rate]].
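For intuition, the "leave behind a detached object" effect of steps 3–7 mirrors what `ArrayBuffer` transfer already does. A minimal sketch using `structuredClone` as a stand-in (runnable in Node 17+ or a modern browser; not the proposed `AudioBuffer` steps themselves):

```javascript
// One channel's worth of samples, backed by a plain ArrayBuffer.
const channel = new Float32Array([0.1, 0.2, 0.3]);

// Transfer the underlying buffer: a move, not a copy.
const moved = structuredClone(channel.buffer, { transfer: [channel.buffer] });

// The destination sees the data...
console.log(new Float32Array(moved).length);   // 3
// ...while the source is left detached, analogous to steps 3-7 above.
console.log(channel.buffer.byteLength);        // 0
```

The proposed AudioBuffer steps add the [[must copy]] branch on top of this because the rendering thread may still be reading the memory.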
> The pressing use case is using WebCodecs to encode audio in a worker that originated from the user's microphone (getUserMedia)
Technically that can already be done by passing the `MediaStream` from `getUserMedia()` to a `MediaStreamAudioSourceNode`, connecting that node to an `AudioWorklet` node, then using Transferable Streams in the `AudioWorklet` to transfer the `Float32Array`s from `inputs` to the main thread, or any other thread. Minimal, complete, working example: https://github.com/microphone-stream/microphone-stream/pull/54/commits/8660971284cdcc950c48a5e12c1ba4d3e4db1567.
We can already stream from the main thread to a `Worker` to other threads without using `AudioBuffer` at all, e.g., https://github.com/guest271314/AudioWorkletStream/blob/master/worker.js
```js
let port;
onmessage = async e => {
  'use strict';
  if (!port) {
    [port] = e.ports;
    port.onmessage = event => postMessage(event.data);
  }
  const { urls } = e.data;
  // https://github.com/whatwg/streams/blob/master/transferable-streams-explainer.md
  const { readable, writable } = new TransformStream();
  (async _ => {
    for await (const _ of (async function* stream() {
      while (urls.length) {
        yield (await fetch(urls.shift(), { cache: 'no-store' })).body.pipeTo(writable, {
          preventClose: !!urls.length,
        });
      }
    })());
  })();
  port.postMessage({ readable }, [readable]);
};
```
where, since GitHub restricts file size, I sliced a single WAV file into several parts, request the files, transfer to the `AudioWorklet`, process, and output to headphones or speakers, or store or stream the data.
One issue is this appears to be omitting the fact that `timestamp` is not defined at all in the WebCodecs specification, so while `AudioBuffer` could be specified as transferable, that does nothing for the user who is already transferring raw PCM (from the microphone if required) yet now has to attempt to divine how to generate a `timestamp` for the `AudioFrame`s, which is not indicated at the specification or implementation level.
At https://wc-audio-gen.glitch.me/ this is used:
```js
let base_time = outputCtx.currentTime + 0.3;
let buffers = splitBuffer(music_buffer, sampleRate / 2);
for (let buffer of buffers) {
  let frame = new AudioFrame({
    timestamp: base_time * 1000000,
    buffer: buffer
  });
  base_time += buffer.duration;
  encoder.encode(frame);
}
```
however, we do not know what the algorithm is actually trying to produce, because no algorithm exists; and using that pattern or variations thereof for creation of user-defined `AudioFrame`s that are not generated by `MediaStreamTrackProcessor.readable.read()` can result in varying playback rate at output for live streams mid-stream, and `MediaStreamTrackGenerator` not being capable of producing quality and consistent output. For example, when I do this experiment
```js
let bt = ac.currentTime;
// ...
const frame = new AudioFrame({ timestamp: (bt + ac.baseLatency) * 10**6, buffer });
bt += buffer.duration;
```
at https://github.com/guest271314/webtransport/blob/main/webTransportBreakoutBox.js, so that I can omit creating a `MediaStreamAudioDestinationNode` and an `OscillatorNode` solely to get an implementation-produced `timestamp` in an `AudioFrame` at `read()`, the output has variable playback rate for a live stream.
Again, I would suggest either defining `timestamp` in the WebCodecs specification, with an accompanying method to produce said `timestamp` (or whatever name the attribute will be settled on re "microseconds"), or simply removing `timestamp` from `AudioFrame` altogether, which will render `AudioFrame` useless altogether; then we just have a single `AudioBuffer` to work with across APIs.
Another option is adding `timestamp` (https://github.com/WICG/web-codecs/issues/156) to `AudioBuffer` (and removing `AudioFrame` from WebCodecs, similar to how `MediaStreamTrack`s described in other specifications refer back to `MediaStreamTrack` from Media Capture main), which too renders `AudioFrame` useless, as `AudioFrame` is (currently), from the user's perspective, just an `AudioBuffer` with a `timestamp` attribute.
In either case `timestamp` needs to be demystified, clearly defined, and capable of being consistently generated by the user without the need to create additional audio nodes solely to get an internally created implementation `timestamp` from `MediaStreamTrackProcessor.readable.read()`.
> In either case timestamp needs to be demystified, clearly defined, capable of being consistently generated by the user without the need to create additional audio nodes solely to get an internally created implementation timestamp from MediaStreamTrackProcessor.readable.read().
I think you can create the timestamp simply by deciding some starting point (e.g. `0`) for the first packet, and then setting the next packets' timestamps using the duration (established by `AudioBuffer` `length` and `sampleRate`) delta from the first packet.
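The suggestion above can be sketched as follows. Deriving each timestamp from the running sample count (rather than summing floating-point durations) keeps the deltas exact; `timestampsForPackets` is an illustrative helper, not a WebCodecs API:

```javascript
// Compute microsecond timestamps for a sequence of packets, given each
// packet's length in samples and the shared sample rate. The running sample
// count stays an integer, so no floating-point drift accumulates.
function timestampsForPackets(packetLengths, sampleRate, startUs = 0) {
  const timestamps = [];
  let samples = 0;
  for (const length of packetLengths) {
    timestamps.push(startUs + Math.round((samples * 1e6) / sampleRate));
    samples += length;
  }
  return timestamps;
}

// Three 480-sample packets at 48 kHz land exactly 10 ms apart:
console.log(timestampsForPackets([480, 480, 480], 48000)); // [0, 10000, 20000]
```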
Please file a separate issue if you'd like to discuss this further. Let's keep this issue focused on transferability.
I do not find transferability of `AudioBuffer` problematic; just transfer the `Float32Array`, or the `Int8Array` or `Int16Array` representation, or write the data to `WebAssembly.Memory`, or use Transferable Streams. Using `TypedArray`s is considerably faster than constructing and accessing underlying data with `getChannelData()`: https://github.com/WebAudio/web-audio-api-v2/issues/118#issuecomment-808970057. WICG and W3C banned me, thus I am restricted from addressing this concern at the WebCodecs repository. I experiment with Web Audio to a modest extent. This appears to be the cart before the horse. `AudioBuffer` is useless outside of its underlying `Float32Array`, and `timestamp` is the real concern.
> I think you can create the timestamp simply by deciding some starting point (e.g. 0) for the first packet, and then setting the next packets' timestamps using the duration (established by AudioBuffer length and sampleRate) delta from the first packet.
That does not work in practice. The clicks between the beginning and ending of each created `AudioBufferSourceNode` are audible, when the tab does not crash, and eventually the drift due to inexactness increases the frequency of audible clicks between the start and stop of audio nodes. The `AudioFrame` from `output` at `decode()` can be passed to a `write()` from `MediaStreamTrackGenerator`; however, because the `AudioBuffer` `length` there is always greater than 2000 while an `AudioBuffer` from `MediaStreamTrackProcessor.readable.read()` is 220 to 400, and an `AudioWorklet` expects `Float32Array`s of `length` 128 while an `AudioBuffer` in an `AudioFrame` at `output` of `AudioDecoder` can have `length` 2568 (2568 / 128 = 20.0625, which means we will need to store the overflow to avoid writing `0`s, and try to avoid fragmentation of `ArrayBuffer`s), the APIs are incompatible.
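The "store the overflow" bookkeeping mentioned above can be sketched as a small re-chunker. `Rechunker` is a hypothetical helper, not part of either API:

```javascript
// Re-chunk decoder output of arbitrary length into the 128-sample render
// quanta an AudioWorklet expects, carrying the remainder into the next call.
class Rechunker {
  constructor(quantum = 128) {
    this.quantum = quantum;
    this.carry = new Float32Array(0); // leftover samples from the last push
  }
  // Returns an array of full quantum-sized blocks; leftovers are retained.
  push(samples) {
    const merged = new Float32Array(this.carry.length + samples.length);
    merged.set(this.carry);
    merged.set(samples, this.carry.length);
    const blocks = [];
    let offset = 0;
    while (merged.length - offset >= this.quantum) {
      blocks.push(merged.subarray(offset, offset + this.quantum));
      offset += this.quantum;
    }
    this.carry = merged.slice(offset); // copy, so blocks stay valid
    return blocks;
  }
}

// 2568 samples yield 20 full blocks with 8 samples carried over (2568 = 20 * 128 + 8).
const rechunker = new Rechunker();
console.log(rechunker.push(new Float32Array(2568)).length); // 20
console.log(rechunker.carry.length);                        // 8
```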
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>MediaStreamTrackGenerator Workaround</title>
  </head>
  <body>
    <script>
      (async () => {
        const ac = new AudioContext();
        const msd = new MediaStreamAudioDestinationNode(ac, {
          channelCount: 1,
          channelCountMode: 'explicit',
          channelInterpretation: 'discrete',
        });
        const osc = new OscillatorNode(ac, {
          channelCount: 1,
          channelCountMode: 'explicit',
          channelInterpretation: 'discrete',
        });
        osc.connect(msd);
        osc.start(ac.currentTime);
        const track = msd.stream.getTracks()[0];
        const settings = track.getSettings();
        const processor = new MediaStreamTrackProcessor(track);
        const reader = processor.readable.getReader();
        const el = document.createElement('audio');
        document.body.appendChild(el);
        let firstFrame;
        const decoder = new AudioDecoder({
          error() {},
          async output(frame) {
            if (!firstFrame) {
              firstFrame = true;
              console.log(frame.buffer, frame.buffer.length / 128, frame.buffer.length / 10);
            }
            const source = ac.createBufferSource();
            source.buffer = frame.buffer;
            source.connect(ac.destination);
            source.start(frame.timestamp / 1000000);
            frame.close();
          }
        });
        const encoder = new AudioEncoder({
          error() {},
          output(chunk, metadata) {
            if (metadata.decoderConfig) {
              decoder.configure(metadata.decoderConfig);
            }
            decoder.decode(chunk);
          }
        });
        const config = {
          numberOfChannels: 1,
          sampleRate: settings.sampleRate,
          codec: "opus",
          bitrate: 48000
        };
        encoder.configure(config);
        let lastTimestamp;
        let baseTimestamp = ac.currentTime + 0.3;
        while (true) {
          const { value } = await reader.read();
          if (!baseTimestamp) {
            baseTimestamp = value.timestamp;
          }
          encoder.encode(
            new AudioFrame({
              timestamp: baseTimestamp * 10 ** 6,
              buffer: value.buffer
            })
          );
          baseTimestamp += value.buffer.duration;
        }
      })();
    </script>
  </body>
</html>
```
I would place priority on making sure the fundamentals work, and compatibility with the APIs in the domain, not merely "I think" as a suggestion when no documentation exists to support that claim in the actual specification, before focusing on transferability of a broken API with regard to WebCodecs `AudioEncoder` and `AudioDecoder`.
@rtoy helped me to better understand AudioBuffer's "acquire the content", and this led to an epiphany: we should make AudioFrame "acquire the content" of its member AudioBuffer. This is important because we want all Frame and Chunk types in WebCodecs to be immutable to avoid TOCTOU security bugs when encoding/decoding. VideoFrame is already immutable and the Chunk types will be soon. AudioBuffer is very much mutable, but we can use "acquire the content" upon construction of an AudioFrame to make it immutable from the POV of WebCodecs.
It is a small wart that the getChannelData() and copyToChannel() methods will still cause it to appear as mutable, but I can accept that (the same subtlety already exists in other uses of "acquire the content"). We can add console warnings if folks use these methods on an AudioBuffer whose content has been acquired by an AudioFrame.
With this in mind, I now strongly favor @padenot's second proposal:
> I'll also note that in addition or instead of doing this, we can also allow the creation of AudioBuffer from already-allocated storage (but not from SharedArrayBuffer, only regular ArrayBuffer).
My idea being: when transferring an AudioFrame, we would transfer the "acquired content" and use this to create a new AudioBuffer at the destination. I don't know that this even requires a spec change from WebAudio. For example, AudioBuffer is created in the decodeAudioData() steps as follows:
> Let buffer be an AudioBuffer containing the final result (after possibly performing sample-rate conversion).
So perhaps we can write something similar in AudioFrame transfer steps, substituting "final result ..." with ~ "transferred acquired data"....
What you are really concerned about transferring here are the `Float32Array` `buffer`(s). You can assign the `numberOfChannels`, `sampleRate`, and `length` to the `AudioFrame` after the transfer of the `buffer`.
"final result ..." with ~ "transferred underlying buffer from Float32Array(s) representing the channel data, copying numberOfChannels, sampleRate, length from original AudioBuffer, set Float32Array(s) length in original AudioBuffer to 0"....
would be a complete description of what is intended to occur; "data" is a generic term that we do not have to repeat from Web Audio API wording.
The problem you face, again, is where does the `timestamp` get generated from in that algorithm; ostensibly in some way from the `AudioBuffer`, or when the `AudioBuffer` is transferred, unless that algorithm is simply omitted from the documentation deliberately?
This is not priority-1 anymore, because WebCodecs doesn't need it as much (cc @chcunningham).
TPAC 2022 action items: Transferable.

2023 TPAC Audio WG Discussion: The WG will not pursue this since the need from the WebCodecs side has been resolved.
**Describe the feature** Follow-up stemming from WebAudio/web-audio-api-v2#111. Once AudioBuffer is exposed to DedicatedWorker, we'll want to transfer AudioBuffers created elsewhere into the DedicatedWorker.

**Is there a prototype?** Chromium would like to prototype ASAP and ship this alongside the rest of the WebCodecs API.

**Describe the feature in more detail** The pressing use case is using WebCodecs to encode audio in a worker that originated from the user's microphone (getUserMedia). The audio will be sent to the worker by transferring the MediaStreamTrackProcessor readable ReadableStream. The individual AudioFrames in the stream will themselves be transferred, which would trigger transfer of their nested AudioBuffers.
Transferring is a move operation, so we must consider what happens to the object that is left behind. I propose that we follow the model of ArrayBuffer.
Doing the same for AudioBuffer, would probably entail