Closed guest271314 closed 4 years ago
Included: an improved result where these problems are still observed.
The problem is twofold:

- obvious gaps in playback (probably due to waiting on the `ended` event of the source buffer)
- posting the data to `AudioWorklet` to avoid gaps requires substantial attention to the 128-sample per-`process()` execution; when the test is not correctly configured this can crash the browser tab and the underlying OS

The gaps can be avoided using `AudioWorklet` by using `subarray()` to create 128-element `Float32Array`s (without using WASM; cf. https://github.com/GoogleChromeLabs/web-audio-samples/tree/master/audio-worklet/design-pattern/wasm-ring-buffer).
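A minimal sketch of the `subarray()` chunking described above (the function name is illustrative). `subarray()` creates views over the same underlying `ArrayBuffer` without copying, so slicing a decoded channel into 128-element blocks is cheap; note, though, that transferring the underlying buffer via `postMessage` detaches every view at once.

```javascript
// Split a channel of decoded samples into 128-element views,
// matching the Web Audio API render quantum size.
// subarray() copies nothing; each entry is a view on channelData.buffer.
function toRenderQuanta(channelData, quantum = 128) {
  const blocks = [];
  for (let i = 0; i < channelData.length; i += quantum) {
    blocks.push(channelData.subarray(i, i + quantum));
  }
  return blocks;
}

const channel = new Float32Array(300); // e.g., one channel from getChannelData(0)
const blocks = toRenderQuanta(channel);
// 300 samples -> two full 128-sample blocks plus a 44-sample tail
console.log(blocks.length, blocks[0].length, blocks[2].length);
```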
Posting the code here now, before crashing the browser and OS again and potentially losing the tentatively working version:
```js
await ac.audioWorklet.addModule("audioWorklet.js");
const aw = new AudioWorkletNode(ac, "audio-data-worklet-stream", {
  numberOfInputs: 2,
  numberOfOutputs: 2,
  processorOptions: {
    buffers: { channel0: [], channel1: [] }
  }
});
aw.port.onmessage = async e => {
  console.log(e.data);
  if (e.data === "ended") {
    track.stop();
    track.enabled = false;
    await ac.suspend();
  }
};
aw.connect(msd);
//
const ab = await ac.decodeAudioData(uint8array.buffer);
let [channel0, channel1] = [ab.getChannelData(0), ab.getChannelData(1)];
aw.port.postMessage({ channel0, channel1 }, [channel0.buffer, channel1.buffer]);
```
```js
class AudioDataWorkletStream extends AudioWorkletProcessor {
  constructor(options) {
    super();
    if (options.processorOptions) {
      Object.assign(this, options.processorOptions);
    }
    this.i = 0;
    this.resolve = void 0;
    this.promise = new Promise(resolve => (this.resolve = resolve))
      .then(_ => this.port.postMessage("ended"));
    this.port.onmessage = e => {
      const { channel0, channel1 } = e.data;
      ++this.i;
      for (let i = 0; i < channel0.length; i += 128) {
        this.buffers.channel0.push(channel0.subarray(i, i + 128));
      }
      for (let i = 0; i < channel1.length; i += 128) {
        this.buffers.channel1.push(channel1.subarray(i, i + 128));
      }
    };
  }
  process(inputs, outputs) {
    if (this.i > 0 && this.buffers.channel0.length === 0) {
      this.resolve();
      globalThis.console.log(currentTime, currentFrame, this.buffers);
      return false;
    }
    for (let channel = 0; channel < outputs.length; ++channel) {
      const [outputChannel] = outputs[channel];
      let [inputChannel] = inputs[channel];
      if (this.i > 0 && this.buffers.channel0.length) {
        inputChannel = this.buffers[`channel${channel}`].shift();
      }
      outputChannel.set(inputChannel);
    }
    return true;
  }
}
registerProcessor("audio-data-worklet-stream", AudioDataWorkletStream);
```
However, the "glitches" or "gaps" between the merged files are still perceptible in some instances.
It should be possible to set `AudioWorklet` data from an input media resource, for example a `ReadableStream` - without having to split and merge files or use `MediaSource` or `HTMLMediaElement` - to play back media without first reading the entire file; rather, to play back the media while reading the input, in parallel.
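A sketch of what such a parallel read-and-play path could look like on the producer side, assuming little-endian signed 16-bit PCM input; the class name is illustrative, and the read loop in the trailing comment uses placeholder names. The converter accepts arbitrary-sized byte chunks (as a network stream delivers them), carries odd trailing bytes across chunk boundaries, and emits transfer-ready 128-sample blocks.

```javascript
// Incremental converter: arbitrary-sized byte chunks in,
// transfer-ready 128-sample Float32Array blocks out.
// Assumes little-endian signed 16-bit PCM, one channel for brevity.
class PcmChunker {
  constructor(blockSize = 128) {
    this.blockSize = blockSize;
    this.carryByte = null; // odd trailing byte from the previous chunk
    this.samples = [];     // converted samples awaiting a full block
  }
  // Feed one chunk; returns completed Float32Array blocks, if any.
  push(bytes /* Uint8Array */) {
    if (this.carryByte !== null) {
      const joined = new Uint8Array(bytes.length + 1);
      joined[0] = this.carryByte;
      joined.set(bytes, 1);
      bytes = joined;
      this.carryByte = null;
    }
    let end = bytes.length;
    if (end % 2 !== 0) this.carryByte = bytes[--end];
    // copy to an aligned buffer, then view as signed 16-bit
    const int16 = new Int16Array(bytes.slice(0, end).buffer);
    for (const s of int16) this.samples.push(s / 0x8000);
    const blocks = [];
    while (this.samples.length >= this.blockSize) {
      blocks.push(new Float32Array(this.samples.splice(0, this.blockSize)));
    }
    return blocks;
  }
}

// Usage in a read loop (names are placeholders):
// for await (const chunk of response.body) {
//   for (const block of chunker.push(chunk)) {
//     audioWorkletNode.port.postMessage({ block }, [block.buffer]);
//   }
// }
```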
How will using WebCodecs with Web Audio API improve playback quality when the requirement is to route disparate and arbitrary media sources into a single input, or multiple inputs, to Web Audio API?
How is WebCodecs concretely conceptualized as being capable of resolving the issue: processing partial media file input, arbitrary selection of parts of media, or a potentially infinite input stream via WebCodecs; where is the connection made to Web Audio API?
Has OfflineAudioContext rendering output been considered as an input to WebCodecs?
@JohnWeisz The `AudioBuffer` model itself is an issue relevant to memory usage. A 3.5MB Opus file, split and merged into a single file, consumes at least 300MB of memory streaming the channel data to `AudioWorklet` from the main thread. Executing `disconnect()`, exposing garbage collection, and using `self.gc()` does not change the result. The total amount of `Float32Array` values is 140MB. See this comment at "Is there a way to stop Web Audio API decodeAudioData method memory leak?":

> This is a reasonable workaround. But I agree that this shouldn't be needed. The browser should be able to collect these without any help from you as long as you drop the references to the source buffers and the audio buffers. – Raymond Toy Feb 4 '19 at 16:52
Using `MediaSource` drastically reduces memory usage to 50MB for the same original Opus file written to a WebM container.
Given the PCM processing model am not sure how the overall memory usage can be reduced at all. What can be improved is garbage collection once the audio buffers are no longer needed.
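One concrete way to keep the sender's memory from growing, per the comment quoted above, is to transfer the channel data instead of copying it and then drop the last reference. A small demonstration that transfer detaches the source buffer in place; `structuredClone` with a transfer list is used here to show the effect outside a worker (the same detachment happens with `port.postMessage(data, [buffer])`).

```javascript
// Demonstrate that transferring (rather than copying) channel data
// leaves nothing behind for the sender to retain.
let channel0 = new Float32Array(22050 * 60); // one minute of 22.05 kHz samples

// Equivalent of: port.postMessage({ channel0 }, [channel0.buffer])
const received = structuredClone(channel0, { transfer: [channel0.buffer] });

console.log(channel0.buffer.byteLength); // 0 - the sender's copy is detached
console.log(received.length);            // 1323000 - the receiver owns the data

// Dropping the last reference lets the engine reclaim the memory on its own
// schedule; self.gc() (behind a flag) only forces what this already permits.
channel0 = null;
```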
If Web Audio API connects to WebCodecs via `MediaStreamTrack`, that could potentially reduce memory usage - as long as the pattern is direct to `AudioContext.destination` - and not further processing of `Float32Array`s. However, to date we have no design-pattern concept of how WebCodecs will connect to Web Audio API to experiment with, and verify input and output; thus this issue.
@guest271314
I totally agree `AudioBuffer` is not really an efficient model for audio data consumption, although I don't see a viable yet generally usable alternative either (that said, I'm working on a proof-of-concept streaming-based `AudioBuffer` alternative in my spare time, and the streaming of obvious formats is well supported, so there might be something in the near future I can help with hopefully).
Using `MediaStream` and `MediaStreamTrack` is a viable alternative to `AudioBuffer` usage, which results in observably less memory usage. Exposing the media decoders and encoders that `HTMLMediaElement` uses (in `Window` and `Worker` scope) without using the `<audio>` element could be substituted for `MediaElementAudioSourceNode` to avoid reliance on the DOM. The only reason to use Web Audio API in such a case would be for piping the stream through to output an audio effect.
Since per the Explainer WebCodecs is focused on `MediaStreamTrack`s,
```js
let decoded = await new AudioDecoder({ codec: "opus" });
let mediaStream = new MediaStream([decoded]);
let webAudioConnection = new MediaStreamAudioSourceNode(ac, { mediaStream });
```
could be used to connect to Web Audio API, if necessary.
Am trying to gather how the specification authors who have suggested that WebCodecs will somehow solve issues posted at Web Audio API repositories are actually conceptualizing a prospective design pattern. As yet have not located any documentation which provides evidence of any such concept, flow-chart, algorithm, or API connection architecture.
A proof-of-concept for a tentative WebCodecs-type connection to Web Audio API.
To preface, have tested the code at least hundreds of times locally serving a ~35MB WAV file (22.05 kHz, 2 channels) converted from Opus using
```
$ opusdec --rate 22050 --force-wav house--64kbs.opus output.wav
```
which can be downloaded from https://drive.google.com/file/d/19-28SYFjhHg_a5NqPG1GIqV_sy5iMQ5x/view. Have no way to test the code at other architectures or devices right now.
The goal of this code is to demonstrate how a WebCodecs-like API would connect to Web Audio API.
Specifically, the primary goal is to request any audio file having any codec, parse (demux) it, and commence playback of that media resource while the request is still being read - before the entire file has been downloaded - and for that audio playback to be "seamless" (still some work to do there, though it "works" as a base example of the pattern).
The bulk of the work in this example is done in a `Worker`.
Kindly test the code at various devices, architectures to verify the output.
index.html
```html
<!DOCTYPE html>
<html>
<head>
  <title>Stream audio from Worker to AudioWorklet</title>
</head>
<body>
  <audio autoplay controls></audio>
  <script>
    // AudioWorkletStream
    // Stream audio from Worker to AudioWorklet
    // guest271314 2-24-2020
    "use strict";
    if (globalThis.gc) gc();
    const handleEvents = e => globalThis.console.log(e.type === "statechange" ? e.target.state : e.type);
    // TODO "seamless" playback; media fragment concatenation, processing; seekable streams
    const audio = document.querySelector("audio");
    // audio.ontimeupdate = e => console.log(audio.currentTime, ac.currentTime);
    const events = ["durationchange", "ended", "loadedmetadata",
      "pause", "play", "playing", "suspend", "waiting"];
    for (let event of events) audio.addEventListener(event, handleEvents);
    class AudioWorkletStream {
      // options here can be exhaustive, e.g., providing means to connect multiple SharedWorkers,
      // file handles, Blobs, ArrayBuffers, ReadableStream, WritableStream, HTMLMediaElement
      // for multiple audio, video, images, text tracks, et al., media stream input, output processing
      constructor({
        codecs = ["audio/wav", "wav22k2ch"],
        urls = ["http://localhost:8000?wav=true"],
        sampleRate = 44100,
        numberOfChannels = 2,
        latencyHint = "playback",
        workletOptions = {}
      } = {}) {
        this.mediaStreamTrack = new Promise(async resolve => {
          if (globalThis.gc) gc();
          const ac = new AudioContext({
            sampleRate,
            numberOfChannels,
            latencyHint
          });
          await ac.suspend();
          ac.onstatechange = handleEvents;
          await ac.audioWorklet.addModule("audioWorklet.js");
          const aw = new AudioWorkletNode(ac, "audio-data-worklet-stream", workletOptions);
          const msd = new MediaStreamAudioDestinationNode(ac);
          const { stream } = msd;
          // MediaStreamTrack, kind "audio"
          const [track] = stream.getAudioTracks();
          // fulfill Promise with MediaStreamTrack
          resolve(track);
          // set enabled to false
          track.enabled = false;
          aw.connect(msd);
          // inactive event at Chromium 81
          // no longer defined in Media Capture and Streams specification
          // Firefox does not yet fully support AudioWorklet
          stream.oninactive = stream.onactive = handleEvents;
          // not dispatched at Chromium 81
          track.onmute = track.onunmute = track.onended = handleEvents;
          // transfer sources, here, e.g., file handle,
          // ReadableStream, WritableStream (https://github.com/whatwg/streams/blob/master/transferable-streams-explainer.md)
          // etc.
          const worker = new Worker("worker.js", {
            type: "module"
          });
          worker.postMessage({
            port: aw.port,
            codecs,
            urls
          }, [aw.port]);
          worker.onmessage = async e => {
            // use suspend(), resume() to synchronize to degree possible
            // with currentTime of <audio> with MediaStream set as srcObject
            if (e.data.start) {
              track.enabled = true;
              await ac.resume();
            }
            if (e.data.ended) {
              track.stop();
              track.enabled = false;
              await ac.suspend();
              msd.disconnect();
              aw.disconnect();
              aw.port.close();
              worker.terminate();
              await ac.close();
              for (let event of events) audio.removeEventListener(event, handleEvents);
              if (globalThis.gc) gc();
              // currentTime(s)
              console.log({
                audio: audio.currentTime,
                currentTime: e.data.currentTime,
                currentFrame: e.data.currentFrame,
                minutes: Math.floor(e.data.currentTime / 60),
                seconds: ((e.data.currentTime / 60) - Math.floor(e.data.currentTime / 60)) * 60
              });
            }
          };
        });
      }
    }
    (async () => {
      // set parameters as arrays for potential "infinite" input, output stream
      let workletStream = new AudioWorkletStream({
        // multiple codecs
        codecs: ["audio/wav", "wav22k2ch"],
        // multiple URL's
        urls: ["http://localhost:8000?wav=true"],
        sampleRate: 22050,
        numberOfChannels: 2,
        latencyHint: "playback",
        workletOptions: {
          numberOfInputs: 2,
          numberOfOutputs: 2,
          channelCount: 2,
          processorOptions: {
            buffers: {
              channel0: [],
              channel1: []
            },
            i: 0,
            promise: void 0,
            resolve: void 0
          }
        }
      });
      let { mediaStreamTrack } = workletStream;
      let mediaStream = new MediaStream();
      // Chromium bug, https://bugs.chromium.org/p/chromium/issues/detail?id=1045832
      // currentTime does not progress at HTMLMediaElement when addTrack() is called on a MediaStream
      // set as srcObject on <audio>, <video> where no MediaStreamTrack (getAudioTracks() // []) is previously set
      mediaStream.addTrack(await mediaStreamTrack);
      audio.srcObject = mediaStream;
    })();
  </script>
</body>
</html>
```
worker.js
```js
// AudioWorkletStream
// Stream audio from Worker to AudioWorklet (POC)
// guest271314 2-24-2020
import { CODECS } from "./codecs.js";
if (globalThis.gc) gc();
const delay = async ms => await new Promise(resolve => setTimeout(resolve, ms));
let port;
onmessage = async e => {
  "use strict";
  if (!port) {
    ([port] = e.ports);
    port.onmessage = event => postMessage(event.data);
  }
  let init = false;
  const {
    codecs: [mime, codec],
    urls: [url]
  } = e.data;
  const {
    default: processStream
  } = await import(CODECS.get(mime).get(codec));
  let next = [];
  // costs latency, not necessary
  let writes = 0;
  let bytesWritten = 0;
  let samplesLength = 0;
  let portTransfers = 0;
  // https://fetch-stream-audio.anthum.com/2mbps/house-41000hz-trim.wav
  const response = await fetch(url);
  const readable = response.body;
  const writable = new WritableStream({
    async write(value) {
      bytesWritten += value.length;
      // value (Uint8Array) length is not guaranteed to be a multiple of 2 for Uint16Array;
      // store remainder of value in next array
      if (value.length % 2 !== 0 && next.length === 0) {
        next.push(...value.slice(value.length - 1));
        value = value.slice(0, value.length - 1);
      } else {
        const prev = [...next.splice(0, next.length), ...value];
        do {
          next.push(...prev.splice(-1));
        } while (prev.length % 2 !== 0);
        value = new Uint8Array(prev);
      }
      // The length is in bytes, but the array is 16 bits, so divide by 2.
      // let length;
      // length = (data[20] + data[21] * 0x10000) / 2;
      // we do not need length here, we process input until no more, or infinity
      let data = new Uint16Array(value.buffer);
      if (!init) {
        init = true;
        data = data.subarray(22);
      }
      const { ch0, ch1 } = processStream(data);
      do {
        const channel0 = new Float32Array(ch0.splice(0, 128));
        const channel1 = new Float32Array(ch1.splice(0, 128));
        samplesLength += channel0.length + channel1.length;
        port.postMessage({
          channel0, channel1
        }, [channel0.buffer, channel1.buffer]);
        ++portTransfers;
      } while (ch0.length);
      ++writes;
      // wait N ms to avoid choppy output during initial 30 seconds of playback
      // affects total value of writes
      await delay(350);
    },
    close() {
      // with await delay(ms): {writes: 340, bytesWritten: 35491032, samplesWritten: 69321}
      // without await delay(ms): {writes: 72, bytesWritten: 35491032, samplesWritten: 69321}
      globalThis.console.log({
        writes,        // variable
        bytesWritten,  // 35491032
        samplesLength, // variable (69320 +/- 1)
        portTransfers  // variable
      });
      if (globalThis.gc) gc();
    }
  }, new CountQueuingStrategy({
    highWaterMark: 128
  }));
  await readable.pipeTo(writable, {
    preventCancel: true
  });
  globalThis.console.log("read/write done");
};
```
codecs.js
```js
export const CODECS = new Map([["audio/wav", new Map([["wav22k2ch", "./wav22k2ch.js"]])]]);
// $ opusdec --rate 22050 --force-wav house--64kbs.opus output.wav
// Decoding to 22050 Hz (2 channels)
// Encoded with libopus 1.1
// ENCODER=opusenc from opus-tools 0.1.9
// ENCODER_OPTIONS=--bitrate 64
// Decoding complete.
//
// $ mediainfo output.wav
// General
// Complete name       : output.wav
// Format              : Wave
// File size           : 33.8 MiB
// Duration            : 6 min 42 s
// Overall bit rate mode : Constant
// Overall bit rate    : 706 kb/s
//
// Audio
// Format              : PCM
// Format settings     : Little / Signed
// Codec ID            : 1
// Duration            : 6 min 42 s
// Bit rate mode       : Constant
// Bit rate            : 705.6 kb/s
// Channel(s)          : 2 channels
// Sampling rate       : 22.05 kHz
// Bit depth           : 16 bits
// Stream size         : 33.8 MiB (100%)
```
wav22k2ch.js
```js
// https://stackoverflow.com/a/35248852
export default function int16ToFloat32(inputArray) {
  let ch0 = [];
  let ch1 = [];
  for (let i = 0; i < inputArray.length; i++) {
    const int = inputArray[i];
    // If the high bit is on, then it is a negative number, and actually counts backwards.
    const float = (int >= 0x8000) ? -(0x10000 - int) / 0x8000 : int / 0x7FFF;
    // toggle setting data to channels 0, 1
    if (i % 2 === 0) {
      ch0.push(float);
    } else {
      ch1.push(float);
    }
  }
  return { ch0, ch1 };
}
```
audioWorklet.js
```js
class AudioDataWorkletStream extends AudioWorkletProcessor {
  constructor(options) {
    super(options);
    if (globalThis.gc) gc();
    if (options.processorOptions) {
      Object.assign(this, options.processorOptions);
    }
    this.port.onmessage = this.appendBuffers.bind(this);
  }
  appendBuffers({ data: { channel0, channel1 } }) {
    this.buffers.channel0.push(channel0);
    this.buffers.channel1.push(channel1);
    ++this.i;
    if (this.i === 1) {
      this.port.postMessage({ start: true });
      globalThis.console.log({ currentTime, currentFrame, buffers: this.buffers });
    }
  }
  endOfStream() {
    this.port.postMessage({
      ended: true,
      currentTime,
      currentFrame
    });
    globalThis.console.log({ currentTime, currentFrame, sampleRate, buffers: this.buffers });
    if (globalThis.gc) gc();
  }
  process(inputs, outputs) {
    if (this.i > 0 && this.buffers.channel0.length === 0 && this.buffers.channel1.length === 0) {
      return false;
    }
    for (let channel = 0; channel < outputs.length; ++channel) {
      const [outputChannel] = outputs[channel];
      let inputChannel;
      if (this.i > 0 && this.buffers.channel0.length > 0) {
        inputChannel = this.buffers[`channel${channel}`].shift();
      } else if (this.i > 0 && this.buffers.channel1.length > 0 && this.buffers.channel0.length === 0) {
        // handle channel0.length === 0, channel1.length > 0
        inputChannel = this.buffers.channel1.shift();
        // end of stream
        this.endOfStream();
      }
      // no data yet: output silence rather than throwing on set(undefined)
      if (inputChannel) {
        outputChannel.set(inputChannel);
      }
    }
    return true;
  }
}
registerProcessor("audio-data-worklet-stream", AudioDataWorkletStream);
```
Theoretically, since the pattern is initiated in this case by a `fetch()` network request, to the response `Body` `ReadableStream`, to a `WritableStream`, to `AudioWorklet`, which outputs to a `MediaStreamTrack` - and given cache being disabled at the browser - there should not be any impact on memory, as no content is stored, particularly after the media has completed playback (unless the stream is infinite), meaning there is no reason for any of the API's used to retain the used data in memory for any purpose.
The same procedure should be possible for input at 44100 sample rate, or any other codec. Apply the parsing algorithm in the main thread, a `Worker`, a `SharedWorker`, a module script, etc., then transfer `Float32Array`s to `AudioWorklet`, which avoids `decodeAudioData()`, `AudioBuffer`, and `AudioBufferSourceNode` altogether.
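Generalizing the `wav22k2ch.js` example above, a sketch of the parsing step for any channel count (the function name is illustrative): deinterleave 16-bit PCM into per-channel `Float32Array`s using the same conversion, ready to be transferred to the `AudioWorklet`.

```javascript
// Deinterleave 16-bit PCM into per-channel Float32Arrays, using the
// same conversion as wav22k2ch.js above, generalized to any channel count.
function deinterleaveInt16(inputArray, numberOfChannels = 2) {
  const frames = Math.floor(inputArray.length / numberOfChannels);
  const channels = Array.from(
    { length: numberOfChannels },
    () => new Float32Array(frames)
  );
  for (let frame = 0; frame < frames; frame++) {
    for (let ch = 0; ch < numberOfChannels; ch++) {
      const int = inputArray[frame * numberOfChannels + ch];
      // high bit set means a negative sample that counts backwards
      channels[ch][frame] =
        int >= 0x8000 ? -(0x10000 - int) / 0x8000 : int / 0x7fff;
    }
  }
  return channels;
}

// The resulting Float32Arrays can then be cut into 128-sample views
// and transferred to the AudioWorkletProcessor, as in the code above.
```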
Was able to parse hexadecimal-encoded PCM within a Matroska file output by `MediaRecorder` at Chromium using `mkvparse`.
Next will attempt to parse Opus, both in a WebM and an OGG container, without using WASM, using only the API's shipped with the browser.
Demonstration with 277MB WAV file https://plnkr.co/edit/nECtUZ?p=preview.
Virtual F2F:

> Getting PCM in or out of an `AudioContext` is done via `AudioWorkletNode`, and a ring buffer using `SharedArrayBuffer` or `postMessage`. This is then fed to an `AudioEncoder`. Web Codecs is low level and flexible and can encode PCM data.
AFAICT Web Codecs is not specified or implemented. The closed issues reference WebCodecs as somehow being capable of connecting to Web Audio API. However, no evidence exists that this is or will be the case. Until then, this issue should remain open.
@padenot

> Getting PCM in or out of an `AudioContext` is done via `AudioWorkletNode`, and a ring buffer using `SharedArrayBuffer` or `postMessage`. This is then fed to an `AudioEncoder`.
The `postMessage()` option is not viable in practice. Perhaps in theory that works, but not in production code.
As have already proven in several issues and code examples, using `postMessage()` for a substantial amount of data input will inevitably result in gaps in playback at Firefox, Chrome, and Chromium browsers.
Describe the feature
The WebCodecs proposal is mentioned at V1 in several issues.
Is there a prototype?
No.
It is not at all clear how WebCodecs will solve any of the issues where Web Audio API specification authors have suggested the same might eventually be possible.
So far there are only unsubstantiated claims, without any hint of an algorithm or proof-of-concept in code, where using WebCodecs would produce any result different from what is currently possible using Web Audio API V1.
Since Web Audio API only processes PCM, and without any documentation, example code, or even a concept to prove otherwise, the gist of WebCodecs relevant to Web Audio API V1 or V2 would essentially be arbitrary audio codec input to PCM output. Nothing more can be gleaned from the comments thus far, besides "hope" for some future capability proffered by WebCodecs that will somehow be connectable to Web Audio API. For individuals that do not entertain hope, rather base conclusions on facts, there are currently no facts available which support the claims made thus far that WebCodecs will provide some means to solve the closed issues that point to the WebCodecs proposal as being capable of remedying the problems described in those issues.
It is not clear how WebCodecs will change or modify at all the Web Audio API processing model re "5.1. Audio sample format" or "5.2. Rendering", presently making any reference to "This will now be handled by https://discourse.wicg.io/t/webcodecs-proposal/3662" moot, particularly without any example or basic flow-chart detailing how precisely Web Audio API will connect to WebCodecs.
Describe the feature in more detail
This issue is intended to be a meta thread for Web Audio API's prospective connection with the as-yet unspecified and undeployed WebCodecs.
As a concrete example of the problem with simply suggesting WebCodecs will solve a current problem with the Web Audio API in the future, without evidence to support such a claim: following up on https://github.com/WebAudio/web-audio-api/issues/2135, where `decodeAudioData()` does not support decoding partial content, leading to memory usage that eventually crashes the tab, was able to cobble together a working example of streaming Opus audio, attempting to solve this `MediaSource` issue https://github.com/w3c/media-source/issues/245, using Web Audio API by:

- first splitting the file into N 15-second Opus files with `ffmpeg` or `mkvmerge`;
- then sorting and merging the files again into a single file using `File`s;
- then serving the map of sorted offsets with the merged file;

which allows executing `decodeAudioData()` on one of the included N merged parts of the single served file, then commencing playback of that audio segment without decoding all of the N parts of the file first - and without discernible gaps in playback (not achieved with the first attempt at splitting the file). Included is the first execution of the splitting of the file, and the original file, in the attached zip archive.

The problem is twofold:

- obvious gaps in playback (probably due to waiting on the `ended` event of the source buffer);
- posting the data to `AudioWorklet` to avoid gaps requires substantial attention to the 128-sample per-`process()` execution, where when the test is not correctly configured it can crash the browser tab and the underlying OS.

The suggested solution for this case, as evidenced by the closure of similar use cases, is now to use WebCodecs. However, it is not immediately clear at all how WebCodecs will overcome the Web Audio API "Audio sample format" and "Rendering" sections. The corresponding subject-matter WebCodecs issue https://github.com/WICG/web-codecs/issues/28 is not resolved.
Given the code example above, and cross-referencing the WebCodecs examples https://github.com/WICG/web-codecs/blob/master/explainer.md#examples, the only code where Web Audio API would potentially connect to the output of WebCodecs is by way of a `MediaStreamTrack`, not PCM.

The question then becomes: would the output `MediaStreamTrack` be `connect()`ed to an audio node? If no post-processing or output modification is required, why would Web Audio API be used at all in that case?

Given the above example code that combines N files into a single file to preserve metadata for each file for `decodeAudioData()`, what pattern would fulfill the suggestion that WebCodecs would solve the problems described in the linked closed issues that refer to WebCodecs?

Ideally we should be able to parse any input file having any audio codec to `AudioWorklet` as `Float32Array`s without having to also use `MediaStreamTrack` - or again, why use Web Audio API at all in that case, where we can simply set `audio.srcObject = new MediaStream([await audioDecoder])`?
?Is
expected to be used to connect to WebCodecs, where we could stream the Opus file without splitting the file into N parts - and playback would begin without parsing the entire input, especially where the input might not have a definitive end - even though that method is only available at Firefox?
The answers to the above questions (but not limited to them), relevant to the reference to WebCodecs dependence, reliance, or referral, should be clear.
Re
Can the specification authors who have made the above claims kindly and clearly write out precisely what you are "doing" re WebCodecs being connectable in any meaningful way to Web Audio API (V1 and V2)?