Closed guest271314 closed 4 years ago
Included: an improved result where these problems are still observed.
The problem is twofold:

- obvious gaps in playback (probably due to waiting on the `ended` event of the source buffer)
- posting the data to `AudioWorklet` to avoid gaps requires substantial attention to the 128-sample per-`process()` execution; when the test is not correctly configured this can crash the browser tab and the underlying OS

The gaps can be avoided using `AudioWorklet` by using `subarray()` to create 128-element `Float32Array`s (without using WASM; cf. https://github.com/GoogleChromeLabs/web-audio-samples/tree/master/audio-worklet/design-pattern/wasm-ring-buffer).
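A minimal sketch of the `subarray()` chunking described above (the function name is illustrative). `subarray()` creates views over the same underlying `ArrayBuffer` without copying, so slicing a decoded channel into 128-element blocks is cheap; note, though, that transferring the underlying buffer via `postMessage` detaches every view at once.

```javascript
// Split a channel of decoded samples into 128-element views,
// matching the Web Audio API render quantum size.
// subarray() copies nothing; each entry is a view on channelData.buffer.
function toRenderQuanta(channelData, quantum = 128) {
  const blocks = [];
  for (let i = 0; i < channelData.length; i += quantum) {
    blocks.push(channelData.subarray(i, i + quantum));
  }
  return blocks;
}

const channel = new Float32Array(300); // e.g., one channel from getChannelData(0)
const blocks = toRenderQuanta(channel);
// 300 samples -> two full 128-sample blocks plus a 44-sample tail
console.log(blocks.length, blocks[0].length, blocks[2].length);
```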
Posting the code here now, before crashing the browser and OS again and potentially losing the tentatively working version:
```js
await ac.audioWorklet.addModule("audioWorklet.js");
const aw = new AudioWorkletNode(ac, "audio-data-worklet-stream", {
  numberOfInputs: 2,
  numberOfOutputs: 2,
  processorOptions: {
    buffers: { channel0: [], channel1: [] }
  }
});
aw.port.onmessage = async e => {
  console.log(e.data);
  if (e.data === "ended") {
    track.stop();
    track.enabled = false;
    await ac.suspend();
  }
};
aw.connect(msd);
//
const ab = await ac.decodeAudioData(uint8array.buffer);
let [channel0, channel1] = [ab.getChannelData(0), ab.getChannelData(1)];
aw.port.postMessage({ channel0, channel1 }, [channel0.buffer, channel1.buffer]);
```
```js
class AudioDataWorkletStream extends AudioWorkletProcessor {
  constructor(options) {
    super();
    if (options.processorOptions) {
      Object.assign(this, options.processorOptions);
    }
    this.i = 0;
    this.resolve = void 0;
    this.promise = new Promise(resolve => (this.resolve = resolve))
      .then(_ => this.port.postMessage("ended"));
    this.port.onmessage = e => {
      const { channel0, channel1 } = e.data;
      ++this.i;
      for (let i = 0; i < channel0.length; i += 128) {
        this.buffers.channel0.push(channel0.subarray(i, i + 128));
      }
      for (let i = 0; i < channel1.length; i += 128) {
        this.buffers.channel1.push(channel1.subarray(i, i + 128));
      }
    };
  }
  process(inputs, outputs) {
    if (this.i > 0 && this.buffers.channel0.length === 0) {
      this.resolve();
      globalThis.console.log(currentTime, currentFrame, this.buffers);
      return false;
    }
    for (let channel = 0; channel < outputs.length; ++channel) {
      const [outputChannel] = outputs[channel];
      let [inputChannel] = inputs[channel];
      if (this.i > 0 && this.buffers.channel0.length) {
        inputChannel = this.buffers[`channel${channel}`].shift();
      }
      outputChannel.set(inputChannel);
    }
    return true;
  }
}
registerProcessor("audio-data-worklet-stream", AudioDataWorkletStream);
```
However, the "glitches" or "gaps" between the merged files are still perceptible in some instances.
It should be possible to set `AudioWorklet` data from an input media resource, for example a `ReadableStream` - without having to split and merge files or use `MediaSource` or `HTMLMediaElement` - to play back media without first reading the entire file; rather, to play back the media while reading the input, in parallel.
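A sketch of what such a parallel read-and-play path could look like on the producer side, assuming little-endian signed 16-bit PCM input; the class name is illustrative, and the read loop in the trailing comment uses placeholder names. The converter accepts arbitrary-sized byte chunks (as a network stream delivers them), carries odd trailing bytes across chunk boundaries, and emits transfer-ready 128-sample blocks.

```javascript
// Incremental converter: arbitrary-sized byte chunks in,
// transfer-ready 128-sample Float32Array blocks out.
// Assumes little-endian signed 16-bit PCM, one channel for brevity.
class PcmChunker {
  constructor(blockSize = 128) {
    this.blockSize = blockSize;
    this.carryByte = null; // odd trailing byte from the previous chunk
    this.samples = [];     // converted samples awaiting a full block
  }
  // Feed one chunk; returns completed Float32Array blocks, if any.
  push(bytes /* Uint8Array */) {
    if (this.carryByte !== null) {
      const joined = new Uint8Array(bytes.length + 1);
      joined[0] = this.carryByte;
      joined.set(bytes, 1);
      bytes = joined;
      this.carryByte = null;
    }
    let end = bytes.length;
    if (end % 2 !== 0) this.carryByte = bytes[--end];
    // copy to an aligned buffer, then view as signed 16-bit
    const int16 = new Int16Array(bytes.slice(0, end).buffer);
    for (const s of int16) this.samples.push(s / 0x8000);
    const blocks = [];
    while (this.samples.length >= this.blockSize) {
      blocks.push(new Float32Array(this.samples.splice(0, this.blockSize)));
    }
    return blocks;
  }
}

// Usage in a read loop (names are placeholders):
// for await (const chunk of response.body) {
//   for (const block of chunker.push(chunk)) {
//     audioWorkletNode.port.postMessage({ block }, [block.buffer]);
//   }
// }
```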
How will using WebCodecs with Web Audio API improve playback quality when the requirement is to route disparate and arbitrary media sources into a single input, or multiple inputs, to Web Audio API?
How is WebCodecs concretely conceptualized as being capable of resolving the issue: processing partial media file input, arbitrary selection of parts of media, or a potentially infinite input stream via WebCodecs; where is the connection made to Web Audio API?
Has OfflineAudioContext rendering output been considered as an input to WebCodecs?
@JohnWeisz The `AudioBuffer` model itself is an issue relevant to memory usage. A 3.5MB Opus file, split and merged into a single file, consumes at least 300MB of memory streaming the channel data to `AudioWorklet` from the main thread. Executing `disconnect()`, exposing garbage collection, and using `self.gc()` does not change the result. The total amount of `Float32Array` values is 140MB. See this comment at "Is there a way to stop Web Audio API decodeAudioData method memory leak?":

> This is a reasonable workaround. But I agree that this shouldn't be needed. The browser should be able to collect these without any help from you as long as you drop the references to the source buffers and the audio buffers. – Raymond Toy Feb 4 '19 at 16:52
Using `MediaSource` drastically reduces memory usage to 50MB for the same original Opus file written to a WebM container.
Given the PCM processing model am not sure how the overall memory usage can be reduced at all. What can be improved is garbage collection once the audio buffers are no longer needed.
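One concrete way to keep the sender's memory from growing, per the comment quoted above, is to transfer the channel data instead of copying it and then drop the last reference. A small demonstration that transfer detaches the source buffer in place; `structuredClone` with a transfer list is used here to show the effect outside a worker (the same detachment happens with `port.postMessage(data, [buffer])`).

```javascript
// Demonstrate that transferring (rather than copying) channel data
// leaves nothing behind for the sender to retain.
let channel0 = new Float32Array(22050 * 60); // one minute of 22.05 kHz samples

// Equivalent of: port.postMessage({ channel0 }, [channel0.buffer])
const received = structuredClone(channel0, { transfer: [channel0.buffer] });

console.log(channel0.buffer.byteLength); // 0 - the sender's copy is detached
console.log(received.length);            // 1323000 - the receiver owns the data

// Dropping the last reference lets the engine reclaim the memory on its own
// schedule; self.gc() (behind a flag) only forces what this already permits.
channel0 = null;
```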
If Web Audio API connects to WebCodecs via `MediaStreamTrack`, that could potentially reduce memory usage - as long as the pattern is direct to `AudioContext.destination` - and not further processing of `Float32Array`s. However, to date we have no design-pattern concept of how WebCodecs will connect to Web Audio API to experiment with, and verify input and output; thus this issue.
@guest271314
I totally agree `AudioBuffer` is not really an efficient model for audio data consumption, although I don't see a viable yet generally usable alternative either (that said, I'm working on a proof-of-concept streaming-based `AudioBuffer` alternative in my spare time, and the streaming of obvious formats is well supported, so there might be something in the near future I can help with hopefully).
Using `MediaStream` and `MediaStreamTrack` is a viable alternative to `AudioBuffer` usage, which results in observably less memory usage. Exposing the media decoders and encoders that `HTMLMediaElement` uses (in `Window` and `Worker` scope) without using the `<audio>` element could be substituted for `MediaElementAudioSourceNode` to avoid reliance on the DOM. The only reason to use Web Audio API in such a case would be for piping the stream through to output an audio effect.
Since per the Explainer WebCodecs is focused on `MediaStreamTrack`s,
```js
let decoded = await new AudioDecoder({ codec: "opus" });
let mediaStream = new MediaStream([decoded]);
let webAudioConnection = new MediaStreamAudioSourceNode(ac, { mediaStream });
```
could be used to connect to Web Audio API, if necessary.
Am trying to gather how the specification authors who have suggested that WebCodecs will somehow solve issues posted at Web Audio API repositories are actually conceptualizing a prospective design pattern. As yet have not located any documentation which provides evidence of any such concept, flow-chart, algorithm, or API connection architecture.
A proof-of-concept for a tentative WebCodecs-type connection to Web Audio API.
To preface, have tested the code at least hundreds of times locally serving a ~35MB WAV file (22.05 kHz, 2 channels) converted from Opus using
```
$ opusdec --rate 22050 --force-wav house--64kbs.opus output.wav
```
which can be downloaded from https://drive.google.com/file/d/19-28SYFjhHg_a5NqPG1GIqV_sy5iMQ5x/view. Have no way to test the code at other architectures or devices right now.
The goal of this code is to demonstrate how a WebCodecs-like API would connect to Web Audio API.
Specifically, the primary goal is to request any audio file having any codec, parse (demux) it, and commence playback of that media resource while the request is still being read - before the entire file has been downloaded - and for that audio playback to be "seamless" (still some work to do there, though it "works" as a base example of the pattern).
The bulk of the work in this example is done in a `Worker`.
Kindly test the code at various devices, architectures to verify the output.
index.html
```html
<!DOCTYPE html>
<html>
<head>
  <title>Stream audio from Worker to AudioWorklet</title>
</head>
<body>
  <audio autoplay controls></audio>
  <script>
    // AudioWorkletStream
    // Stream audio from Worker to AudioWorklet
    // guest271314 2-24-2020
    "use strict";
    if (globalThis.gc) gc();
    const handleEvents = e => globalThis.console.log(e.type === "statechange" ? e.target.state : e.type);
    // TODO "seamless" playback; media fragment concatenation, processing; seekable streams
    const audio = document.querySelector("audio");
    // audio.ontimeupdate = e => console.log(audio.currentTime, ac.currentTime);
    const events = ["durationchange", "ended", "loadedmetadata",
      "pause", "play", "playing", "suspend", "waiting"];
    for (let event of events) audio.addEventListener(event, handleEvents);
    class AudioWorkletStream {
      // options here can be exhaustive, e.g., providing means to connect multiple SharedWorkers,
      // file handles, Blobs, ArrayBuffers, ReadableStream, WritableStream, HTMLMediaElement
      // for multiple audio, video, images, text tracks, et al., media stream input, output processing
      constructor({
        codecs = ["audio/wav", "wav22k2ch"],
        urls = ["http://localhost:8000?wav=true"],
        sampleRate = 44100,
        numberOfChannels = 2,
        latencyHint = "playback",
        workletOptions = {}
      } = {}) {
        this.mediaStreamTrack = new Promise(async resolve => {
          if (globalThis.gc) gc();
          const ac = new AudioContext({
            sampleRate,
            numberOfChannels,
            latencyHint
          });
          await ac.suspend();
          ac.onstatechange = handleEvents;
          await ac.audioWorklet.addModule("audioWorklet.js");
          const aw = new AudioWorkletNode(ac, "audio-data-worklet-stream", workletOptions);
          const msd = new MediaStreamAudioDestinationNode(ac);
          const { stream } = msd;
          // MediaStreamTrack, kind "audio"
          const [track] = stream.getAudioTracks();
          // fulfill Promise with MediaStreamTrack
          resolve(track);
          // set enabled to false
          track.enabled = false;
          aw.connect(msd);
          // inactive event at Chromium 81
          // no longer defined in Media Capture and Streams specification
          // Firefox does not yet fully support AudioWorklet
          stream.oninactive = stream.onactive = handleEvents;
          // not dispatched at Chromium 81
          track.onmute = track.onunmute = track.onended = handleEvents;
          // transfer sources, here, e.g., file handle,
          // ReadableStream, WritableStream (https://github.com/whatwg/streams/blob/master/transferable-streams-explainer.md)
          // etc.
          const worker = new Worker("worker.js", {
            type: "module"
          });
          worker.postMessage({
            port: aw.port,
            codecs,
            urls
          }, [aw.port]);
          worker.onmessage = async e => {
            // use suspend(), resume() to synchronize to degree possible
            // with currentTime of <audio> with MediaStream set as srcObject
            if (e.data.start) {
              track.enabled = true;
              await ac.resume();
            }
            if (e.data.ended) {
              track.stop();
              track.enabled = false;
              await ac.suspend();
              msd.disconnect();
              aw.disconnect();
              aw.port.close();
              worker.terminate();
              await ac.close();
              for (let event of events) audio.removeEventListener(event, handleEvents);
              if (globalThis.gc) gc();
              // currentTime(s)
              console.log({
                audio: audio.currentTime,
                currentTime: e.data.currentTime,
                currentFrame: e.data.currentFrame,
                minutes: Math.floor(e.data.currentTime / 60),
                seconds: ((e.data.currentTime / 60) - Math.floor(e.data.currentTime / 60)) * 60
              });
            }
          };
        });
      }
    }
    (async () => {
      // set parameters as arrays for potential "infinite" input, output stream
      let workletStream = new AudioWorkletStream({
        // multiple codecs
        codecs: ["audio/wav", "wav22k2ch"],
        // multiple URL's
        urls: ["http://localhost:8000?wav=true"],
        sampleRate: 22050,
        numberOfChannels: 2,
        latencyHint: "playback",
        workletOptions: {
          numberOfInputs: 2,
          numberOfOutputs: 2,
          channelCount: 2,
          processorOptions: {
            buffers: {
              channel0: [],
              channel1: []
            },
            i: 0,
            promise: void 0,
            resolve: void 0
          }
        }
      });
      let { mediaStreamTrack } = workletStream;
      let mediaStream = new MediaStream();
      // Chromium bug, https://bugs.chromium.org/p/chromium/issues/detail?id=1045832
      // currentTime does not progress at HTMLMediaElement when addTrack() is called on a MediaStream
      // set as srcObject on <audio>, <video> where no MediaStreamTrack (getAudioTracks() // []) is previously set
      mediaStream.addTrack(await mediaStreamTrack);
      audio.srcObject = mediaStream;
    })();
  </script>
</body>
</html>
```
worker.js
```js
// AudioWorkletStream
// Stream audio from Worker to AudioWorklet (POC)
// guest271314 2-24-2020
import { CODECS } from "./codecs.js";
if (globalThis.gc) gc();
const delay = async ms => await new Promise(resolve => setTimeout(resolve, ms));
let port;
onmessage = async e => {
  "use strict";
  if (!port) {
    ([port] = e.ports);
    port.onmessage = event => postMessage(event.data);
  }
  let init = false;
  const {
    codecs: [mime, codec],
    urls: [url]
  } = e.data;
  const {
    default: processStream
  } = await import(CODECS.get(mime).get(codec));
  let next = [];
  // costs latency, not necessary
  let writes = 0;
  let bytesWritten = 0;
  let samplesLength = 0;
  let portTransfers = 0;
  // https://fetch-stream-audio.anthum.com/2mbps/house-41000hz-trim.wav
  const response = await fetch(url);
  const readable = response.body;
  const writable = new WritableStream({
    async write(value) {
      bytesWritten += value.length;
      // value (Uint8Array) length is not guaranteed to be a multiple of 2 for Uint16Array;
      // store remainder of value in next array
      if (value.length % 2 !== 0 && next.length === 0) {
        next.push(...value.slice(value.length - 1));
        value = value.slice(0, value.length - 1);
      } else {
        const prev = [...next.splice(0, next.length), ...value];
        do {
          next.push(...prev.splice(-1));
        } while (prev.length % 2 !== 0);
        value = new Uint8Array(prev);
      }
      // The length is in bytes, but the array is 16 bits, so divide by 2.
      // let length;
      // length = (data[20] + data[21] * 0x10000) / 2;
      // we do not need length here, we process input until no more, or infinity
      let data = new Uint16Array(value.buffer);
      if (!init) {
        init = true;
        data = data.subarray(22);
      }
      const { ch0, ch1 } = processStream(data);
      do {
        const channel0 = new Float32Array(ch0.splice(0, 128));
        const channel1 = new Float32Array(ch1.splice(0, 128));
        samplesLength += channel0.length + channel1.length;
        port.postMessage({
          channel0, channel1
        }, [channel0.buffer, channel1.buffer]);
        ++portTransfers;
      } while (ch0.length);
      ++writes;
      // wait N ms to avoid choppy output during initial 30 seconds of playback
      // affects total value of writes
      await delay(350);
    },
    close() {
      // with await delay(ms): {writes: 340, bytesWritten: 35491032, samplesWritten: 69321}
      // without await delay(ms): {writes: 72, bytesWritten: 35491032, samplesWritten: 69321}
      globalThis.console.log({
        writes,        // variable
        bytesWritten,  // 35491032
        samplesLength, // variable (69320 +/- 1)
        portTransfers  // variable
      });
      if (globalThis.gc) gc();
    }
  }, new CountQueuingStrategy({
    highWaterMark: 128
  }));
  await readable.pipeTo(writable, {
    preventCancel: true
  });
  globalThis.console.log("read/write done");
};
```
codecs.js
```js
export const CODECS = new Map([["audio/wav", new Map([["wav22k2ch", "./wav22k2ch.js"]])]]);
// $ opusdec --rate 22050 --force-wav house--64kbs.opus output.wav
// Decoding to 22050 Hz (2 channels)
// Encoded with libopus 1.1
// ENCODER=opusenc from opus-tools 0.1.9
// ENCODER_OPTIONS=--bitrate 64
// Decoding complete.
//
// $ mediainfo output.wav
// General
// Complete name       : output.wav
// Format              : Wave
// File size           : 33.8 MiB
// Duration            : 6 min 42 s
// Overall bit rate mode : Constant
// Overall bit rate    : 706 kb/s
//
// Audio
// Format              : PCM
// Format settings     : Little / Signed
// Codec ID            : 1
// Duration            : 6 min 42 s
// Bit rate mode       : Constant
// Bit rate            : 705.6 kb/s
// Channel(s)          : 2 channels
// Sampling rate       : 22.05 kHz
// Bit depth           : 16 bits
// Stream size         : 33.8 MiB (100%)
```
wav22k2ch.js
```js
// https://stackoverflow.com/a/35248852
export default function int16ToFloat32(inputArray) {
  let ch0 = [];
  let ch1 = [];
  for (let i = 0; i < inputArray.length; i++) {
    const int = inputArray[i];
    // If the high bit is on, then it is a negative number, and actually counts backwards.
    const float = (int >= 0x8000) ? -(0x10000 - int) / 0x8000 : int / 0x7FFF;
    // toggle setting data to channels 0, 1
    if (i % 2 === 0) {
      ch0.push(float);
    } else {
      ch1.push(float);
    }
  }
  return { ch0, ch1 };
}
```
audioWorklet.js
```js
class AudioDataWorkletStream extends AudioWorkletProcessor {
  constructor(options) {
    super(options);
    if (globalThis.gc) gc();
    if (options.processorOptions) {
      Object.assign(this, options.processorOptions);
    }
    this.port.onmessage = this.appendBuffers.bind(this);
  }
  appendBuffers({ data: { channel0, channel1 } }) {
    this.buffers.channel0.push(channel0);
    this.buffers.channel1.push(channel1);
    ++this.i;
    if (this.i === 1) {
      this.port.postMessage({ start: true });
      globalThis.console.log({ currentTime, currentFrame, buffers: this.buffers });
    }
  }
  endOfStream() {
    this.port.postMessage({
      ended: true,
      currentTime,
      currentFrame
    });
    globalThis.console.log({ currentTime, currentFrame, sampleRate, buffers: this.buffers });
    if (globalThis.gc) gc();
  }
  process(inputs, outputs) {
    if (this.i > 0 && this.buffers.channel0.length === 0 && this.buffers.channel1.length === 0) {
      return false;
    }
    for (let channel = 0; channel < outputs.length; ++channel) {
      const [outputChannel] = outputs[channel];
      let inputChannel;
      if (this.i > 0 && this.buffers.channel0.length > 0) {
        inputChannel = this.buffers[`channel${channel}`].shift();
      } else if (this.i > 0 && this.buffers.channel1.length > 0 && this.buffers.channel0.length === 0) {
        // handle channel0.length === 0, channel1.length > 0
        inputChannel = this.buffers.channel1.shift();
        // end of stream
        this.endOfStream();
      }
      // no data yet: output silence rather than throwing on set(undefined)
      if (inputChannel) {
        outputChannel.set(inputChannel);
      }
    }
    return true;
  }
}
registerProcessor("audio-data-worklet-stream", AudioDataWorkletStream);
```
Theoretically, since the pattern is initiated in this case by a `fetch()` network request, to the response `Body` `ReadableStream`, to a `WritableStream`, to `AudioWorklet`, which outputs to a `MediaStreamTrack` - and given cache being disabled at the browser - there should not be any impact on memory, as no content is stored, particularly after the media has completed playback (unless the stream is infinite), meaning there is no reason for any of the API's used to retain the used data in memory for any purpose.
The same procedure should be possible for input at 44100 sample rate, or any other codec. Apply the parsing algorithm in the main thread, a `Worker`, a `SharedWorker`, a module script, etc., then transfer `Float32Array`s to `AudioWorklet`, which avoids `decodeAudioData()`, `AudioBuffer`, and `AudioBufferSourceNode` altogether.
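Generalizing the `wav22k2ch.js` example above, a sketch of the parsing step for any channel count (the function name is illustrative): deinterleave 16-bit PCM into per-channel `Float32Array`s using the same conversion, ready to be transferred to the `AudioWorklet`.

```javascript
// Deinterleave 16-bit PCM into per-channel Float32Arrays, using the
// same conversion as wav22k2ch.js above, generalized to any channel count.
function deinterleaveInt16(inputArray, numberOfChannels = 2) {
  const frames = Math.floor(inputArray.length / numberOfChannels);
  const channels = Array.from(
    { length: numberOfChannels },
    () => new Float32Array(frames)
  );
  for (let frame = 0; frame < frames; frame++) {
    for (let ch = 0; ch < numberOfChannels; ch++) {
      const int = inputArray[frame * numberOfChannels + ch];
      // high bit set means a negative sample that counts backwards
      channels[ch][frame] =
        int >= 0x8000 ? -(0x10000 - int) / 0x8000 : int / 0x7fff;
    }
  }
  return channels;
}

// The resulting Float32Arrays can then be cut into 128-sample views
// and transferred to the AudioWorkletProcessor, as in the code above.
```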
Was able to parse hexadecimal-encoded PCM within a Matroska file output by `MediaRecorder` at Chromium using `mkvparse`.
Next will attempt to parse Opus, both in a WebM and an OGG container, without using WASM, using only the API's shipped with the browser.
Demonstration with 277MB WAV file https://plnkr.co/edit/nECtUZ?p=preview.
Virtual F2F:

> Getting PCM in or out of an `AudioContext` is done via `AudioWorkletNode`, and a ring buffer using `SharedArrayBuffer` or `postMessage`. This is then fed to an `AudioEncoder`. Web Codecs is low level and flexible and can encode PCM data.
AFAICT Web Codecs is not specified or implemented. The closed issues reference WebCodecs as somehow being capable of connecting to Web Audio API. However, no evidence exists that this is or will be the case. Until then, this issue should remain open.
@padenot

> Getting PCM in or out of an `AudioContext` is done via `AudioWorkletNode`, and a ring buffer using `SharedArrayBuffer` or `postMessage`. This is then fed to an `AudioEncoder`.
The `postMessage()` option is not viable in practice. Perhaps in theory that works, but not in production code.
As have already proven in several issues and code examples, using `postMessage()` for a substantial amount of data input will inevitably result in gaps in playback at Firefox, Chrome, and Chromium browsers.
Describe the feature
The WebCodecs proposal is mentioned at V1 in several issues.
Is there a prototype?
No.
It is not at all clear how WebCodecs will solve any of the issues where Web Audio API specification authors have suggested the same might eventually be possible.
So far there are only unsubstantiated claims, without any hint of an algorithm or proof-of-concept in code, where using WebCodecs would produce any result different from what is currently possible using Web Audio API V1.
Since Web Audio API only processes PCM, and without any documentation, example code, or even a concept to prove otherwise, the gist of WebCodecs relevant to Web Audio API V1 or V2 would essentially be arbitrary audio codec input to PCM output. Nothing more can be gleaned from the comments thus far, besides "hope" for some future capability proffered by WebCodecs that will somehow be connectable to Web Audio API. For individuals that do not entertain hope, rather base conclusions on facts, there are currently no facts available which support the claims made thus far that WebCodecs will provide some means to solve the closed issues that point to the WebCodecs proposal as being capable of remedying the problems described in those issues.
It is not clear how WebCodecs will change or modify at all the Web Audio API processing model re "5.1. Audio sample format" or "5.2. Rendering", presently making any reference to "This will now be handled by https://discourse.wicg.io/t/webcodecs-proposal/3662" moot, particularly without any example or basic flow-chart detailing how precisely Web Audio API will connect to WebCodecs.
Describe the feature in more detail
This issue is intended to be a meta thread for Web Audio API's prospective connection with the as-yet unspecified and undeployed WebCodecs.
As a concrete example of the problem with simply suggesting WebCodecs will solve a current problem with the Web Audio API in the future, without evidence to support such a claim: following up on https://github.com/WebAudio/web-audio-api/issues/2135, where `decodeAudioData()` does not support decoding partial content, leading to memory usage that eventually crashes the tab, was able to cobble together a working example of streaming Opus audio, attempting to solve this `MediaSource` issue https://github.com/w3c/media-source/issues/245, using Web Audio API by:

- first splitting the file into N 15-second Opus files with `ffmpeg` or `mkvmerge`;
- then sorting and merging the files again into a single file using `File`s;
- then serving the map of sorted offsets with the merged file;

which allows executing `decodeAudioData()` on one of the included N merged parts of the single served file, then commencing playback of that audio segment without decoding all of the N parts of the file first - and without discernible gaps in playback (not achieved with the first attempt at splitting the file). Included is the first execution of the splitting of the file, and the original file, in the attached zip archive.

The problem is twofold:

- obvious gaps in playback (probably due to waiting on the `ended` event of the source buffer);
- posting the data to `AudioWorklet` to avoid gaps requires substantial attention to the 128-sample per-`process()` execution, where when the test is not correctly configured it can crash the browser tab and the underlying OS.

The suggested solution for this case, as evidenced by the closure of similar use cases, is now to use WebCodecs. However, it is not immediately clear at all how WebCodecs will overcome the Web Audio API "Audio sample format" and "Rendering" sections. The corresponding subject-matter WebCodecs issue https://github.com/WICG/web-codecs/issues/28 is not resolved.
Given the code example above, and cross-referencing the WebCodecs examples https://github.com/WICG/web-codecs/blob/master/explainer.md#examples, the only code where Web Audio API would potentially connect to the output of WebCodecs is by way of a `MediaStreamTrack`, not PCM.

The question then becomes: would the output `MediaStreamTrack` be `connect()`ed to an audio node? If no post-processing or output modification is required, why would Web Audio API be used at all in that case?

Given the above example code that combines N files into a single file to preserve metadata for each file for `decodeAudioData()`, what pattern would fulfill the suggestion that WebCodecs would solve the problems described in the linked closed issues that refer to WebCodecs?

Ideally we should be able to parse any input file having any audio codec to `AudioWorklet` as `Float32Array`s without having to also use `MediaStreamTrack` - or again, why use Web Audio API at all in that case, where we can simply set `audio.srcObject = new MediaStream([await audioDecoder])`?
?Is
expected to be used to connect to WebCodecs, where we could stream the Opus file without splitting the file into N parts - and playback would begin without parsing the entire input, especially where the input might not have a definitive end - even though that method is only available at Firefox?
The answers to the above questions (but not limited to them), relevant to the reference to WebCodecs dependence, reliance, or referral, should be clear.
Re
Can the specification authors who have made the above claims kindly and clearly write out precisely what you are "doing" re WebCodecs being connectable in any meaningful way to Web Audio API (V1 and V2)?