WebAudio / web-audio-api-v2

The Web Audio API v2.0, developed by the W3C Audio WG

No way to convert data from WebCodecs AudioData to AudioBuffer #133

Closed guest271314 closed 3 years ago

guest271314 commented 3 years ago

Describe the feature

WebCodecs defines AudioData. In the WebCodecs specification this note appears:

NOTE: The Web Audio API currently uses f32-planar exclusively.

However, the format of AudioData from AudioDecoder is 'f32', not 'f32-planar'.
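
For instance, the format can be observed directly in the decoder's output callback (a trivial check, using the same kind of AudioDecoder setup as the demo further below):

const decoder = new AudioDecoder({
  error: console.error,
  output(frame) {
    // Logs 'f32' in Chrome per the observation above, not 'f32-planar'.
    console.log(frame.format);
  },
});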

Even when the sampleRate set in the AudioDecoder configuration is something other than 48000 (and opusenc supports a --raw-rate option to explicitly set the sample rate for Opus-encoded audio), the resulting WebCodecs AudioData instance always has sampleRate set to 48000.

The effective result is that there is no way I am aware of to convert the data from AudioData.copyTo(ArrayBuffer, {planeIndex: 0}) into an AudioBuffer instance that can be played with an AudioBufferSourceNode or resampled to a different sampleRate, for example 22050.

Since MediaStreamTrackGenerator suffers from "overflow", and the WebCodecs specification defines no algorithm to handle that overflow other than one supplied by the user, it is necessary for the user to write that algorithm. After testing, a user might find a magic number to delay the next call to the WritableStreamDefaultWriter.write() of MediaStreamTrackGenerator.writable (https://plnkr.co/edit/clbdVbhaRhCKWmPS), but that approach does not achieve the same result when attempting to use a Web Audio API AudioBuffer and AudioBufferSourceNode instead,
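
A rough sketch of that delay-based write approach, for context (DELAY_MS is a hypothetical placeholder for the "magic number", not a value taken from the linked demo):

// Sketch only: write each decoded AudioData to a MediaStreamTrackGenerator,
// pausing a guessed number of milliseconds between writes to avoid overflow.
const generator = new MediaStreamTrackGenerator({ kind: 'audio' });
const writer = generator.writable.getWriter();
const DELAY_MS = 50; // hypothetical "magic number" found by trial and error
async function writeWithDelay(audioData) {
  await writer.write(audioData);
  await new Promise((resolve) => setTimeout(resolve, DELAY_MS));
}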

async function main() {
  const oac = new AudioContext({
    sampleRate: 48000,
  });
  let channelData = [];
  const decoder = new AudioDecoder({
    error(e) {
      console.error(e);
    },
    async output(frame) {
      const { duration: d } = frame;
      const size = frame.allocationSize({ planeIndex: 0 });
      const data = new ArrayBuffer(size);
      frame.copyTo(data, { planeIndex: 0 });
      const view = new Float32Array(data);
      for (let i = 0; i < view.length; i++) {
        if (channelData.length === 220) {
          const floats = new Float32Array(220);
          floats.set(channelData.splice(0, 220));
          const ab = new AudioBuffer({
            sampleRate: 48000,
            length: floats.length,
            numberOfChannels: 1,
          });
          ab.getChannelData(0).set(floats);
          const source = new AudioBufferSourceNode(oac, { buffer: ab });
          source.connect(oac.destination);
          console.log(ab.duration, ab.sampleRate);
          source.start();
          await new Promise((r) => {
            console.log(ab);
            source.onended = r;
          });
        }
        channelData.push(view[i]);
      }
      if (decoder.decodeQueueSize === 0) {
        if (channelData.length) {
          const floats = new Float32Array(220);
          floats.set(channelData.splice(0, 220));
          const ab = new AudioBuffer({
            sampleRate: 48000,
            length: floats.length,
            numberOfChannels: 1,
          });
          ab.getChannelData(0).set(floats);
          console.log(ab.duration, ab.sampleRate);
          const source = new AudioBufferSourceNode(oac, { buffer: ab });
          source.connect(oac.destination);
          source.start();
          await new Promise((r) => (source.onended = r));
          await decoder.flush();
          return;
        }
      }
    },
  });

  const encoded = await (await fetch('./encoded.json')).json();
  let base_time = encoded[encoded.length - 1].timestamp;
  console.assert(encoded.length > 0, encoded.length);
  console.log(JSON.stringify(encoded, null, 2));
  const metadata = encoded.shift();
  console.log(encoded[encoded.length - 1].timestamp, base_time);
  // base64ToBytesArr() is a Base64-string-to-byte-array helper defined
  // elsewhere in the demo (not shown here).
  metadata.decoderConfig.description = new Uint8Array(
    base64ToBytesArr(metadata.decoderConfig.description)
  ).buffer;
  console.log(await AudioDecoder.isConfigSupported(metadata.decoderConfig));
  decoder.configure(metadata.decoderConfig);
  while (encoded.length) {
    const chunk = encoded.shift();
    chunk.data = new Uint8Array(base64ToBytesArr(chunk.data)).buffer;
    const eac = new EncodedAudioChunk(chunk);
    decoder.decode(eac);
  }
}

verifying that the AudioData data and the AudioBuffer channel data are incompatible as-is.

Is there a prototype? No.

Describe the feature in more detail

Web Audio API AudioBuffer <=> WebCodecs AudioData

Provide an algorithm and method to convert WebCodecs AudioData to Web Audio API AudioBuffer with option to set sample rate of the resulting object.
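
For illustration only, one hypothetical shape such a conversion could take (neither the method name nor the options below exist in any specification; ctx is assumed to be an existing AudioContext and audioData an AudioData from AudioDecoder output):

// Hypothetical API sketch; AudioBuffer.fromAudioData() does not exist today.
const audioBuffer = await AudioBuffer.fromAudioData(audioData, {
  sampleRate: 22050, // desired sample rate of the resulting AudioBuffer
});
const source = new AudioBufferSourceNode(ctx, { buffer: audioBuffer });
source.connect(ctx.destination);
source.start();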

guest271314 commented 3 years ago

I used OfflineAudioContext to resample the hard-coded 48000 sample rate and numberOfFrames (2568 for the first and 2880 for the remainder) of the AudioData objects output by AudioDecoder.

https://chromium.googlesource.com/chromium/src/+/49cf62132c057a79b093c8b5ab72f195cac447cc/media/audio/audio_opus_encoder.cc#32

// For Opus, we try to encode 60ms, the maximum Opus buffer, for quality
// reasons.
constexpr int kOpusPreferredBufferDurationMs = 60;

https://chromium.googlesource.com/chromium/src/+/49cf62132c057a79b093c8b5ab72f195cac447cc/media/audio/audio_opus_encoder.cc#58

// default preferred 48 kHz. If the input sample rate is anything else, we'll
// use 48 kHz.
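
At 48 kHz, that preferred 60 ms buffer works out to 48000 × 0.06 = 2880 frames per AudioData, which matches the numberOfFrames observed above for all but the first chunk.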

something like

  const TARGET_FRAME_SIZE = 220;
  const TARGET_SAMPLE_RATE = 22050;
  // ... (encoder, channelData, chunk_length, len, decoderController, and
  // decoderResolve are defined elsewhere in the original code)
  const config = {
    numberOfChannels: 1,
    sampleRate: 22050, // Chrome hardcodes to 48000
    codec: 'opus',
    bitrate: 16000,
  };
  encoder.configure(config);
  const decoder = new AudioDecoder({
    error(e) {
      console.error(e);
    },
    async output(frame) {
      ++chunk_length;
      const { duration, numberOfChannels, numberOfFrames, sampleRate } = frame;
      const size = frame.allocationSize({ planeIndex: 0 });
      const data = new ArrayBuffer(size);
      frame.copyTo(data, { planeIndex: 0 });
      const buffer = new AudioBuffer({
        length: numberOfFrames,
        numberOfChannels,
        sampleRate,
      });
      buffer.getChannelData(0).set(new Float32Array(data));
      // https://stackoverflow.com/a/27601521
      const oac = new OfflineAudioContext(
        buffer.numberOfChannels,
        buffer.duration * TARGET_SAMPLE_RATE,
        TARGET_SAMPLE_RATE
      );
      // Play it from the beginning.
      const source = new AudioBufferSourceNode(oac, {
        buffer,
      });
      source.connect(oac.destination);
      source.start();
      const ab = (await oac.startRendering()).getChannelData(0);
      for (let i = 0; i < ab.length; i++) {
        if (channelData.length === TARGET_FRAME_SIZE) {
          const floats = new Float32Array(
            channelData.splice(0, TARGET_FRAME_SIZE)
          );
          decoderController.enqueue(floats);
        }
        channelData.push(ab[i]);
      }
      if (chunk_length === len) {
        if (channelData.length) {
          const floats = new Float32Array(TARGET_FRAME_SIZE);
          floats.set(channelData.splice(0, channelData.length));
          decoderController.enqueue(floats);
          decoderController.close();
          decoderResolve();
        }
      }
    },
  });

guest271314 commented 3 years ago

The audio playback quality is sub-par when resampling from 48000 to 22050. What is the suggested procedure for producing quality audio, without glitches, gaps, or faster- or slower-rate frames, when converting from WebCodecs AudioData to AudioBuffer for the purpose of breaking out of the hard-coded box of the Chrome WebCodecs implementation?

webcodecs-serialize-to-json-deserialize-json.zip

padenot commented 3 years ago

The current design direction is to be able to create AudioBuffer objects directly from typed arrays, and to allow AudioBuffer to internally use more data types than f32. For now, authors need to create an AudioBuffer of the same size, use AudioData.copyTo to copy to an intermediate ArrayBuffer, and then copy (with possible conversion) into the AudioBuffer. This is wasteful and not ergonomic.
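
A minimal sketch of that interim approach, assuming a mono AudioData whose format is 'f32' (so plane 0 holds all of the samples and no sample-format conversion is needed):

// Sketch only: AudioData -> intermediate ArrayBuffer -> AudioBuffer.
function audioDataToAudioBuffer(audioData) {
  const audioBuffer = new AudioBuffer({
    length: audioData.numberOfFrames,
    numberOfChannels: audioData.numberOfChannels,
    sampleRate: audioData.sampleRate,
  });
  // 1. Copy the AudioData into an intermediate ArrayBuffer.
  const intermediate = new ArrayBuffer(audioData.allocationSize({ planeIndex: 0 }));
  audioData.copyTo(intermediate, { planeIndex: 0 });
  // 2. Copy (here without conversion) into the AudioBuffer's channel data.
  audioBuffer.getChannelData(0).set(new Float32Array(intermediate));
  return audioBuffer;
}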

Another design direction is to be able to get the memory of an AudioData, and directly construct an AudioBuffer from this memory, skipping all copies (https://github.com/w3c/webcodecs/issues/287).

guest271314 commented 3 years ago

There are several issues.

In summary, there needs to be consistency between these burgeoning APIs so that user-defined conversion is not necessary, or, if the user decides to convert between AudioData and AudioBuffer, so that it can be done "seamlessly"; WebCodecs has free rein to do whatever it wants - why would the decoder only output a 48000 sample rate when I deliberately pass a 22050 sample rate and 1 channel to the configuration? That is inviting user-defined conversion (and its issues).

guest271314 commented 3 years ago

I updated and tested the code using OfflineAudioContext a few hundred more times and compared it to creating a WAV file using data from AudioData.copyTo():

// https://github.com/higuma/wav-audio-encoder-js
class WavAudioEncoder {
  constructor({ buffers, sampleRate, numberOfChannels }) {
    Object.assign(this, {
      buffers,
      sampleRate,
      numberOfChannels,
      numberOfSamples: 0,
      dataViews: [],
    });
  }
  setString(view, offset, str) {
    const len = str.length;
    for (let i = 0; i < len; i++) {
      view.setUint8(offset + i, str.charCodeAt(i));
    }
  }
  async encode() {
    const [{ length }] = this.buffers;
    const data = new DataView(
      new ArrayBuffer(length * this.numberOfChannels * 2)
    );
    let offset = 0;
    // Interleave channels and convert float samples in [-1, 1] to 16-bit PCM.
    for (let i = 0; i < length; i++) {
      for (let ch = 0; ch < this.numberOfChannels; ch++) {
        let x = this.buffers[ch][i] * 0x7fff;
        data.setInt16(
          offset,
          x < 0 ? Math.max(x, -0x8000) : Math.min(x, 0x7fff),
          true
        );
        offset += 2;
      }
    }
    this.dataViews.push(data);
    this.numberOfSamples += length;
    const dataSize = this.numberOfChannels * this.numberOfSamples * 2;
    const view = new DataView(new ArrayBuffer(44));
    this.setString(view, 0, 'RIFF');
    view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
    this.setString(view, 8, 'WAVE');
    this.setString(view, 12, 'fmt ');
    view.setUint32(16, 16, true); // fmt chunk size
    view.setUint16(20, 1, true); // audio format: PCM
    view.setUint16(22, this.numberOfChannels, true);
    view.setUint32(24, this.sampleRate, true);
    // byte rate = sampleRate * numberOfChannels * bytesPerSample
    view.setUint32(28, this.sampleRate * this.numberOfChannels * 2, true);
    view.setUint16(32, this.numberOfChannels * 2, true); // block align
    view.setUint16(34, 16, true); // bits per sample
    this.setString(view, 36, 'data');
    view.setUint32(40, dataSize, true); // data chunk size
    this.dataViews.unshift(view);
    return new Blob(this.dataViews, { type: 'audio/wav' }).arrayBuffer();
  }
}
// ... (ac is an AudioContext and data is the ArrayBuffer filled by
// AudioData.copyTo(), both defined elsewhere)
const wav = new WavAudioEncoder({
  sampleRate: 48000,
  numberOfChannels: 1,
  buffers: [new Float32Array(data)],
});
const ab = (await ac.decodeAudioData(await wav.encode())).getChannelData(0);

Glitches can occasionally occur at the beginning of the OfflineAudioContext playback. No glitches occur when creating WAV headers and prepending the headers to the data. Test and compare the differences for yourself: https://guest271314.github.io/webcodecs/.

Are these the simplest approaches to resample the output from AudioDecoder.decode()?

The important point is that it is only necessary to resample the data from AudioData at AudioDecoder.output because WebCodecs does not honor the AudioEncoder or AudioDecoder configuration: it resamples to 48000 and outputs a numberOfFrames far greater than the input numberOfFrames, which is inconsistent behaviour.

If there were consistency between WebCodecs AudioEncoder.output and AudioDecoder.output with regard to AudioData, there would be no need to resample with the Web Audio API.

padenot commented 3 years ago

Two things:

guest271314 commented 3 years ago

The problem is that resampling is necessary because of the WebCodecs output.

All you need to do is test the output of AudioDecoder and try to pass that AudioData directly to a MediaStreamTrackGenerator. One of two outcomes currently occurs without user-defined intervention:

I can do $ opusenc --raw-rate 22050 input.wav output.opus and get the output I set. WebCodecs ignores the configuration, yet claims "flexibility". Since you are citing 48 kHz as the inflexible default for the WebCodecs implementation of 'opus', you need to update your specification to state that unambiguously, so that I no longer expect the option I pass to be effectual.

Resampling is necessary to pass the output of WebCodecs AudioDecoder AudioData to other APIs - without using setTimeout() and essentially guessing when the otherwise incompatible AudioData will end.

I suggest you folks actually test AudioDecoder => MediaStreamTrackGenerator, and stop claiming WebCodecs is "flexible" if you intend on restricting options available using opusenc and opusdec. I might as well just use opusenc and opusdec with fetch() or WebTransport.