chcunningham / wc-talk

MIT License
AudioEncoder, AudioDecoder, Serialize to JSON, Deserialize JSON to EncodedAudioChunk, Play Music #2

Closed guest271314 closed 2 years ago

chcunningham commented 2 years ago

Hi @guest271314 - this repo is just my private playground that I'm using to prepare for a talk with Paul. You may just want to host this PR in your own repo?

guest271314 commented 2 years ago

I already wrote the code and am hosting it while experimenting with WebCodecs. It would probably be helpful to your audience to disclose actual field usage results, including implementation restrictions and limitations. A fair amount of advertising and claims have been proffered re WebCodecs; I present my results for playing, serializing, and deserializing audio from the field, in the wild. A short list of improvements that cannot be set aside as TODO are

If your talk involves the features advertised, claimed, and worked on, it is fair for the listeners to be exposed to the shortcomings. Otherwise you cannot improve the API: your system is closed-loop, without external verification from developers who have a stake in the product actually working as advertised (re "flexible"), not a stake in advertising the product.

Simply close the PR if you are not interested in the field experiments I have performed. I will keep the PR open in the fork in case you reconsider your decision, whatever that may be.

chcunningham commented 2 years ago

We've already filled our allotted talk time with demos, but please feel free to host this in your fork.

As for the issues, generally speaking, issues should be filed in either our WC github (for API shape) or crbug.com (for impl bugs).

  • Hardcoded 48000 Hz, 60ms, Opus

For sample rate, I've just filed https://github.com/w3c/webcodecs/issues/370 (TL;DR WC may resample). For 60ms, I've filed https://github.com/w3c/webcodecs/issues/371 (TL;DR WC should offer more customization).

  • Concluding an AudioEncoder and AudioDecoder process on an error for "reclaiming" the code

Already tracked https://bugs.chromium.org/p/chromium/issues/detail?id=1242178

  • No way to stream from AudioDecoder directly to MediaStreamTrackGenerator due to incompatible callback output and expected input of MediaStreamTrackGenerator.writable - the generator is faster than the decoder

Sorry, I don't follow.

  • You do not get out from AudioDecoder what you input to AudioEncoder - when I get output from espeak-ng 'Test' as a 1 channel, 22050 WAV, it is currently impossible to get that same output back from AudioDecoder using 'opus' codec.

Duplicate of your first bullet. https://github.com/w3c/webcodecs/issues/370 (TL;DR WC may resample)

guest271314 commented 2 years ago

As for the issues, generally speaking, issues should be filed in either our WC github (for API shape) or crbug.com (for impl bugs).

WICG and W3C banned me https://github.com/w3c/mediacapture-screen-share/issues/141. I have no means to file specification issues.

  • No way to stream from AudioDecoder directly to MediaStreamTrackGenerator due to incompatible callback output and expected input of MediaStreamTrackGenerator.writable - the generator is faster than the decoder

Sorry, I don't follow.

Media Capture Transform's MediaStreamTrackGenerator consumes WebCodecs AudioData. AudioDecoder.output is far too slow (partially due to the 60ms hardcoded in the Chromium implementation, partially due to the callback function) for the pipeline AudioDecoder (e.g., input from serialized EncodedAudioChunk) => MediaStreamTrackGenerator => stream to peer or playback.

AudioDecoder.output should be compatible with MediaStreamTrackGenerator.
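
To make the hand-off under discussion concrete, here is a minimal sketch of forwarding decoder output into a stream writer. It assumes the writer comes from `MediaStreamTrackGenerator.writable.getWriter()` (a Chromium API) and that `AudioDecoder` is available; `forwardDecoderOutput` is a hypothetical helper name, not part of either API. It relies only on the writer's `write()` method and does not address the frame-size mismatch described above.

```javascript
// Hypothetical helper: returns a callback suitable for AudioDecoder's
// `output` option that forwards every decoded frame to a stream writer.
function forwardDecoderOutput(writer, onError = console.error) {
  let pending = 0; // writes handed to the stream but not yet settled
  const output = (audioData) => {
    pending++;
    // WritableStream writers queue chunks internally, so write() can be
    // called without awaiting the previous write.
    writer.write(audioData).then(
      () => { pending--; },
      (err) => { pending--; onError(err); }
    );
  };
  return { output, pendingWrites: () => pending };
}

// Browser-only usage (assumes Chromium's MediaStreamTrackGenerator):
// const generator = new MediaStreamTrackGenerator({ kind: 'audio' });
// const { output } = forwardDecoderOutput(generator.writable.getWriter());
// const decoder = new AudioDecoder({ output, error: console.error });
```

Watching `pendingWrites()` grow or stay near zero is one rough way to see which side of the pipeline is the slower one.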

If your API does not really support

const config = {
  numberOfChannels: 1,
  sampleRate: 22050, // Chrome hardcodes to 48000
  codec: 'opus',
  bitrate: 16000,
};

then AudioEncoder.isConfigSupported and AudioDecoder.isConfigSupported should not resolve as though the configuration were supported.
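
For readers who want to check this themselves, a small sketch follows. `AudioEncoder.isConfigSupported(config)` resolves with an object carrying `supported` and an echoed `config`; `diffConfig` is a hypothetical helper that lists requested fields the echoed config dropped or changed. Note the limitation: this would not reveal resampling done inside the implementation, which is the complaint here. Comparing the `sampleRate` of decoded `AudioData` against the requested value is the more direct check.

```javascript
// Hypothetical helper: names of requested fields that are missing from,
// or different in, the config echoed back by isConfigSupported().
function diffConfig(requested, echoed = {}) {
  return Object.keys(requested).filter(
    (key) => echoed[key] !== requested[key]
  );
}

// Browser-only usage with the WebCodecs static method:
// const { supported, config: echoed } =
//   await AudioEncoder.isConfigSupported(config);
// console.log(supported, diffConfig(config, echoed));
```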

Again, I get 1 channel, 22050 WAV output from espeak-ng https://github.com/guest271314/native-messaging-espeak-ng/blob/master/AudioStream.js. I never set a 48000 sample rate; the Opus codec supports sample rates other than 48000.

(TL;DR WC may resample)

That is very problematic, and dissolves the concept of "flexibility" advertised. It does not matter what the implementation does internally: don't tell me the API is flexible unless it outputs what I expect, rather than what Chromium implementers and specification authors decide is best. I can just stream input to opusenc using Native Messaging, when I should not have to resample at all.

guest271314 commented 2 years ago

For example libopus only supports a handful of rates.

That is simply not true.

$ opusenc --raw-rate 22050 --raw-chan 1 input.wav output.opus

guest271314 commented 2 years ago
  • No way to stream from AudioDecoder directly to MediaStreamTrackGenerator due to incompatible callback output and expected input of MediaStreamTrackGenerator.writable - the generator is faster than the decoder

Sorry, I don't follow.

We know we can stream in real time using the Opus codec because .opus files can be concatenated and played consecutively and because WebRTC streams the Opus codec. AudioDecoder waits for an arbitrary frame size, 60ms, and thus is not real-time. For real-time streaming from AudioDecoder, resampling is necessary because I am expecting 22050 Hz, 1 channel, exactly what I input, to be output, which does not happen.

  const TARGET_FRAME_SIZE = 220;
  const TARGET_SAMPLE_RATE = 22050;
  // ...
  const config = {
    numberOfChannels: 1,
    sampleRate: 22050, // Chrome hardcodes to 48000
    codec: 'opus',
    bitrate: 16000,
  };
  encoder.configure(config);
  const decoder = new AudioDecoder({
    error(e) {
      console.error(e);
    },
    async output(frame) {
      ++chunk_length;
      const { duration, numberOfChannels, numberOfFrames, sampleRate } = frame;
      const size = frame.allocationSize({ planeIndex: 0 });
      const data = new ArrayBuffer(size);
      frame.copyTo(data, { planeIndex: 0 });
      const buffer = new AudioBuffer({
        length: numberOfFrames,
        numberOfChannels,
        sampleRate,
      });
      buffer.getChannelData(0).set(new Float32Array(data));
      // https://stackoverflow.com/a/27601521
      const oac = new OfflineAudioContext(
        buffer.numberOfChannels,
        buffer.duration * TARGET_SAMPLE_RATE,
        TARGET_SAMPLE_RATE
      );
      // Render the decoded buffer from the beginning through the offline
      // context, which resamples it to TARGET_SAMPLE_RATE.
      const source = new AudioBufferSourceNode(oac, {
        buffer,
      });
      source.connect(oac.destination);
      source.start();
      const ab = (await oac.startRendering()).getChannelData(0);
      for (let i = 0; i < ab.length; i++) {
        if (channelData.length === TARGET_FRAME_SIZE) {
          const floats = new Float32Array(
            channelData.splice(0, TARGET_FRAME_SIZE)
          );
          decoderController.enqueue(floats);
        }
        channelData.push(ab[i]);
      }
      if (chunk_length === len) {
        // Flush any remainder as a final zero-padded frame, then always
        // close, even when no samples are left over.
        if (channelData.length) {
          const floats = new Float32Array(TARGET_FRAME_SIZE);
          floats.set(channelData.splice(0, channelData.length));
          decoderController.enqueue(floats);
        }
        decoderController.close();
        decoderResolve();
      }
    },
  });

should not be necessary, but it is, when the client wants back exactly what they input.
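
The regrouping step in the callback above (collecting samples until a fixed frame size is reached, then zero-padding the tail) can be isolated from the WebCodecs and Web Audio parts. The following is a minimal sketch of that one step; `FrameChunker` is a hypothetical name, and it uses a preallocated `Float32Array` instead of a plain array with `splice`, which is a swapped-in technique and not the code from the PR.

```javascript
// Hypothetical helper: regroups arbitrarily sized blocks of samples into
// fixed-size Float32Array frames.
class FrameChunker {
  constructor(frameSize) {
    this.frameSize = frameSize;
    this.pending = new Float32Array(frameSize);
    this.filled = 0; // number of samples currently buffered
  }

  // Accepts any array-like of samples; returns the frames it completed.
  push(samples) {
    const frames = [];
    for (let i = 0; i < samples.length; i++) {
      this.pending[this.filled++] = samples[i];
      if (this.filled === this.frameSize) {
        frames.push(this.pending);
        this.pending = new Float32Array(this.frameSize);
        this.filled = 0;
      }
    }
    return frames;
  }

  // Returns the final zero-padded frame, or null if nothing is buffered.
  flush() {
    if (this.filled === 0) return null;
    const last = this.pending; // the unused tail is already zero
    this.pending = new Float32Array(this.frameSize);
    this.filled = 0;
    return last;
  }
}
```

With a chunker constructed for 220 samples, each resampled block would be passed to `push()` and every returned frame enqueued; `flush()` would be called once after the last chunk.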

chcunningham commented 2 years ago

AudioDecoder waits for an arbitrary frame size, 60ms, thus is not real-time.

Got it. Fixing this is tracked in https://github.com/w3c/webcodecs/issues/371
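
The frame sizes being compared in this exchange follow from simple arithmetic, sketched below with the numbers quoted in the thread (60 ms, 48000 Hz, 22050 Hz, 220 samples). The function names are hypothetical.

```javascript
// Number of sample frames in a buffer of the given duration.
function framesFor(durationMs, sampleRate) {
  return Math.round((durationMs * sampleRate) / 1000);
}

// Duration in milliseconds of a given number of sample frames.
function durationMsFor(frames, sampleRate) {
  return (frames * 1000) / sampleRate;
}

// Length of a block after resampling, rounded to whole frames.
function resampledLength(frames, fromRate, toRate) {
  return Math.round((frames * toRate) / fromRate);
}

console.log(framesFor(60, 48000));                // 2880
console.log(resampledLength(2880, 48000, 22050)); // 1323
console.log(durationMsFor(220, 22050).toFixed(2)); // "9.98"
```

So one 60 ms output at 48000 Hz corresponds to roughly six of the 220-sample frames used in the code above once resampled to 22050 Hz.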

For example libopus only supports a handful of rates.

That is simply not true.

$ opusenc --raw-rate 22050 --raw-chan 1 input.wav output.opus

The source line I linked rejects anything outside of the handful of supported rates. The library documentation matches the source code. "Sampling rate of input signal (Hz) This must be one of 8000, 12000, 16000, 24000, or 48000".

I can't tell from your command line what is actually happening. How did you determine that your encoding actually uses 22050? Could it be that this is just the input rate, and the opusenc opus-tool is actually resampling prior to feeding it to the opus library?
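
To illustrate the kind of mapping being asked about, here is a minimal sketch that picks an encoder rate from the list quoted above for an arbitrary input rate. It is an assumption for illustration only, not a description of what opusenc or Chromium actually does; `pickOpusRate` is a hypothetical helper, and choosing the lowest supported rate at or above the input is just one possible policy.

```javascript
// Encoder input rates quoted from the libopus documentation in this thread.
const OPUS_ENCODER_RATES = [8000, 12000, 16000, 24000, 48000];

// Hypothetical policy: the lowest supported rate that is >= the input
// rate, falling back to the highest supported rate.
function pickOpusRate(inputRate) {
  const candidate = OPUS_ENCODER_RATES.find((rate) => rate >= inputRate);
  return candidate ?? OPUS_ENCODER_RATES[OPUS_ENCODER_RATES.length - 1];
}

console.log(pickOpusRate(22050)); // 24000
console.log(pickOpusRate(96000)); // 48000
```

Under this policy a 22050 Hz input would have to be resampled to 24000 Hz before reaching the library, which is the scenario the question above raises.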

chcunningham commented 2 years ago

I can't tell from your command line what is actually happening. How did you determine that your encoding actually uses 22050? Could it be that this is just the input rate, and the opusenc opus-tool is actually resampling prior to feeding it to the opus library?

Never mind. I see your comments on the Chrome bug: https://bugs.chromium.org/p/chromium/issues/detail?id=1254496#c1

I'll investigate there.