WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Raw audio recording not supported #2391

Closed goldwaving closed 10 months ago

goldwaving commented 3 years ago

I am creating an audio editor app. To allow editing of newly recorded audio, raw audio needs to be obtained. Unless I am missing something, this basic functionality seems to be missing from the specification. Here is what I have found so far:

  1. ScriptProcessorNode is deprecated and should not be used.
  2. MediaRecorder does not support raw audio.
  3. AnalyserNode does not provide seamless sequential data (it has gaps or overlaps).
  4. AudioWorklet is not supported on Safari and is needlessly complicated for such a simple task.

Is there another option I have not discovered yet? If not, could an AudioPassThroughNode be considered? It would have an ondata event that provides seamless, raw audio data passing through the node, perhaps with a starting time and an ending time and other details. Alternatively, requiring support for raw audio in MediaRecorder would work.
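For illustration, a rough sketch of how such a node might be used. AudioPassThroughNode, its ondata event, and the startTime/endTime fields are purely hypothetical here; they are the proposal, not an existing API:

// Hypothetical usage of the proposed AudioPassThroughNode (not implemented anywhere).
// Assumes this runs inside an async function or a module.
const ac = new AudioContext();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const source = ac.createMediaStreamSource(stream);
const passThrough = new AudioPassThroughNode(ac); // proposed node, does not exist
passThrough.ondata = ({ buffer, startTime, endTime }) => {
  // buffer would be a seamless, sequential AudioBuffer of raw samples;
  // startTime/endTime would locate the chunk on the context timeline.
  handleRawChunk(buffer.getChannelData(0)); // handleRawChunk is application code
};
source.connect(passThrough).connect(ac.destination);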

rtoy commented 3 years ago

The ScriptProcessorNode is deprecated, but I personally do not expect it to be removed any time soon. It will likely continue to work in all browsers for a very long time.

It's also unlikely WebAudio will fix this since MediaRecorder exists. If you want raw audio you should file an issue on the MediaRecorder spec, and/or with the browsers to support raw audio in some form. (I personally would like it if it supported some kind of lossless mode, compressed or not.)

goldwaving commented 3 years ago

Thanks for the reply. I'll use ScriptProcessorNode for now. I would still like to see an AudioPassThroughNode, which should be very easy to implement, or maybe even just a timestamp added to the AnalyserNode so we can know the exact time frame of the samples. (AudioContext may not provide a sample-accurate time.)
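For reference, a minimal sketch of what that looks like with the (deprecated but still implemented) ScriptProcessorNode; the buffer size and channel counts are arbitrary choices:

// Capture raw Float32 samples from the microphone with ScriptProcessorNode (deprecated).
// Assumes this runs inside an async function or a module.
const ac = new AudioContext();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const source = ac.createMediaStreamSource(stream);
// 4096-frame buffer, 1 input channel, 1 output channel (arbitrary values).
const processor = ac.createScriptProcessor(4096, 1, 1);
const chunks = [];
processor.onaudioprocess = (e) => {
  // Copy the data: the underlying buffer is reused between callbacks.
  chunks.push(new Float32Array(e.inputBuffer.getChannelData(0)));
};
source.connect(processor);
processor.connect(ac.destination); // the node must be connected to keep firing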

rtoy commented 3 years ago

What is an AudioPassThroughNode and what does it do? You can file a feature request for this node, if you like.

We could add a timestamp to the AnalyserNode. Most likely this would tell you the context time of the first sample in the time domain data.

guest271314 commented 3 years ago

Re an "AudioPassThroughNode" AudioWorklet can be used, see https://github.com/web-platform-tests/wpt/blob/d5be80a86d4f938250c075ac12414ad47516969c/webaudio/js/worklet-recorder.js, https://github.com/GoogleChromeLabs/web-audio-samples/blob/master/audio-worklet/basic/hello-audio-worklet/bypass-processor.js for examples.

Alternatively, on the latest Chrome and Chromium builds a MediaStreamTrackProcessor can be used, something like

const processor = new MediaStreamTrackProcessor(track); // track is a MediaStreamTrack
const { readable } = processor;
readable.pipeTo(new WritableStream({
  write(value, controller) {
    // do stuff with the AudioFrame, which has a buffer (AudioBuffer) attribute
  }
}));

See https://github.com/guest271314/webtransport/blob/main/webTransportBreakoutBox.js for an example of streaming a WAV file from STDOUT to the browser.

A minimal, complete, working example using AudioWorklet as a passthrough node to get inputs, which are transferred to the main thread using Transferable Streams: https://github.com/microphone-stream/microphone-stream/pull/54/commits/8660971284cdcc950c48a5e12c1ba4d3e4db1567.

goldwaving commented 3 years ago

An AudioWorklet or any kind of processor node is too extreme for such a simple requirement (and it is not supported in Safari anyway).

An AudioPassThroughNode would be a very basic node that allows data to pass through it unmodified. It would simply have an event that provides AudioBuffers sequentially (gapless/seamless). This would allow an app to extract the raw audio data from any point within the graph.

guest271314 commented 3 years ago

An AudioWorklet or any kind of processor node is too extreme for such a simple requirement

I agree https://github.com/WebAudio/web-audio-api-v2/issues/97. That is what we have to work with right now.

Should that issue be re-opened where you can place your use-cases there and close this issue?

(and it is not supported in Safari anyway).

The specification cannot compel implementation by a private concern.

An AudioPassThroughNode would be a very basic node that allows data to pass through it unmodified.

I am not sure how basic the specification and implementation can be in practice with regard to different inputs.

PCM ('s16le'), 1 channel at 22050 sample rate and 2 channels at 44100 sample rate, could probably be specified.
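As an illustration of the 's16le' case, a sketch of converting the Float32 samples Web Audio exposes into interleaved signed 16-bit little-endian PCM (assumes two channels of equal length):

// Interleave two Float32 channels and convert to signed 16-bit little-endian PCM.
function floatToS16le(left, right) {
  const out = new DataView(new ArrayBuffer(left.length * 2 * 2)); // 2 channels * 2 bytes
  for (let i = 0, offset = 0; i < left.length; i++) {
    for (const sample of [left[i], right[i]]) {
      const clamped = Math.max(-1, Math.min(1, sample));
      out.setInt16(offset, clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff, true);
      offset += 2;
    }
  }
  return new Uint8Array(out.buffer);
}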

It would simply have an event that provides AudioBuffers sequentially (gapless/seamless). This would allow an app to extract the raw audio data from any point within the graph.

I am not sure what codecs Safari supports, for example whether Opus is supported. Chrome supports MediaRecorder with the x-matroska;codecs=pcm codec, and the PCM can be extracted from the EBML in the Matroska container, see https://github.com/WebAudio/web-audio-api-v2/issues/63, https://github.com/legokichi/ts-ebml/issues/26. If Safari supports Native Messaging you can use native applications to communicate with the browser, reading the recording or live-stream in parallel https://github.com/guest271314/captureSystemAudio#stream-file-being-written-at-local-filesystem-to-mediasource-capture-as-mediastream-record-with-mediarecorder-in-real-time. That could be considered "too extreme" and expensive (https://github.com/fivedots/storage-foundation-api-explainer/issues/4#issuecomment-774548620; https://bugs.chromium.org/p/chromium/issues/detail?id=1181783#c7), however, when no API exists to achieve the requirement, the requirement must still be achieved nonetheless, by any means necessary.

The individuals working on media capture at Apple could probably be helpful in realizing your feature request; see https://w3c.github.io/mediacapture-automation/, https://bugs.chromium.org/p/chromium/issues/detail?id=1151308, and https://w3c.github.io/mediacapture-main/.

The "gapless/seamless" requirement can be achieved using WebRTC RTCRtpSend.replaceTrack() to "seamlessly" replace the current MediaStreamTrack, see https://github.com/guest271314/MediaFragmentRecorder/tree/webrtc-replacetrack, also https://github.com/w3c/mediacapture-record/issues/167.

Perhaps you can, in addition to this issue, file a Safari tracking issue, if none already exists, to support the Media Capture Transform API https://github.com/w3c/mediacapture-transform, which does read the input from a MediaStreamTrack on Chrome and Chromium right now, and also WebRTC Encoded Media https://github.com/w3c/webrtc-encoded-transform.

guest271314 commented 3 years ago

It's also unlikely WebAudio will fix this since MediaRecorder exists. If you want raw audio you should file an issue on the MediaRecorder spec, and/or with the browsers to support raw audio in some form. (I personally would like it if it supported some kind of lossless mode, compressed or not.)

https://github.com/w3c/mediacapture-record/issues/198

guest271314 commented 3 years ago

I am creating an audio editor app. To allow editing of newly recorded audio, raw audio needs to be obtained. Unless I am missing something, this basic functionality seems to be missing from the specification.

Since the audio is already recorded, BaseAudioContext.decodeAudioData() can be used to get the file as a single AudioBuffer, which can be sliced into discrete parts from the underlying Float32Array(s) and the values changed; then one or more new AudioBuffers can be created with createBuffer(), and, if necessary, a WAV file can be created from the resulting buffers https://github.com/guest271314/audioInputToWav.

guest271314 commented 3 years ago

An example of the approach described at https://github.com/WebAudio/web-audio-api-v2/issues/118#issuecomment-808824565: an audio stream was recorded with MediaRecorder, including a 1 second break of silence between audible output, then the file is uploaded with the File API using HTML <input type="file">, an AudioBuffer is obtained using BaseAudioContext.decodeAudioData(), and a new AudioBuffer is created to set a specific offset or time slice of the data. Here the original output is "one" (1 second of silence) "two"; the modified output from the second AudioBuffer is "two" "one" (1 second of silence).

<!DOCTYPE html>

<html>
  <head> </head>

  <body>
    <input type="file" accept="audio/*" />
    <script>
      const input = document.querySelector('input[type=file]');
      input.onchange = async ({
        target: {
          files: [file],
        },
      }) => {
        const ac = new AudioContext();
        ac.onstatechange = (e) => console.log(e);
        const ab = await ac.decodeAudioData(await file.arrayBuffer());

        let buffer = ac.createBuffer(
          ab.numberOfChannels,
          ab.length,
          ab.sampleRate
        );
        let source = ac.createBufferSource();
        source.buffer = ab;
        source.connect(ac.destination);
        source.onended = (e) => {
          source.disconnect();
          for (let index = 0; index < ab.numberOfChannels; index++) {
            const channel = ab.getChannelData(index);
            const a = channel.subarray(channel.length / 2, channel.length);
            const b = channel.subarray(0, channel.length / 2);
            buffer.getChannelData(index).set(a, 0);
            buffer.getChannelData(index).set(b, channel.length / 2);
          }
          source = ac.createBufferSource();
          source.buffer = buffer;
          source.connect(ac.destination);
          source.onended = async (e) => {
            console.log(e);
            await ac.close();
            input.value = null;
          };
          source.start();
        };
        source.start();
      };
    </script>
  </body>
</html>

one_two_webm.zip

guest271314 commented 3 years ago

We should be able to dynamically write a WAV file or PCM using existing AudioBuffer, or raw PCM as input using

      // https://github.com/chromium/chromium/blob/77578ccb4082ae20a9326d9e673225f1189ebb63/third_party/blink/web_tests/webaudio/resources/audio-file-utils.js
      // Utilities for creating a 16-bit PCM WAV file from an AudioBuffer
      // when using Chrome testRunner, and for downloading an AudioBuffer as
      // a float WAV file when running in a browser.
      function writeString(s, a, offset) {
        for (let i = 0; i < s.length; ++i) {
          a[offset + i] = s.charCodeAt(i);
        }
      }

      function writeInt16(n, a, offset) {
        n = Math.floor(n);

        let b1 = n & 255;
        let b2 = (n >> 8) & 255;

        a[offset + 0] = b1;
        a[offset + 1] = b2;
      }

      function writeInt32(n, a, offset) {
        n = Math.floor(n);
        let b1 = n & 255;
        let b2 = (n >> 8) & 255;
        let b3 = (n >> 16) & 255;
        let b4 = (n >> 24) & 255;

        a[offset + 0] = b1;
        a[offset + 1] = b2;
        a[offset + 2] = b3;
        a[offset + 3] = b4;
      }

      // Return the bits of the float as a 32-bit integer value.  This
      // produces the raw bits; no interpretation of the value is done.
      function floatBits(f) {
        let buf = new ArrayBuffer(4);
        new Float32Array(buf)[0] = f;
        let bits = new Uint32Array(buf)[0];
        // Return as a signed integer.
        return bits | 0;
      }

      function writeAudioBuffer(audioBuffer, a, offset, asFloat) {
        // let n = audioBuffer.length;
        let n = audioBuffer.reduce((a, b) => a + b.length, 0);
        // let channels = audioBuffer.numberOfChannels;
        let channels = audioBuffer.length;

        for (let i = 0; i < n; ++i) {
          for (let k = 0; k < channels; ++k) {
            // let buffer = audioBuffer.getChannelData(k);
            let buffer = audioBuffer[k];
            if (asFloat) {
              let sample = floatBits(buffer[i]);
              writeInt32(sample, a, offset);
              offset += 4;
            } else {
              let sample = buffer[i] * 32768.0;

              // Clip samples to the limitations of 16-bit.
              // If we don't do this then we'll get nasty wrap-around distortion.
              if (sample < -32768) sample = -32768;
              if (sample > 32767) sample = 32767;

              writeInt16(sample, a, offset);
              offset += 2;
            }
          }
        }
      }

      // See http://soundfile.sapp.org/doc/WaveFormat/ and
      // http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html
      // for a quick introduction to the WAVE PCM format.
      function createWaveFileData(audioBuffer, asFloat, sampleRate) {
        let bytesPerSample = asFloat ? 4 : 2;
        let frameLength = audioBuffer[0].length; //audioBuffer.length;
        let numberOfChannels = audioBuffer.length; // audioBuffer.numberOfChannels;
        // let sampleRate = audioBuffer.sampleRate;
        let bitsPerSample = 8 * bytesPerSample;
        let byteRate = (sampleRate * numberOfChannels * bitsPerSample) / 8;
        let blockAlign = (numberOfChannels * bitsPerSample) / 8;
        let wavDataByteLength = frameLength * numberOfChannels * bytesPerSample;
        let headerByteLength = 44;
        let totalLength = headerByteLength + wavDataByteLength;
        let waveFileData = new Uint8Array(totalLength);
        let subChunk1Size = 16; // for linear PCM
        let subChunk2Size = wavDataByteLength;
        let chunkSize = 4 + (8 + subChunk1Size) + (8 + subChunk2Size);
        writeString('RIFF', waveFileData, 0);
        writeInt32(chunkSize, waveFileData, 4);
        writeString('WAVE', waveFileData, 8);
        writeString('fmt ', waveFileData, 12);
        writeInt32(subChunk1Size, waveFileData, 16); // SubChunk1Size (4)
        // The format tag value is 1 for integer PCM data and 3 for IEEE
        // float data.
        writeInt16(asFloat ? 3 : 1, waveFileData, 20); // AudioFormat (2)
        writeInt16(numberOfChannels, waveFileData, 22); // NumChannels (2)
        writeInt32(sampleRate, waveFileData, 24); // SampleRate (4)
        writeInt32(byteRate, waveFileData, 28); // ByteRate (4)
        writeInt16(blockAlign, waveFileData, 32); // BlockAlign (2)
        writeInt16(bitsPerSample, waveFileData, 34); // BitsPerSample (2)
        writeString('data', waveFileData, 36);
        writeInt32(subChunk2Size, waveFileData, 40); // SubChunk2Size (4)
        // Write actual audio data starting at offset 44.
        writeAudioBuffer(audioBuffer, waveFileData, 44, asFloat);

        return waveFileData;
      }

in any order we want, updating the relevant first 44 bytes of the header during reading and writing.
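A possible usage sketch for the utilities above, writing an Array of Float32Array channels to a downloadable WAV Blob (left and right are placeholder channel data):

// channels: one Float32Array per channel, e.g. from AudioBuffer.getChannelData().
const channels = [left, right];                  // placeholder Float32Arrays
const wavBytes = createWaveFileData(channels, /* asFloat */ false, 44100);
const blob = new Blob([wavBytes], { type: 'audio/wav' });
const a = document.createElement('a');
a.href = URL.createObjectURL(blob);
a.download = 'output.wav';
a.click();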

I ran a sampling of 5 tests using each of the following inputs to a WAV encoder, after noticing that the code took several seconds to complete, measuring with performance.now() at the same location in the code:

a. Passing BaseAudioContext.decodeAudioData() (AudioBuffer) to a function that calls createWaveFileData (where getChannelData() is called in loops)

b. Passing Array of Float32Array's from BaseAudioContext.decodeAudioData() (AudioBuffer) to a function that calls createWaveFileData() https://github.com/guest271314/audioInputToWav/blob/master/array-typedarray-with-audio-context.html (where getChannelData() is not called in loops)

c. Without using AudioContext or BaseAudioContext.decodeAudioData() (AudioBuffer): parsing an existing WAV file and passing an Array of Float32Array's to a function that calls createWaveFileData() https://github.com/guest271314/audioInputToWav/blob/master/pcm-wav-input-array-typedarray-without-audio-context.html (where getChannelData() is not called in loops)

Results:

a. 0.5837000000756234, 0.5977000000420958, 0.5663999998942018, 0.5826000000815839, 0.5800999999046326, 0.5641000000759959

b. 0.06749999988824129, 0.048599999863654375, 0.08129999996162951, 0.0941000001039356, 0.08059999998658895

c. 0.045799999963492155, 0.02969999983906746, 0.03619999997317791, 0.020699999993667006, 0.0328999999910593

guest271314 commented 3 years ago

I am creating an audio editor app. To allow editing of newly recorded audio, raw audio needs to be obtained. Unless I am missing something, this basic functionality seems to be missing from the specification.

  1. AnalyserNode does not provide seamless sequential data (it has gaps or overlaps).

It would simply have an event that provides AudioBuffers sequentially (gapless/seamless).

For editing newly recorded audio in the form of an AudioBuffer you can create N AudioBuffer's, for example similar to the output of MediaStreamTrackProcessor.readable.read() (length 220 for 1 channel at 22050 sample rate, length 440 for 2 channels at 44100 sample rate, ~0.01 s duration (0.009)), then modify or edit the contents of the Float32Array's for the given time slice

        const ac = new AudioContext({ latencyHint: 0, sampleRate: 22050 });
        let ab = await ac.decodeAudioData(await file.arrayBuffer());
        let { sampleRate } = ab;
        const floats = ab.getChannelData(0);
        const buffers = [];
        for (let i = 0; i < floats.length; i += 220) {
          const _ab = ac.createBuffer(ab.numberOfChannels, 220, ab.sampleRate);
          _ab.getChannelData(0).set(floats.slice(i, i + 220)); // alternatively use subarray() here
          buffers.push(_ab);
        }

then create a single AudioBuffer for "sequentially (gapless/seamless)" playback,

        ab = new Float32Array(buffers.reduce((_a, _b) =>
          [..._a, ..._b.getChannelData(0)]
       , []));

further processing, or new file creation.
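For example, the concatenated samples can be copied back into one AudioBuffer for gapless playback (a sketch continuing the snippets above, single channel assumed):

// ab is now the concatenated Float32Array from the snippet above.
const joined = ac.createBuffer(1, ab.length, sampleRate);
joined.getChannelData(0).set(ab);
const src = new AudioBufferSourceNode(ac, { buffer: joined });
src.connect(ac.destination);
src.start();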

goldwaving commented 3 years ago

Thanks for the suggestions. MediaRecorder and decodeAudioData are not an option. Any encoding/decoding of the audio must be avoided for audio editing software. MediaRecorder does not support unencoded audio on most web browsers. Also live access to the audio data is required for peak meters and waveform drawing. decodeAudioData resamples to the AudioContext's sampling rate, which is also undesirable if getUserMedia uses a different sampling rate.

Maybe AudioPassThroughNode is not the best name. Perhaps AudioBufferNode or AudioDataEventNode would be better. Right now there is no node to passively access audio data passing through the graph in a seamless/gapless way.

For now I'll have to use ScriptProcessorNode until an alternative becomes more widely available.

guest271314 commented 3 years ago

MediaRecorder and decodeAudioData are not an option. Any encoding/decoding of the audio must be avoided for audio editing software. MediaRecorder does not support unencoded audio on most web browsers. Also live access to the audio data is required for peak meters and waveform drawing. decodeAudioData resamples to the AudioContext's sampling rate, which is also undesirable if getUserMedia uses a different sampling rate.

At OP the use case appears to be editing recorded media, not live media.

If MediaRecorder is not or cannot be used, how do you record the audio in the first place?

decodeAudioData resamples to the AudioContext's sampling rate, which is also undesirable if getUserMedia uses a different sampling rate.

You can set the AudioContext sample rate. You can get the sampleRate from the original MediaStreamTrack and set that in the AudioContext constructor.
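For example, a sketch of matching the context rate to the capture track (inside an async function):

// Match the AudioContext sample rate to the capture track to avoid resampling.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();
const { sampleRate } = track.getSettings(); // may be undefined in some browsers
const ac = new AudioContext(sampleRate ? { sampleRate } : {});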

Maybe AudioPassThroughNode is not the best name. Perhaps AudioBufferNode or AudioDataEventNode would be better. Right now there is no node to passively access audio data passing though the graph in a seamless/gapless way.

AudioPassThroughNode is feasible.

Yes, such a node does exist: AudioWorklet. It is just not implemented in the one browser you mentioned; it is implemented in Chromium, Chrome, Firefox, and Nightly.

goldwaving commented 3 years ago

I should have stated in the OP that access to the live recorded data is required. Sorry about that.

If MediaRecorder is not or cannot be used, how do you record the audio in the first place?

That is the problem and why ScriptProcessorNode is the only solution at the moment.

Yes, AudioWorklet could be used on most browsers, but it has the following disadvantages:

  1. It requires developers to implement this simple functionality each time. I think a basic node like this should be part of Web Audio, rather than having to implement it.
  2. It adds complexity, requiring a separate js file, registration, etc.
  3. It runs on a different thread.
  4. It has more overhead when all we need is to copy the data and not modify it.

guest271314 commented 3 years ago

  1. Once the requirements of AudioWorklet are implemented and tested, the pattern becomes a template, with the ability to adjust the code to use new API's.
  2. That part of the AudioWorklet specification can be improved, by providing a means other than Ecmascript Modules to create an AudioWorkletGlobalScope https://github.com/WebAudio/web-audio-api-v2/issues/109.
  3. This is not necessarily an issue.
  4. What evidence can you produce to substantiate that claim?

goldwaving commented 3 years ago

Every new developer that needs to examine or copy the raw audio in a graph (there seemed to be quite a few of them when I was searching for a solution) would have to re-implement this simple functionality as an AudioWorklet. It would be easier and better to have a dedicated node for this purpose. Instead of making developers search for how to do it using an AudioWorklet, they'd just create the dedicated node and use it.

Making developers write many lines of code for something this trivial is not developer friendly. Requiring developers to learn about AudioWorklet (and moving data between threads) for something this trivial also isn't. I can see many new developers struggling with this in the future. A dedicated node would have made things easier for me. I had to invest way too much time in figuring this out, then resort to a deprecated feature. Something like this should already be there.

AudioWorklets are great for many different things. This isn't one of them.

rtoy commented 3 years ago

So, you basically want an AudioPassThruNode that takes its input and periodically fires an event containing the audio data that has been received since the last event was fired?

That could be a lot of events, generating lots of garbage.

Introspection like this was never part of the original design. I can see it being useful for looking at various parts of the graph to see what is happening. Kind of like an oscilloscope probe at desired points in the graph.

bradisbell commented 3 years ago

That could be a lot of events, generating lots of garbage.

That "garbage" is exactly the audio data we're using for our applications. :-) Yes, that's a lot of events, and yes they are useful/required for many use cases. Sure, there's a lot of overhead, and yet doing audio is exactly what we need to do.

In my own usage of the Web Audio API, almost everything I've built requires the ScriptProcessorNode to capture data precisely because there is no other way to capture raw audio data. Even once AudioWorklet becomes viable, I think there will be a lot of overhead in shuffling buffers around to get that data back on the main thread/context. Even if MediaRecorder were to support WAV, it still only emits chunks when it feels like it rather than being locked to the audio processing time. (You can specify a time, but it can't be guaranteed as it is dependent on the container and such. And realistically, we don't always want a container. WAV container support would be great, but there are plenty of use cases where we just want raw PCM samples.)

rtoy commented 3 years ago

Basically, this is a synchronous version of an AnalyserNode, but only returns time data.

guest271314 commented 3 years ago

We do not want events. This is basically MediaStreamTrackProcessor, where the input is a MediaStreamTrack which is read using the readable of the former at ~0.01 s duration, 440 frames for 44100 sample rate, 220 frames for 22050 sample rate, described at https://github.com/WebAudio/web-audio-api-v2/issues/118#issuecomment-808749936, specification https://w3c.github.io/mediacapture-transform/#track-processor. That is how I read an OscillatorNode solely for the timestamp of the AudioFrame and substitute the live-streamed STDOUT from espeak-ng in a Bash script as AudioBuffers, see the linked WebTransport example, which works as expected.

Even once AudioWorklet becomes viable, I think there will be a lot of overhead in shuffling buffers around to get that data back on the main thread/context.

The most effective approach I have found is to write all input data from multiple Uint8Array or other TypedArray's to a single WebAssembly.Memory instance and call grow() when necessary, to avoid fragmentation of ArrayBuffer's, which produces glitches and gaps in playback, and to handle Uint8Array input (from fetch() and other ReadableStream's) that has an odd length, which throws when passed to Int16Array or Uint16Array.
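A minimal sketch of that buffering approach: append incoming Uint8Array chunks to one WebAssembly.Memory, growing it by whole 64 KiB pages as needed (sizes and names are illustrative):

// Append incoming Uint8Array chunks into a single growable WebAssembly.Memory.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 16384 }); // 64 KiB pages
let bytesWritten = 0;
function append(chunk) {
  const needed = bytesWritten + chunk.length;
  if (needed > memory.buffer.byteLength) {
    const pages = Math.ceil((needed - memory.buffer.byteLength) / 65536);
    memory.grow(pages); // grow() detaches the old ArrayBuffer, so re-read memory.buffer below
  }
  new Uint8Array(memory.buffer).set(chunk, bytesWritten);
  bytesWritten = needed;
}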

That can be implemented by Web Audio, however, do we really need duplicate features?

Feature request and tracking issues can also be filed to implementation at Safari; Firefox; et al.

guest271314 commented 3 years ago

Even once AudioWorklet becomes viable, I think there will be a lot of overhead in shuffling buffers around to get that data back on the main thread/context.

AudioWorklet is already viable https://github.com/guest271314/AudioWorkletStream. Just not supported on Safari.

By feature request, I mean a feature request to implement the Media Capture Transform API https://github.com/w3c/mediacapture-transform in the browsers where you want this to work.

padenot commented 3 years ago

AudioWG call:

guest271314 commented 3 years ago

I filed a MediaRecorder specification issue to support WAV, linked in this issue. To get the raw PCM without the WAV header we just need to read from index 44 forwards, and make sure ondataavailable only emits even-sized Blob's representing the underlying data, so when we parse the data through TypedArray's we do not get an error for an odd-length Uint8Array being passed to Uint16Array or Int16Array.

The WAV file header expects the complete file size within the first 44 bytes; without that set, or with it set to 0, HTMLMediaElement will not play the WAV file (I tried). We could write the length header only when we call stop(). There is currently no means to dynamically write headers with the MediaRecorder API; the closest we have is start() or requestData().
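Patching the two size fields after the fact is mechanical once the complete bytes are available (a sketch; the offsets assume the plain 44-byte PCM header discussed here):

// Fix up the RIFF chunk size (offset 4) and 'data' chunk size (offset 40)
// of a finished WAV file whose header was written with placeholder sizes.
function patchWavSizes(wavBytes) {
  const view = new DataView(wavBytes.buffer, wavBytes.byteOffset, wavBytes.byteLength);
  view.setUint32(4, wavBytes.byteLength - 8, true);   // RIFF chunk size, little-endian
  view.setUint32(40, wavBytes.byteLength - 44, true); // 'data' chunk size
  return wavBytes;
}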

See this issue at the MediaRecorder polyfill https://github.com/kbumsik/opus-media-recorder/issues/35

Because audio/wav is not designed for streaming, when mimeType is audio/wav, each dataavailable event produces a complete and separate .wav file that cannot be concatenated together

See ai/audio-recorder-polyfill#7 (comment)

We can stream WAV; again, see the AudioWorkletStream code that uses AudioWorklet. WebTransport recently removed support for the quic-transport protocol in favor of HTTP/3-only support https://groups.google.com/a/chromium.org/g/web-transport-dev/c/NPc_Q2anqrE/m/ZIKDyiusBAAJ, however QuicTransport https://github.com/guest271314/webtransport/blob/main/quicTransportBreakoutBox.js is still implemented, where I stream STDOUT from a shell script (espeak-ng -m --stdout 'text', to work around Web Speech API severe deficiencies) which outputs a WAV file to the browser, where I parse the file and stream the result to audio output. I must reiterate here why WebAssembly.Memory.grow() is used: to avoid fragmentation of input ArrayBuffer's that results in glitches, and for the ability to process odd-length Uint8Array's from multiple fetch() calls piped to a single WritableStream and then to a single memory location.

WebCodecs is not production ready. The timestamp issue needs to be resolved, that is, timestamps should not be used at all, as that is a poor API design that is not useful. I would not want users here to have to invest in the thousands of tests I have performed to reach that conclusion, and perhaps be advised such a conclusion is "off-topic" because it does not fit into a pre-disposed narrative of the WebCodecs API design conception; see Issue 1195189: AudioEncoder should disregard AudioFrame timestamps between flushes https://bugs.chromium.org/p/chromium/issues/detail?id=1195189

The encoder should treat incoming frames as continuous sound signal and disregard timestamps even if it appears that there are gaps and overlaps in the audio signal.

Timestamps are an impediment rather than a benefit here. AudioWorklet does not need a timestamp; it just processes the input sequentially.

Issue 1180325: Confirm BreakoutBox Audio/Video timestamps belong to same timebase https://bugs.chromium.org/p/chromium/issues/detail?id=1180325#c_ts1616959931

This is actually not possible in general, and we should make sure documentation reflects that.

MediaRecorder does not frequently merge Pull Requests. There needs to be some activity there to push https://github.com/w3c/mediacapture-record/issues/198 (and the replaceTrack() https://github.com/w3c/mediacapture-record/pull/187 and replaceStream() https://github.com/w3c/mediacapture-record/pull/186 PR's, which are similar in scope and use-cases to this issue), and to voice the use-cases and concerns raised here, there (W3C banned me from contributing to their repositories, mailing lists, etc.).

goldwaving commented 3 years ago

As someone that's been dealing with Wave files for over 25 years, my advice about adding RIFF Wave to MediaRecorder and potentially having malformed Wave files with zero length RIFF and 'data' chunks is: Don't do it! Also it is not safe to assume the Wave header is always 44 bytes. If the file contains 24 bit or multichannel audio, WAVE_FORMAT_EXTENSIBLE must be used, which has a completely different chunk size. Forcing developers to skip the Wave header just to get to the raw data is not a good idea. Trust me. Just give us the raw data, please.

I will reiterate that many developers (myself included) will continue to use ScriptProcessorNode because it does exactly what we need: we have some control over latency/block size, we get real-time raw data, and (very important) it is very easy to use (much easier than setting up an AudioWorklet).

If raw audio support was mandatory for MediaRecorder, that would help.

Good to know that Safari will eventually support AudioWorklet, but I'm still waiting for SharedArrayBuffer support. :)

guest271314 commented 3 years ago

  1. MediaRecorder does not support raw audio.

Technically it is possible to get raw audio using MediaRecorder by overlapping creation of a new instance every N seconds to make sure no data is lost, excluding the duplicate overlapped data, then creating an AudioBuffer for each file with BaseAudioContext.decodeAudioData(), which should produce a stream of discrete AudioBuffers that can be analyzed and edited in succession to produce gapless media output.

guest271314 commented 3 years ago

An example rough draft for the approach described at https://github.com/WebAudio/web-audio-api-v2/issues/118#issuecomment-812876382

<!DOCTYPE html>

<html>
  <head> </head>

  <body>
    <audio
      controls
      crossorigin
      src="https://cdn.glitch.com/f92b40ba-41b8-4076-a8c7-f66c1ccfd371%2Fmusic.wav?v=1616487361153"
    ></audio>
    <script>
      const audio = document.querySelector('audio');
      const ac = new AudioContext({ sampleRate: 44100, latencyHint: 0 });
      ac.addEventListener('data', ({ detail }) => {
        console.log(detail);
      });
      audio.onloadedmetadata = async () => {
        audio.onloadedmetadata = null;
        const capture = audio.captureStream();
        const [track] = capture.getAudioTracks();
        if ('oninactive' in capture) {
          capture.oninactive = (e) => console.log(e);
        }
        track.onmute = track.onunmute = track.onended = (e) => console.log(e);
        console.log(capture, track, track.getSettings());
        const stream = async () => {
          let buffers = [];
          while (track.readyState === 'live') {
            await new Promise(async (resolve) => {
              const recorder = new MediaRecorder(capture, {
                audioBitrateMode: 'constant',
                audioBitsPerSecond: 128000,
              });
              // ++recorders;
              recorder.addEventListener(
                'dataavailable',
                async (e) => {
                  const buffer = await e.data.arrayBuffer();
                  if (buffer.byteLength) {
                    const ab = await ac.decodeAudioData(buffer);
                    const len = ab.length < 44100 ? ab.length : 44100;
                    const chunk = new AudioBuffer({
                      length: len,
                      sampleRate: ab.sampleRate,
                      numberOfChannels: ab.numberOfChannels,
                    });
                    for (let i = 0; i < ab.numberOfChannels; i++) {
                      const channel = ab.getChannelData(i);
                      chunk.getChannelData(i).set(channel.subarray(0, len));
                    }
                    ac.dispatchEvent(
                      new CustomEvent('data', { detail: chunk })
                    );
                    buffers = [...buffers, chunk];
                  }
                },
                { once: true }
              );
              recorder.start();
              if (audio.paused) {
                await audio.play();
              }
              setTimeout(() => {
                recorder.state === 'recording' && recorder.stop();
              }, 1200);
              setTimeout(() => {
                resolve();
              }, 1000);
            });
          }

          return new Promise((resolve) =>
            setTimeout(() => resolve(buffers), 201)
          );
        };
        const chunks = await stream();
        console.log(chunks);
        const [{ sampleRate, numberOfChannels }] = chunks;
        const floats = new Float32Array(
          chunks.reduce((a, b) => [...a, ...b.getChannelData(0)], [])
        );
        const buffer = new AudioBuffer({
          length: floats.length,
          sampleRate: sampleRate,
          numberOfChannels: numberOfChannels,
        });
        buffer.getChannelData(0).set(floats);
        const source = new AudioBufferSourceNode(ac, { buffer });
        const mediaElement = new Audio();
        mediaElement.controls = true;
        const msd = new MediaStreamAudioDestinationNode(ac);
        source.connect(msd);
        msd.stream.oninactive = (e) => console.log(e);
        document.body.appendChild(mediaElement);
        source.onended = () => msd.stream.getAudioTracks()[0].stop();
        mediaElement.srcObject = msd.stream;
        await mediaElement.play();
        source.start(ac.currentTime);
      };
    </script>
  </body>
</html>

emilfihlman commented 10 months ago

It's absurd that this still hasn't been fixed.

The amount of work that needs to be done just to get raw samples from the microphone is absurd, and very prone to introducing bugs and wasting everyone's time.

There is simply no reason why webaudio doesn't directly support getting raw sample chunks.

padenot commented 10 months ago

This has been possible for years, many web apps are doing it. AudioWorklet is the solution to access audio sample data on the web. If you don't care about performance at all, you can simply postMessage the input chunks and call it a day, interleaving, converting to e.g. S16 samples, and slapping a RIFF header on it.

https://ringbuf-js.netlify.app/example/audioworklet-to-worker/ is a full example that is suited for high-performance workloads, heavily commented, and does Web Audio API -> WAV file. It does so without touching the main thread, so it is robust against load and real-time safe.

ScriptProcessorNode is not a valid solution because there is no resilience against any kind of load when using it. It's trivial to make a page that drops some audio.

Web Codecs is also now available in Chromium and soon others, so sample-type conversion and encoding (to lossless and lossy audio formats) is supported. It's a few lines of code to (e.g.) get real-time microphone data, encode that in real-time to (e.g.) mp3 or aac or opus or flac and do something with it.
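A sketch of that flow on Chromium, assuming MediaStreamTrackProcessor and AudioEncoder are available; the codec and bitrate are arbitrary choices (run inside an async function or module):

// Encode live microphone audio to Opus with WebCodecs.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();
const { sampleRate = 48000, channelCount = 1 } = track.getSettings();
const encoder = new AudioEncoder({
  output: (chunk, metadata) => { /* store or mux the EncodedAudioChunk */ },
  error: (e) => console.error(e),
});
encoder.configure({ codec: 'opus', sampleRate, numberOfChannels: channelCount, bitrate: 96000 });
await new MediaStreamTrackProcessor({ track }).readable.pipeTo(
  new WritableStream({
    write(audioData) {
      encoder.encode(audioData); // AudioData frames from the capture track
      audioData.close();
    },
  })
);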

I'm closing this because the Web Audio API doesn't really deal with encoding: it's a processing API, and there are solutions already.

emilfihlman commented 10 months ago

Audioworklet is an abysmal "solution", and not simple at all.

I do not want any encoding, I want raw samples, and there is simply no reason why either mediarecorder can't be forced to add support for raw samples, or adding an analyser node to microphone input that gives out chunks on callback without dropping any of them.

The absurd thing is that sending raw samples has been made super easy with buffers, but getting raw samples is put behind absurd complexity, and even more absurd is that one can get time samples, but not with callbacks and a guarantee that samples aren't dropped.

padenot commented 10 months ago

It's so absurdly complex that an entire example to get raw samples is 25 lines of code, with about 50% of the lines being boilerplate.

<script type="worklet">
  registerProcessor('test', class param extends AudioWorkletProcessor {
    constructor() { super(); }
    process(input, output, parameters) {
      this.port.postMessage(input[0]);
      return true;
    }
  });
</script>
<script type=module>
  var ac = new AudioContext;
  var worklet_src = document.querySelector("script[type=worklet]")
  const blob = new Blob([worklet_src.innerText],
                        {type: "application/javascript"});
  var url = URL.createObjectURL(blob);
  await ac.audioWorklet.addModule(url);
  var worklet = new AudioWorkletNode(ac, 'test', {});
  var osc = new OscillatorNode(ac);
  osc.start();
  osc.connect(worklet)
  worklet.connect(ac.destination);
  worklet.port.onmessage = (e) => {
    console.log(e.data[0]);
  }
</script>

emilfihlman commented 10 months ago

And the reason it couldn't be one line analyser.addEventListener("dataavailable", callback) is?

nickjillings commented 10 months ago

Callbacks on the main thread are a terrible idea for performance and you'll just lock up your app. You'll be getting more callbacks than you need, whilst the AudioWorklet provides you a thread just to process the audio information, where you can do the processing and then send back only the information you actually need.

emilfihlman commented 10 months ago

That's based literally on nothing. At 48kS/s and 1024 samples per callback that's literally only 47 calls per second, much less than typical requestAnimationFrame, which runs at 60 fps usually, or even 120/144fps on modern phones, and usually does a lot more work than what analysing audio requires.

Patronizing or spreading fud is not ok.

padenot commented 10 months ago

And the reason it couldn't be one line analyser.addEventListener("dataavailable", callback) is?

Yes. Unconditionally firing an event in an isochronous fashion to the main thread from a real-time audio thread, with an expectation of real-time guarantees and no drops of audio buffers, with fixed buffering and without a way to handle main-thread overload or any other form of control, is simply bad design and doesn't work, in the same way that ScriptProcessorNode doesn't work: under ideal and controlled conditions it's fine, but in the real world it isn't. Main-thread load (in our case, the characteristic that is important is its responsiveness) isn't something that can be controlled by the website developer in practice.

If the main thread isn't responsive for some time, you suddenly have a large number of events queued to its event loop. When it's processing events again, it now has to process all those events, loading the main thread again, delaying more events to be delivered, etc.

This is the same reason why requestAnimationFrame(...) isn't an event: avoiding problems piling up under load without a way to handle back-pressure.

In the case of the Web Audio API, developers can instead devise their own mechanism that suits their use-case better, using the lower-level primitive that is the AudioWorklet, or the higher-level primitive that is the AnalyserNode.

If it's for recording and not dropping buffers is important, use a worker and add a buffer there. If it's for visualization, maybe compute the values needed on the real-time audio thread and send the desired characteristics to the main thread, etc.

For that, it's possible to use message passing (postMessage(...)), which is easy to use but less efficient (because it can cause garbage collection and allocations); it's also possible to use real-time-safe, wait-free ring buffers based on atomics if the needs of the application are higher and its resilience is important.
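A sketch of the message-passing variant, wiring the AudioWorkletProcessor directly to a Worker through a MessageChannel so the chunks bypass the main thread; the 'recorder' processor and recorder-worker.js are placeholders, not existing code:

// Main thread: hand one end of a MessageChannel to a Worker and the other to the
// processor, so audio chunks flow worklet -> worker without touching the main thread.
const { port1, port2 } = new MessageChannel();
const worker = new Worker('recorder-worker.js');   // placeholder: buffers incoming chunks
worker.postMessage({ port: port1 }, [port1]);
const node = new AudioWorkletNode(ac, 'recorder'); // placeholder processor registered elsewhere
node.port.postMessage({ port: port2 }, [port2]);   // processor posts Float32Array chunks to port2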

Finally, AnalyserNode suits the common case of needing some form of data for analysis, with some windowing, but with explicit non-guarantees about being able to get the entirety of the time-domain data without discontinuities or overlaps.
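And a sketch of that AnalyserNode path, assuming an existing AudioContext ac and a source node source; as noted, successive reads may overlap or skip samples:

// Poll the most recent time-domain window (no gapless guarantees).
const analyser = new AnalyserNode(ac, { fftSize: 2048 });
source.connect(analyser);
const samples = new Float32Array(analyser.fftSize);
(function draw() {
  analyser.getFloatTimeDomainData(samples); // latest fftSize samples
  // ...compute levels / draw a waveform from samples...
  requestAnimationFrame(draw);
})();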

goldwaving commented 10 months ago

To follow up to my previous post and give some real world feedback...

Safari finally supports AudioWorklets, so I redesigned playback and recording in my app to use them. Unfortunately Safari had a major bug that caused distortion, but that has since been fixed. However audio playback on Safari is still very poor with frequent crackling and glitches when simply tapping the screen, even in the most basic app.

Using AudioWorklets requires more coding and has a steeper learning curve than should be required for such a simple task. It is really just a work-around to overcome the real problem.

The biggest design flaw is that AudioContext is tied to the main thread (which may explain Safari's poor quality). You have to create an AudioContext on the main thread. Playback has to be started, stopped, and managed on the main thread. AudioContext is tangled up in so many other things that prevent it from being available in Workers. If AudioContexts could be created in Workers, it would make this whole argument about ScriptProcessorNode and AudioWorklets irrelevant. There would be zero burden on the main thread.

On a separate note, decodeAudioData does not belong in AudioContext at all. Moving that functionality to the WebCodecs API to actually handle containers would make far more sense. Web browsers already have all that code to handle many different file types, but it is mostly wasted in the very limited decodeAudioData function.

padenot commented 10 months ago

Safari finally supports AudioWorklets, so I redesigned playback and recording in my app to use them. Unfortunately Safari had a major bug that caused distortion, but that has since been fixed. However audio playback on Safari is still very poor with frequent crackling and glitches when simply tapping the screen, even in the most basic app.

Using AudioWorklets requires more coding and has a steeper learning curve than should be required for such a simple task. It is really just a work-around to overcome the real problem.

The biggest design flaw is that AudioContext is tied to the main thread (which may explain Safari's poor quality). You have to create an AudioContext on the main thread. Playback has to be started, stopped, and managed on the main thread. AudioContext is tangled up in so many other things that prevent it from being available in Workers. If AudioContexts could be created in Workers, it would make this whole argument about ScriptProcessorNode and AudioWorklets irrelevant. There would be zero burden on the main thread.

Those are simply Safari bugs that have nothing to do with the fact that you instantiate an AudioContext on the main thread. Case in point, the two other implementations are able to run very heavy workloads at low latency on the same host.

AudioWorkletProcessor's process method is very much how you do low-latency, high-performance real-time audio on every desktop and mobile platform I know of (and I know all of them that aren't niche): a callback called on a real-time thread that provides input and requests output. Since it's not possible to create a Web Worker that has real-time priority, being able to instantiate an AudioContext in a worker wouldn't change anything about the fact that ScriptProcessorNode isn't adequate. We're doing it anyway for other reasons (#2423). And even if it were possible (something we're investigating), it's not OK to force a context switch just to do real-time audio.

One a separate note, decodeAudioData does not belong in AudioContext at all. Moving that functionality to the WebCodec API to actually handle containers would make far more sense. Web browser already have all that code to handle many different file types, but it is mostly wasted in the very limited decodeAudioData function.

decodeAudioData was shipped on the Web for too long before the cross-browser-vendor standardization effort started, and it's impossible to remove it; that would break too many websites. We've at least been able to remove the version that blocks the main thread until the entire file is decoded.

goldwaving commented 10 months ago

Those are simply Safari bugs

Agreed, but the difficulty Apple is having with playing defect free audio might suggest that WebAudio is more complicated than it should be.

Since it's not possible to create a Web Worker that has a real-time priority, being able to instantiate an AudioContext on a worker wouldn't change anything to the fact that ScriptProcessorNode isn't adequate

In real world apps AudioWorklets have to interact with Workers or the main thread where real-time priority is not available. Without careful implementation, you end up with the same problems as ScriptProcessorNode, but with even more overhead (as in your example above). AudioWorklets may be great for the small percentage of high end developers that need super low latency, real-time audio for a very specific use, but what about everyone else that just wants a simple way to get the audio data? ScriptProcessorNode with a larger buffer size was very adequate.

decodeAudioData was shipped on the Web for too long before the cross-browser-vendor standardization effort started, and it's impossible to remove it

Of course, but the lack of a better API after all of this time is disappointing, and if ScriptProcessorNode can be deprecated, so can decodeAudioData (eventually). There is a lot of code locked behind that one function that could be the foundation for a container handling API that WebCodecs lacks. However that probably should be discussed in a different topic.

SevenSystems commented 8 months ago

padenot

Thanks so much for posting this code. It's funny I just spent 1 hour googling how to accomplish THE most basic, fundamental task of an audio recording API -- RECORD RAW AUDIO 😂