AFAIK `fetch()` is defined only in `WindowOrWorkerGlobalScope`, and so I would not expect it to be exposed in `AudioWorkletGlobalScope`. `globalThis.fetch` is `undefined` in Firefox Nightly. Do you mean something else?
@karlt

> AFAIK `fetch()` is defined only in `WindowOrWorkerGlobalScope`, and so I would not expect it to be exposed in `AudioWorkletGlobalScope`.
Exposing and defining `fetch()` in `AudioWorkletGlobalScope` is consistent with defining `ReadableStream`, `WritableStream`, `TransformStream`, and `MessagePort` in `AudioWorkletGlobalScope`. A `Worklet` is essentially one or more `Worker`s performing prioritized tasks, in parallel if required by the application; correct?
E.g.:

- Tasklets: https://github.com/GoogleChromeLabs/tasklets ("WebWorkers can be expensive (e.g: ~5MB per thread in Chrome)")
- task-worklet: https://github.com/developit/task-worklet (https://jsfiddle.net/developit/wfLsxgy0/)
- TaskQueue: https://github.com/chromium/chromium/commit/1491a5e4656c41abe2adef1f74473afae7d6440c (https://github.com/web-platform-tests/wpt/issues/16153)
- `<scheduler>.postTask()`: https://www.chromestatus.com/feature/6031161734201344

See https://bugs.chromium.org/p/chromium/issues/detail?id=910471#c8:
> From the "transferable stream" perspective, Audio Worklet is just another form of worker.
The module script (`Worklet`) must be fetched. Why should the module script not have the capability to perform fetches? How is audio data expected to be provided to `AudioWorklet`?

Within the `AudioWorkletProcessor`, data is expected to be input to `process()`; that input data should be capable of being fetched within `AudioWorkletGlobalScope`. That is a reasonable expectation, given that theoretically `process()` can run indefinitely, so the input data can vary; consider an infinite media stream where users can add their input to the queue. Currently, a `Worker` can be used for `fetch()` and that data transferred to the `AudioWorkletGlobalScope`. It should be possible to omit the `Worker` and perform all tasks, fetching and processing input audio data, in `AudioWorkletGlobalScope`.

Is there any compelling reason for `fetch` to not be exposed in `AudioWorkletGlobalScope`?
Re

> `globalThis.fetch` is `undefined` in Firefox Nightly. Do you mean something else?

`globalThis.fetch` is a function at Nightly 76 and Firefox 74.
One practical reason to define `fetch` in `AudioWorkletGlobalScope` is to reduce the number of objects that need to be transferred (cloned; serialized) to provide input to `process()`. From experiments at Chromium, posting thousands of messages to `MessagePort` can affect Render Capacity, which was reduced substantially when one `ReadableStream` was transferred and the input stream, in that case 291 MB, was read in a method defined on `AudioWorkletProcessor`. The capability to `fetch` in `AudioWorkletProcessor` would eliminate the need to transfer from the main thread or `Worker` threads. There appears to be a cost to transferring objects that can accumulate over time and space. A logical solution is to avoid transferring objects and instead get the object from within the current `AudioWorkletGlobalScope`.
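A minimal sketch of the transferable-stream approach described above (Chromium only at the time of writing; the module URL, processor name, and resource URL are illustrative assumptions):

```js
// main thread (inside an async function): fetch once, transfer the body stream
const context = new AudioContext();
await context.audioWorklet.addModule('stream-processor.js'); // hypothetical module
const node = new AudioWorkletNode(context, 'stream-processor'); // hypothetical name
const { body: readable } = await fetch('audio.wav'); // hypothetical resource
// a single postMessage() transfers the stream instead of thousands of chunk messages
node.port.postMessage({ readable }, [readable]);
node.connect(context.destination);
```

```js
// stream-processor.js: read the transferred stream inside AudioWorkletGlobalScope
class StreamProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.queue = []; // raw byte chunks awaiting conversion for process()
    this.port.onmessage = async ({ data: { readable } }) => {
      const reader = readable.getReader();
      for (let r = await reader.read(); !r.done; r = await reader.read()) {
        this.queue.push(r.value); // decode to Float32Array before output
      }
    };
  }
  process(inputs, outputs) {
    // dequeue this.queue and copy decoded samples into outputs[0] here
    return true;
  }
}
registerProcessor('stream-processor', StreamProcessor);
```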
It is intentional to not have it. https://github.com/WebAudio/web-audio-api/issues/1439 is the first thing that pops up in my search engine when typing "audioworklet fetch", and the reasoning is made clear there.
IO of any kind (including network IO) has no place on the real-time thread that services the `process` calls. And `AudioWorklet` has very little to do with any other kind of construct of the web platform, intentionally. It merely shares the script loading mechanism, by using the `Worklet` text.
Data is provided to an `AudioWorklet` via `postMessage`, potentially using transferables or `SharedArrayBuffer`, depending on the problem at hand.
There is ample documentation about how to do real-time audio processing in native code, and `AudioWorklet` aims at allowing the web to do it. It is not a simple topic. On the web, as in native code, most of the constructs available to the programmer should not be used on the real-time thread.
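A minimal sketch of the transferable variant mentioned above (the `context`, `node`, and sample source are assumed to exist already): a single `postMessage` moves ownership of the underlying `ArrayBuffer` to the worklet scope without copying:

```js
// main thread: one message, zero-copy transfer of decoded sample data
const samples = new Float32Array(context.sampleRate * 10); // e.g., 10 seconds of audio
// ... fill samples with decoded audio ...
node.port.postMessage({ buffer: samples.buffer }, [samples.buffer]);
// samples.buffer is detached here after the transfer; the worklet scope now owns it
```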
> Data is provided to an `AudioWorklet` via `postMessage`, potentially using transferables or `SharedArrayBuffer`, depending on the problem at hand.
Chromium does not appear to be able to handle thousands of `postMessage()` calls, or the handling of the resulting message events; Render Capacity can reach 100%. For audio with a brief duration (minimal file size and few transfers) `postMessage()` can work well. Transferring a `ReadableStream` solves that issue for 100+ MB input. Firefox does not implement transferable streams. `fetch()` in `AudioWorklet` would provide direct access to the data used for a potentially infinite stream. Have not dived into `WebAssembly` much, though perhaps a `fetch()` equivalent can be developed using WASM in `AudioWorklet` to avoid the cost of transferring altogether.
No, you can't do `fetch` in `AudioWorkletGlobalScope`, regardless of the language.
In general it is always possible to do bad things when writing code. If you wish to transfer 100 MB of data into the browser, do a single `postMessage` with your data; it's efficient.
> No, you can't do `fetch` in `AudioWorkletGlobalScope`, regardless of the language.
Have not tried yet. Should be possible using Native Messaging and Native File System at Chromium, even without trying to use `WebAssembly`, which is defined in `AudioWorkletGlobalScope`. Will experiment with the concept.
> In general it is always possible to do bad things when writing code.
"bad" is a subjective term, without any universally applicable definition or implementation equal for and to all in observable result for all parties.
> If you wish to transfer 100MB of data into the browser, do a single postMessage with your data, it's efficient.
That takes time. Better to begin playback with enough data to complete 1 second of `process()` calls (~346-384 `Float32Array`s of 128 `length`; at 44100 Hz, `process()` runs sampleRate / 128 ≈ 344.5 times per second, and at 48000 Hz, 375 times), then continue playback while the remaining input data is fetched and streamed in parallel.
Then use a ring buffer; this is widely documented in the literature.
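For illustration, a minimal single-producer, single-consumer ring buffer over `SharedArrayBuffer`, in the spirit of that suggestion (the capacity and index layout are assumptions, not a reference implementation):

```js
// shared between a main/worker thread (producer) and AudioWorkletProcessor (consumer)
const capacity = 16384; // frames per channel; illustrative size
const states = new Int32Array(new SharedArrayBuffer(8)); // [readIndex, writeIndex]
const samples = new Float32Array(
  new SharedArrayBuffer(capacity * Float32Array.BYTES_PER_ELEMENT)
);

// producer: copy decoded samples in, never writing past the read index
function push(chunk) {
  const read = Atomics.load(states, 0);
  const write = Atomics.load(states, 1);
  const free = (read - write - 1 + capacity) % capacity;
  const n = Math.min(free, chunk.length);
  for (let i = 0; i < n; i++) samples[(write + i) % capacity] = chunk[i];
  Atomics.store(states, 1, (write + n) % capacity);
  return n; // caller retries the remainder later
}

// consumer (inside process()): pull up to 128 frames; no allocation, no locks
function pull(output) {
  const read = Atomics.load(states, 0);
  const write = Atomics.load(states, 1);
  const available = (write - read + capacity) % capacity;
  const n = Math.min(available, output.length);
  for (let i = 0; i < n; i++) output[i] = samples[(read + i) % capacity];
  Atomics.store(states, 0, (read + n) % capacity);
  return n;
}
```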
@padenot Already got the code "working" (save for a Linux/PulseAudio issue) by transferring the `ReadableStream` from a `TransformStream` from `Worker` to `AudioWorkletProcessor` with `postMessage()`. Used a 291 MB WAV file for testing because that was the largest file locatable on the web that was served with an `Access-Control-Allow-Origin: *` header. In theory the input stream from multiple `fetch()` calls or other forms of file and media input could be infinite.
The result essentially nullifies the argument that `AudioWorkletGlobalScope` cannot handle I/O and process audio in "real-time". The only difference between using `fetch()` in a `Worker` rather than in `AudioWorkletGlobalScope` is the need to create the `Worker`, because `fetch()` is not defined in `AudioWorkletGlobalScope`. The rationale for not defining `fetch()` in `AudioWorkletGlobalScope` was theoretical, and reading a stream has proven to not be an issue in `AudioWorkletProcessor`. Conversely, it is precisely when posting tens of thousands of `TypedArray`s to `AudioWorkletProcessor` using `MessagePort` that Render Capacity reached levels that directly affect audio output, observable as glitches and gaps.

This issue is merely providing proof that not defining `fetch()` in `AudioWorkletGlobalScope` should be reconsidered, given that reading a `ReadableStream` (e.g., from one or more `Response.body`) in that scope does not impact audio processing, whereas the presumptive solution, `postMessage()`, does impact audio processing. Of course, individuals and entities become convinced of the efficacy of their theories, and can rely on them even in the face of evidence proving the contrary. Therefore this, again, is only intended as an answer to the linked issue, which was not located when searching before posting this issue. Nonetheless, the opinions therein are obsolete in the face of the results cited above. Will use `Worker` in the interim, as am doing now. Perhaps Firefox, when `AudioWorklet` is fully deployed, will have different output when thousands of objects are cloned/transferred, and will also implement transferable streams https://bugs.chromium.org/p/chromium/issues/detail?id=894838, which is a very useful feature, particularly in this case https://bugs.chromium.org/p/chromium/issues/detail?id=910471.
> Then use a ring buffer, this is widely documented in the literature.
Yes, have read about that technical term. Am not certain whether the following code, written from scratch to achieve the requirement of processing an infinite input stream at `AudioWorklet`, could be termed a ring buffer by those who classify code one way or another, or otherwise coin and pass along coding lingo:
```js
// Outer scope (not shown): reader, next, overflow, init, minSamples, processStream
for await (const _ of (async function* stream() {
  while (true) {
    let { value, done } = await reader.read();
    if (done) {
      console.log(
        'readable close',
        currentTime,
        currentFrame,
        this.buffers.size,
        next.length,
        overflow.length
      );
      // handle overflow floats < 128 length
      if (overflow[0].length || overflow[1].length) {
        const channel0 = new Float32Array(overflow.splice(0, 1)[0]);
        const channel1 = new Float32Array(overflow.splice(0, 1)[0]);
        this.buffers.set(this.i, {
          channel0,
          channel1,
        });
        ++this.i;
      }
      return await reader.closed;
    }
    // value (Uint8Array) length is not guaranteed to be a multiple of 2 for Uint16Array;
    // store remainder of value in next array
    if (value.length % 2 !== 0 && next.length === 0) {
      next.push(...value.slice(value.length - 1));
      value = value.slice(0, value.length - 1);
    } else {
      const prev = [...next.splice(0, next.length), ...value];
      while (prev.length % 2 !== 0) {
        next.push(...prev.splice(-1));
      }
      value = new Uint8Array(prev);
    }
    // we do not need length here; we process input until no more, or infinity
    let data = new Uint16Array(value.buffer);
    if (!init) {
      init = true;
      data = data.subarray(22); // skip 22 Uint16 values: the 44-byte WAV header
    }
    let { ch0, ch1 } = processStream(data);
    // send 128 sample frames to process()
    // to reduce, not entirely avoid, glitches
    sample: while (ch0.length && ch1.length) {
      let __ch0, __ch1;
      // last splice() not guaranteed to be length 128
      let _ch0 = ch0.splice(0, 128);
      let _ch1 = ch1.splice(0, 128);
      let [overflow0, overflow1] = overflow;
      if (_ch0.length < 128 || _ch1.length < 128) {
        overflow0.push(..._ch0);
        overflow1.push(..._ch1);
        break sample;
      }
      if (overflow0.length || overflow1.length) {
        __ch0 = overflow0.splice(0, overflow0.length);
        __ch1 = overflow1.splice(0, overflow1.length);
        while (__ch0.length < 128 && _ch0.length) {
          let [float] = _ch0.splice(0, 1);
          __ch0.push(float);
        }
        while (__ch1.length < 128 && _ch1.length) {
          let [float] = _ch1.splice(0, 1);
          __ch1.push(float);
        }
      }
      const channel0 = new Float32Array(__ch0 || _ch0);
      const channel1 = new Float32Array(__ch1 || _ch1);
      this.buffers.set(this.i, {
        channel0,
        channel1,
      });
      ++this.i;
      if (this.i === minSamples) {
        console.log('this.buffers.size: ' + this.buffers.size);
        this.port.postMessage({
          start: true,
        });
      }
    }
    yield;
  }
}).call(this));
```
It does, however, deliver 128-length `Float32Array`s to `process()` relatively consistently, save for the case of Disable cache being set at DevTools at Chromium, which remains an unknown without first determining whether PulseAudio on Linux has sufficient "priority" to avoid observing glitches and "clicks".
> If you wish to transfer 100MB of data into the browser, do a single postMessage with your data, it's efficient.
Ironically, am doing that to an appreciable degree with a single `postMessage({readable}, [readable])` from `Worker` to `AudioWorkletProcessor` at Chromium; in this case the sample file with the greatest size that could be located on the Interwebs was 291 MB. However, that feature is not implemented at Nightly or Firefox.

Why should we need to create an entirely separate context, for example a `Worker`, just to fetch data for `process()`? For a file that is less than 100 MB, that is possible. `decodeAudioData()` crashes the tab at Chromium after a certain unknown file size; it does so for the referenced 291 MB file. The remaining option is to use thousands of `postMessage()` calls to some other context. Am not certain how extra contexts and transferring tens of thousands to millions of messages could be deemed more "efficient" than defining `fetch()` in the scope that will use the data, and by doing so avoiding thousands of events where `Event` objects must be created aside from the data being used for audio.
I think the guiding principle for `AudioWorkletNode` and `process()` is that it should focus only on doing signal processing, converting the input(s) (and parameters) into some appropriate output(s). `postMessage` is provided because we knew there would need to be occasional communication between the main thread and the audio thread.

Anything else was really beyond the scope of what `AudioWorklet` was intended to do.
I think if you want your worklet to process huge amounts of audio, you should use `fetch` on the main thread to get the data, create an `AudioBufferSourceNode` with the data, and connect that to one of the inputs of your `AudioWorkletNode`. Or maybe use a `MediaStreamAudioSourceNode`.
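A minimal sketch of that suggestion (inside an async function; the resource URL and processor name are illustrative, and the worklet module is assumed to be added already):

```js
// main thread: decode up front, feed the worklet through the audio graph
const context = new AudioContext();
const response = await fetch('audio.wav'); // hypothetical resource
const audioBuffer = await context.decodeAudioData(await response.arrayBuffer());
const source = new AudioBufferSourceNode(context, { buffer: audioBuffer });
const worklet = new AudioWorkletNode(context, 'my-processor'); // hypothetical name
source.connect(worklet).connect(context.destination);
source.start();
```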
Was there never a consideration for an infinite stream as a use case for `AudioWorklet`? For example, using `AudioWorklet` for a web radio station?

Was the original concept, and is it yet still, only adjusting audio signals of, for example, 10 or 30 seconds: data that `decodeAudioData()` can handle without crashing the tab, posted once via `MessagePort`, that being the extent of transferring, importing, or accessing data not originally posted to the `AudioWorkletProcessor`?

That is, are individuals outside of a given working group or other entity expected to stop conceiving of use cases for a given theory or prototype simply because the original technical writers limited their scope, and to maintain such limitations even after the creation was released to the wild, as a document?
The use case is beginning playback as soon as possible of a potentially infinite input stream from various sources, with `AudioWorklet` `process()` being the sink. `AudioBufferSourceNode` requires an audio buffer, which is not feasible to get using `decodeAudioData()`, because that method can crash the tab when file size is over a certain unknown maximum, and because of the difficulty of synchronizing multiple `AudioBufferSourceNode` playback timelines.

`MediaStreamAudioSourceNode` requires `HTMLMediaElement` or another API that creates a `MediaStream`. Ideally, the source of the `MediaStreamTrack` is the `AudioWorkletProcessor`, which can be provided raw input data from multiple sources (`fetch()`, file handles, directory handles, `MediaRecorder` where `Blob`s are written to `ArrayBuffer`s) without using `HTMLMediaElement`.
In fact, `AudioWorkletGlobalScope`, `AudioWorkletProcessor`, and `process()` can be used for non-audio purposes, if only for the clock, e.g., as a video processor, or as a multimedia and TTS/STT set of controllers where hundreds of worklets and workers process input and output data in parallel.
Am only relaying that defining `fetch()` in `AudioWorkletGlobalScope` would not impose any restrictions on the initial vision or the current implementation, no more than thousands of `postMessage()`s do. Again, that is proven by the code at a proof-of-concept demonstrating how a connection to WebCodecs might look, where the `ReadableStream` did not affect Render Capacity, though thousands of messages did. That empirical, reproducible result proves that `fetch()` in an `AudioWorklet` will not affect signal processing, though the theory restricted by specification-author-imposed concerns, as implemented, can affect signal processing.
Part of the use case is to not necessarily rely on any single main thread; rather, a radio station is provided content from users by various means, which is placed in a queue. One or more `SharedWorker`, `ServiceWorker`, `Worker`, and worklets can be utilized to gather input, which will be infinite.
Again, this was only to relay that using a `ReadableStream` (which `Body` implements) and `WritableStream` does not have any observable impact on signal processing at an `AudioWorklet` node. Created two versions of the code: one with a single file (291 MB), one with multiple fetched files (70+ MB). There is no way to discern the difference in playback, save for a Chromium bug which consistently outputs glitches when the audio is played for the first time or Disable cache is checked.

Since there appears to be some objection to defining `fetch()`, am now considering the reasons therefor, as they are not immediately clear.
Am not sure whether implementing `fetch()` in `AudioWorkletGlobalScope` is possible or not. Have not yet dived into experimentation because a `ReadableStream` can be transferred in Chromium. From the perspective here, it is rational to request that `fetch()` be officially defined by the authors of the specification, to avoid workarounds that may come about to achieve that use case. In general, when deciding to dive into any subject matter, create multiple versions and test them at least until they break in obvious and non-obvious forms; break the implementation or theory down to the primary source, the lowest common and uncommon denominator, and then sort out the broken and non-broken parts, with an attached map describing for others how to reproduce the result.

For clarity, the concerns about `fetch()` being defined in `AudioWorkletGlobalScope` are based on exactly what actual evidence?
> I think the guiding principle for `AudioWorkletNode` and `process()` is that it should focus only on doing signal processing, converting the input(s) (and parameters) into some appropriate output(s). `postMessage` is provided because we knew there would need to be occasional communication between the main thread and the audio thread.
>
> Anything else was really beyond the scope of what `AudioWorklet` was intended to do.
>
> I think if you want your worklet to process huge amounts of audio, you should use `fetch` on the main thread to get the data, create an `AudioBufferSourceNode` with the data, and connect that to one of the inputs of your `AudioWorkletNode`. Or maybe use a `MediaStreamAudioSourceNode`.
Revisited using `postMessage()` to process data in "real-time" from a network request in a `Worker`, then transferring the data from the `Worker` to the `AudioWorklet` `port`. Then ran code again that posts a single `ReadableStream` and processes data in a method in `AudioWorkletProcessor` which sets `Float32Array`s in a `Map`. Then re-ran code using `MediaElementAudioSourceNode`, beginning playback at `loadedmetadata` and `canplaythrough`. Some observations:
The file for the use case is 291 MB and can take up to 18 minutes to complete the network request, therefore waiting for the request to complete is not feasible where the requirement is beginning playback as soon as is programmatically possible. It is possible to begin playback with less than 512 bytes, while queuing the additional processed bytes to be set at `process()`. `MediaSource` does not provide a means to stream WAV files.
With `latencyHint` set to `1`, which results in an 8192 Callback Buffer Size and 346 calls to `process()` for the first 1 second, posting 567,199 messages (2x 128-length `Float32Array`s per message) at Chromium 83 and 567,753 messages at Nightly 77 to `AudioWorkletProcessor.port` results in sample Render Capacities: 45%, 18%, 11%, 29%, 68%, toggling to 1%. Gaps of silence in playback can occur at any time, at or around `currentTime` 800, 1300 and forward.
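For reference, the `latencyHint` in question is passed at construction as a value in seconds, which the implementation clamps to a supported callback buffer size (the 8192-frame figure above is Chromium's observed result, not a guaranteed mapping):

```js
const context = new AudioContext({ latencyHint: 1 }); // 1-second hint
console.log(context.baseLatency); // the base latency the implementation actually chose
```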
When cache is disabled and a single `ReadableStream` is transferred to `AudioWorkletProcessor.port`, gaps can occur at or around `currentTime` 1200 forward; sample Render Capacities: 30%, 8%, 9%, 28%, toggling to 1%. When cache is enabled and a single `ReadableStream` is transferred, gaps in playback generally do not occur, though some glitches do occur; sample Render Capacities: 1.2%, 1.7%, 0.7%, 0.3%, 1.4%.
At Nightly 77, had to adjust the number of 2-channel 128-length `Float32Array`s to set at the `Map` before playback from 64 to 1024, to store enough input data before starting playback with `resume()` to avoid `process()` being called exactly between the last read of the `Map` and the `onmessage` event; and had to remove `import` from the `Worker`, which is not currently supported. Gaps in playback are observable intermittently throughout playback, with a several-second gap of silence (was not sure the browser and OS had not frozen) before `readable close 0 2` was posted to the `console` when the `ReadableStream` in the `Worker` `read()` `done` value was set to `true`. Fewer gaps in playback, and no glitches, occurred once the `ReadableStream` in the `Worker` completed reading and transferring data to the `AudioWorkletProcessor`.
Both the multiple-`MessagePort`-transfers and single-`ReadableStream` approaches use the same WAV-to-`Float32Array` processing code. The code written here could be part of the reason for the gaps and glitches in playback. Yet that does not explain how gaps and glitches are possible when `connect()`ing a `MediaElementAudioSourceNode` and `latencyHint` is `"interactive"`, or when playback commences at `autoplay` of `HTMLMediaElement`. Gather that audio processing without any glitches or gaps is non-trivial, even without having to be concerned with I/O code.
When a `MediaElementAudioSourceNode` is connected to a bypass `AudioWorkletNode` and `autoplay` is set on the `HTMLMediaElement`, gaps and glitches can occur at the beginning of playback. When a `canplaythrough` listener is attached and fired, then `play()` and `AudioContext.resume()` are executed, sample Render Capacities are 2.7%, 3.8%, 1.9%, 4.45%, with no apparent gaps or glitches in playback.
At Chromium 82 and 83, disabling cache affects playback. When cache is disabled, or the first time the data is processed when cache is not disabled, gaps and glitches in playback can occur more frequently than not.
Will continue to try approaches to achieve consistent playback of a potentially infinite input stream at `AudioWorklet` from multiple input sources, including network requests, using the same code, without any glitches or gaps, no matter the input method, OS, or browser.
Nightly 77 crashes the browser when cache is disabled; the same code with cache enabled does not crash the browser.
Finally composed code which writes data from the `ReadableStream` of `Response.body` during an ongoing network request (in this case a single request of 291 MB, though this can be extended to multiple requests) to shared memory, `new WebAssembly.Memory({initial: ..., maximum: ..., shared: true})`, which is posted to and read in `AudioWorkletProcessor`. The result is 27:27 of audio playback without gaps or glitches https://guest271314.github.io/AudioWorkletStream/.

One issue with using the same code at Chromium and Nightly is that Nightly does not expose the `Content-Length` header https://bugzilla.mozilla.org/show_bug.cgi?id=1654212, which is used to set the `maximum` of the `WebAssembly.Memory` object. Otherwise both implementations, with the applicable preferences set at Nightly re shared memory, output the same result.
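A minimal sketch of that shared-memory approach (names are illustrative; `WebAssembly.Memory` pages are 64 KiB, and the `node` receiving the memory is assumed to exist):

```js
// main thread or Worker: stream the response body into shared WebAssembly memory
const response = await fetch('audio.wav'); // hypothetical resource
const byteLength = Number(response.headers.get('Content-Length')); // not exposed at Nightly
const maximum = Math.ceil(byteLength / 65536); // 64 KiB per WebAssembly page
const memory = new WebAssembly.Memory({ initial: 1, maximum, shared: true });
// post once; both sides then view the same bytes, with no per-chunk transfers
node.port.postMessage({ memory });

let offset = 0;
const reader = response.body.getReader();
for (let r = await reader.read(); !r.done; r = await reader.read()) {
  const needed = offset + r.value.length;
  if (needed > memory.buffer.byteLength) {
    memory.grow(Math.ceil((needed - memory.buffer.byteLength) / 65536));
  }
  new Uint8Array(memory.buffer).set(r.value, offset);
  offset += r.value.length;
}
```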
FWIW, 500K `MessagePort.postMessage()` calls can be used to output audio at `AudioWorkletProcessor` without gaps or glitches. Removed the loops that create a `Map` of 128-`length` `Float32Array`s; instead, write directly to a single `Uint8Array` at each `postMessage()` event handler, then read 512 bytes at increasing offsets from the `Uint8Array`. The important part is the accumulation of, at the OS where the code was tested, `344 * 512 * 1.5` bytes, in the range of 1.5 seconds of data for `process()` calls, before messaging the main thread to execute `resume()` and begin `process()`, so that the `ReadableStream` is always "ahead" of `process()` by at least that amount of data initially, though the graph of the proximity between the offset being read and how much data is stored changes during the stream, https://plnkr.co/edit/yh5A66UhMnlpq0JF?preview. Aside from the amount of buffered data, which at least in testing so far differs negligibly between the versions, have been able to create `MessagePort.postMessage()`, transferable stream (Chromium), and `SharedArrayBuffer` (using `WebAssembly.Memory`) versions which produce audio output from `AudioWorklet` without gaps or glitches at Linux. This is very useful technology, worth the effort of tens of thousands of tests.
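A minimal sketch of the prime-then-resume pattern described above (message shapes and the 344-calls-per-second figure come from the tests above, not from any specification):

```js
// inside an AudioWorkletProcessor: accumulate ~1.5 s of data before starting;
// 344 process() calls/second * 512 bytes/call * 1.5 is the priming threshold
class PrimedProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.threshold = 344 * 512 * 1.5;
    this.buffered = 0;
    this.started = false;
    this.port.onmessage = ({ data: bytes }) => {
      this.buffered += bytes.length; // bytes: a Uint8Array chunk
      if (!this.started && this.buffered >= this.threshold) {
        this.started = true;
        // the main thread listens for this and calls audioContext.resume()
        this.port.postMessage({ start: true });
      }
    };
  }
  process() {
    return true; // the real version reads 512 bytes per call at increasing offsets
  }
}
registerProcessor('primed-processor', PrimedProcessor);
```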
**Describe the issue**
Is defining `fetch()` in `AudioWorkletGlobalScope` an implementer choice?

**Where Is It**
AFAICT not addressed.

**Additional Information**
`fetch` is not defined in `AudioWorkletGlobalScope` at Chromium. Why? Is defining `fetch` within `AudioWorkletGlobalScope` at Firefox or Nightly within the parameters of the specification, or is `fetch` intentionally omitted from being defined in `AudioWorkletGlobalScope`?