WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Multithreading #2409

Closed wytrych closed 1 year ago

wytrych commented 4 years ago

Describe the feature: Support multithreading to better utilise the CPU in complex Web Audio apps, like DAWs.

Is there a prototype? A very basic implementation is in Chrome already, where each AudioContext is run on a new thread. It could be enough to sanction this behaviour in the spec.

Describe the feature in more detail: The minimal way for it to work is what Chrome already does: each new AudioContext() is run on a separate thread. When running multiple contexts in parallel we of course run into problems like unsynced clocks and the lack of a single sink, so it's impossible to have one master gain, for example. These could be overcome by also including algorithms that would utilise multiple CPUs behind the scenes. That would be easier to use, but also a lot of work, and maybe not necessary to unlock the potential of more CPU power being available to developers.

guest271314 commented 4 years ago

When running multiple contexts in parallel we of course run into problems like unsynced clocks

Can you post code to reproduce that behaviour? AFAICT multiple AudioContext instances and AudioWorkletProcessor.currentTime all return the same value.

bradisbell commented 4 years ago

@guest271314 Is the AudioContext time not derived from the sample rate and actual playback rate of the audio device? That is, if I have an audio device configured to play at 48 kHz, but it is actually playing at 48.001 kHz, that timing will drift slightly ahead of real time. I would assume that all timings related to the audio context would be specific to the destination node's playback rate, but I don't know if that's actually the case. If it is the case, then separate audio contexts could drift from each other, if being output to different physical devices.
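For a rough sense of scale (illustrative arithmetic only; the 48.001 kHz figure above is hypothetical), the drift from such a clock mismatch can be estimated directly:

const nominal = 48000; // configured sample rate, Hz
const actual = 48001;  // hypothetical real device clock, Hz
const driftPerSecond = (actual - nominal) / nominal; // ≈ 2.08e-5 s per second of playback
console.log(driftPerSecond * 3600 * 1000);           // ≈ 75 ms of drift per hour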

wytrych commented 4 years ago

@guest271314 on Firefox and Safari they always return the same value, since those browsers are not using separate threads, but on Chrome they don't. A simple case to reproduce:

const ac1 = new AudioContext();
const ac2 = new AudioContext();
const delta = ac1.currentTime - ac2.currentTime;

Now to check the drift:

let drift = ac1.currentTime - ac2.currentTime - delta;

Initially that value will be 0. But put the tab you're running it in into the background and do something that occupies the CPU a bit harder for some time, so the tab gets deprioritized, and that value will slowly move away from 0.

@bradisbell The time is not connected to the sample rate in that way. It's incremented during each render quantum, so if there is a glitch, i.e. a render quantum is not processed in time, the context time will fall behind.
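A minimal sketch of watching that drift over time, wrapping the snippet above (the 1-second logging interval is arbitrary):

const ac1 = new AudioContext();
const ac2 = new AudioContext();
const delta = ac1.currentTime - ac2.currentTime;

// Log the difference once per second; when each context is rendered on its own
// thread, the value can wander away from 0 if one context falls behind.
setInterval(() => {
  const drift = ac1.currentTime - ac2.currentTime - delta;
  console.log('drift (s):', drift.toFixed(6));
}, 1000);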

guest271314 commented 4 years ago

In general, do not rely on any timings produced by a default function with a specified algorithm. By the time the value is read the audio process has already progressed, without even accounting for other code that can take priority over the thread, if only momentarily, enough to cause inconsistency. Prefer to roll your own "clocks" and threads.
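As one illustration, a minimal sketch of a hand-rolled clock inside an AudioWorkletProcessor that counts rendered frames instead of reading currentTime (the module and processor names are hypothetical):

// frame-clock-processor.js (hypothetical)
class FrameClockProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.frames = 0;
  }
  process(inputs, outputs) {
    this.frames += 128; // render quantum size in the current specification
    if (this.frames % sampleRate < 128) {
      // Roughly once per second of rendered audio, report a clock derived
      // purely from frames actually processed by this node.
      this.port.postMessage({ seconds: this.frames / sampleRate });
    }
    return true;
  }
}
registerProcessor('frame-clock-processor', FrameClockProcessor);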

There are edge cases, undocumented side effects, and unintended consequences of different APIs.

Consider Opus, which is 48kHz, yet opusenc has a --raw-rate option that can be set to 44100.

Issue 1006617: MediaSource audio output is 6 seconds faster than video output https://bugs.chromium.org/p/chromium/issues/detail?id=1006617.

There is no codified description of the corresponding audio definitions of the terms "glitches", "gaps", or "thumps" https://bugs.chromium.org/p/chromium/issues/detail?id=825823#c81.

latencyHint affects how many times process() is executed per second in AudioWorkletProcessor https://plnkr.co/edit/79GMVE?preview, yet that is not necessarily clear in the specification, from the perspective here.
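A sketch of one way to observe that claim (the counting processor below is hypothetical, and actual call rates are implementation-dependent): register it on two contexts created with different latencyHint values and compare what each reports.

// counting-processor.js (hypothetical): report how often process() runs per
// wall-clock second, e.g. for contexts created with latencyHint 'interactive'
// versus 'playback'.
class CountingProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.calls = 0;
    this.last = Date.now();
  }
  process(inputs, outputs) {
    this.calls++;
    const now = Date.now();
    if (now - this.last >= 1000) {
      this.port.postMessage({ callsPerSecond: this.calls });
      this.calls = 0;
      this.last = now;
    }
    return true;
  }
}
registerProcessor('counting-processor', CountingProcessor);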

Disabling the cache in the Network tab of DevTools, or using fetch('/path/to/resource', {cache:'no-store'}), or running the code for the first time in Chromium or Firefox, directly impacts glitches and gaps in the audio output of AudioWorkletProcessor https://bugs.chromium.org/p/chromium/issues/detail?id=1063524 and the portion of the comment below

This is really weird. Could you file a separate bug with more details? Sample code would be great.

at https://bugs.chromium.org/p/chromium/issues/detail?id=910471#c19.

When reading a file in "real time" using a ReadableStream from fetch(), particularly when the cache is disabled or the cache:'no-store' option is set in the fetch() parameters, it is possible for the ReadableStream to still have 200MB of data left to read, yet for process() to be called precisely in the window after the last read has occurred and before the next read occurs; that case needs to be handled by returning true until the next read of the stream. That issue will be multiplied with multiple threads. AudioWorkletProcessor cannot execute a method on the extended class and execute process() at the same time, whether there is one AudioContext or multiple AudioContext instances.
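A minimal sketch of that handling, assuming a processor that is fed Float32Array chunks over its port and buffers them (all names here are illustrative):

// stream-fed-processor.js (illustrative): keep the node alive by returning true
// even when the internal queue is momentarily empty between stream reads.
class StreamFedProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.queue = [];   // Float32Array chunks posted from the main thread
    this.done = false; // set once the stream signals completion
    this.port.onmessage = ({ data }) => {
      if (data === 'done') this.done = true;
      else this.queue.push(data);
    };
  }
  process(inputs, outputs) {
    const output = outputs[0][0];
    const chunk = this.queue.shift();
    if (chunk) {
      output.set(chunk.subarray(0, output.length));
    }
    // If the queue is empty but the stream is not finished, output silence and
    // stay alive; returning false here would end the processor prematurely.
    return !this.done || this.queue.length > 0;
  }
}
registerProcessor('stream-fed-processor', StreamFedProcessor);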

A single AudioWorkletProcessor.port cannot handle hundreds of thousands of transfers of Float32Arrays at either Chromium or Firefox without glitches. Transferring one ReadableStream from a TransformStream reduces the glitches and gaps, potentially, to one. Have not yet found AudioWorklet to be consistently free of glitches and gaps at Chromium or Firefox right now. The closest able to achieve is transferring a ReadableStream, which, theoretically, can be infinite; Firefox does not implement transferable streams.
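A sketch of that single transfer on the main thread (Chromium only, given that Firefox does not implement transferable streams; the processor name and resource path are illustrative):

// Inside an async function. Pipe a fetch() body through a TransformStream and
// transfer the readable side to the worklet in one postMessage, instead of
// posting many individual Float32Array chunks.
const context = new AudioContext();
await context.audioWorklet.addModule('stream-reader-processor.js'); // illustrative
const node = new AudioWorkletNode(context, 'stream-reader-processor');

const { readable, writable } = new TransformStream();
fetch('/path/to/resource', { cache: 'no-store' })
  .then(response => response.body.pipeTo(writable));

// One transfer; the processor can then call getReader() on the stream and
// read chunks in its own scope.
node.port.postMessage(readable, [readable]);
node.connect(context.destination);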

When reading a File that is being written to the local filesystem, for example piping system audio capture through opusenc and/or ffmpeg and using FileSystemFileHandle.getFile() to read the file in "real time" to pipe to a MediaSource, AudioWorkletProcessor, etc., it might be necessary to handle thousands of DOMException: A requested file or directory could not be found at the time an operation was processed. exceptions before ffmpeg writes 669 bytes to the WebM container.
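A sketch of that retry handling (the helper name, directory handle, and poll interval are all hypothetical):

// Poll a file that another process is still writing; getFile() may throw
// "NotFoundError" many times before the first bytes appear on disk.
async function waitForFile(directoryHandle, name) {
  for (;;) {
    try {
      const handle = await directoryHandle.getFileHandle(name);
      const file = await handle.getFile();
      if (file.size > 0) return file;
    } catch (e) {
      if (e.name !== 'NotFoundError') throw e; // only swallow the expected exception
    }
    await new Promise(resolve => setTimeout(resolve, 10)); // hypothetical poll interval
  }
}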

The above examples account for only a sampling of the differing implementations of a given specification, features that are not necessarily clear, unintended consequences of the prioritization of different APIs, and limitations on more than one process occurring at the same time.

Would suggest to build a prototype of the feature as a proof-of-concept. Test. Then test some more https://gist.githubusercontent.com/guest271314/1fcb5d90799c48fdf34c68dd7f74a304/raw/c06235ae8ebb1ae20f0edea5e8bdc119f0897d9a/UseItBreakItFileBugsRequestFeaturesTestTestTestTestToThePointItBreaks.txt.

guest271314 commented 4 years ago

In theory AudioContext, or AudioWorkletGlobalScope and the AudioWorkletProcessor that follows, could have an extended design following prior art such as https://github.com/developit/task-worklet; or PaintWorklet; or TaskWorklet https://github.com/web-platform-tests/wpt/issues/16153; or scheduler.postTask(); or the current implementation, i.e., when executed, multiple Worklet threads are created to handle the same task. Consider ReadableStream.tee, where in this case, per the algorithm, there should be no glitches or gaps because multiple "threads" are executing the same procedure at the "same time"; if one thread has a glitch the other threads do not: a fuzzy-logic implementation of AudioContext. That is, if the requirement is no glitches or gaps, and not multithreading with the same existing bugs carried over to multiple threads.

guest271314 commented 4 years ago

When running multiple contexts in parallel we of course run into problems like unsynced clocks and the lack of a single sink, so it's impossible to have one master gain, for example. These could be overcome by also including algorithms that would utilise multiple CPUs behind the scenes.

If the requirement is to "normalize" a range of potential currentTimes from multiple threads, a draft algorithm could look like:

async function multithreadAudio() {
  return Promise.race(
    // no way to get all clocks at the same time
    // arguments read right to left https://github.com/tc39/ecma262/issues/1397
    [ac1, ac2, ...acN]
    // has acN currentTime progressed by the time ac1 is read, or vice versa?
    // match first, middle, last, average, random, bring your own algorithm
  )
  .then(sample => doStuffWithSample(sample));
}

How to verify the value is written at currentTime by acX and continue processing the next input and output from the selected context at the same time? Positive lookahead, negative lookbehind?

What is not possible now?

wytrych commented 4 years ago

@guest271314 I am not sure I fully understand your comments. They seem to cover a wide range of issues not necessarily directly connected with this feature proposal?

This feature proposal is quite simple - sanction what Chrome is doing already to be able to use more CPU for audio computation. This would allow for building more powerful audio web apps.

guest271314 commented 4 years ago

@guest271314 I am not sure I fully understand your comments. They seem to cover a wide range of issues not necessarily directly connected with this feature proposal?

There are existing issues, bugs, edge cases that could render use of more CPU moot if not fixed first.

For example, no matter how much CPU is used, if it is not documented that disabling the cache, or using fetch() with cache:'no-store' set, where the data therefrom is used by AudioWorkletProcessor, leads to gaps and glitches, then those gaps and glitches will still occur.

This would allow for building more powerful audio web apps.

Am not the owner of this repository. Am a user of AudioContext asking pertinent questions and sharing results of experiments and tests which might not have been considered. If there are issues with the current CPU usage, more CPU usage will not get rid of the issues, just provide more transistors to power the existing bugs or unintended consequences of other APIs.

Can you explain why you believe that to be the case?

How is "more powerful" actually defined?

What are you not able to achieve now that the proposed feature would make possible?

guest271314 commented 4 years ago

@wytrych Any post that I make, in any domain of human activity, is evidence-based. A claim that X will improve Y is just a claim. The scientific method requires reproduction of the claim to be verified, not just by the claimant, but by individuals or institutions other than the claimant. The point of that procedure, no matter the field or domain, is to vet claims and make clear irrefutable facts, rather than relying on folklore, guessing, or concepts that have not been tested by others.

For example, the claim has been made several times in this repository and in WebAudio API v1 that the WebCodecs proposal will somehow contribute to resolving issues filed in WebAudio repositories, and based on that premise several WebAudio API issues have been closed.

However, have not located any evidence that WebCodecs will solve any of the WebAudio API issues that were published in this or WebAudio API v1 repositories, https://github.com/WebAudio/web-audio-api-v2/issues/61. Thus the claim is not evidence-based.

Am interested in precisely what problem you are trying to solve that you cannot solve now, due to the proposal you are making not being specified or otherwise implemented, so that what you are describing can be reproduced and concurred with or refuted based on evidence in the field, rather than speculation in one's own laboratory. Can you provide a use case to substantiate the claim?

wytrych commented 4 years ago

Well, the use case is that with single-threaded Web Audio you can't utilise more than one CPU core. At least not in Chrome; Firefox, I believe, does some parallelisation across cores. The result is that you can run more Web Audio nodes if you can utilise more CPU: more oscillators, filters, wave shapers, etc. Fixing other performance issues is desirable of course, but without utilising more CPU cores you can only go so far.

guest271314 commented 4 years ago

Am curious if you are able to determine what the current limitations are and if using more than one CPU with WebAudio API will actually change observable behaviour (https://pcpartpicker.com/forums/topic/104689-do-more-cpu-cores-help-with-chrome-tabs)?

guest271314 commented 4 years ago

Am not sure that simply providing more computing power to a problematic implementation would necessarily solve any issue relevant to limitations of language in a specification.

For example, at the same machine, with the requirement to capture every image of a video, Chromium takes considerably more time than Firefox to execute the same code https://bugs.chromium.org/p/chromium/issues/detail?id=1063681.

For comparative analysis, Nightly 76 takes 46.987 seconds to seek (by 1/30) through all frames of a 33.423-second video, while Chromium 82 takes 267.686 seconds, never calculates the duration as 33.423 (rather 33.394), and never sets ended to true or fires the ended event https://bugs.chromium.org/p/chromium/issues/detail?id=1065675#c6.

Using 4 cores at Chromium might decrease the time taken to get all of the images, yet still be slower than the Firefox implementation. That would be a test worth running, relevant to multithreading.

Is there a way to prototype the feature you are requesting and compare the different outputs with same input?

wytrych commented 4 years ago

The feature is already prototyped in Chrome. It's implemented here and now. To test it you can try the following: see how many oscillators you can create before you get glitching. In Chrome you should then be able to create a comparable number again when you do that on a separate AudioContext.

guest271314 commented 4 years ago

Glitching can occur in different ways using AudioContext. Can you create a minimal, complete, verifiable example to reproduce and compare to Nightly?

guest271314 commented 4 years ago

At a legacy 32-bit machine

$ lscpu
Architecture:        i686
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1

Creating 53 oscillators consistently glitches periodically. It is not clear if gc() impacts the output at all here:

      const oscillators = [];
      (async _ => {
        // gc();
        const context = new AudioContext();
        await context.audioWorklet.addModule('lib/bypass-processor.js');
        const bypasser = new AudioWorkletNode(context, 'bypass-processor');
        for (let i = 0; i < 53; i++) {
          const osc = new OscillatorNode(context);
          osc.connect(bypasser).connect(context.destination);
          osc.start();
          oscillators.push(osc);
        }
        // gc();
        console.log(oscillators.length);
      })()
      /*
      .then(async _ => {
        // gc();
        const context = new AudioContext();
        await context.audioWorklet.addModule('lib/bypass-processor.js');
        const bypasser = new AudioWorkletNode(context, 'bypass-processor');
        for (let i = 0; i < 53; i++) {
          const osc = new OscillatorNode(context);
          osc.connect(bypasser).connect(context.destination);
          osc.start();
          oscillators.push(osc);
        }
        // gc();
        console.log(oscillators.length);
      });
      */
padenot commented 4 years ago

Virtual F2F:

wytrych commented 4 years ago

@padenot so what would be the next steps here to push this forward?

padenot commented 4 years ago

I guess research the question hinted at in https://github.com/WebAudio/web-audio-api-v2/issues/85#issuecomment-644801445, and then design an API for this.

padenot commented 3 years ago

TPAC 2020:

wytrych commented 3 years ago

Thanks for the update @padenot. I guess the biggest benefit of using multiple threads in this context is being able to use more cores of a CPU in audio-heavy applications. We have seen apps benefit from using multiple threads, i.e. getting more audio nodes playing back without glitching.

hoch commented 3 years ago

Actually WASM's pthreads are a wrapper on top of Worker threads, so changing their thread priority won't be a viable option at the moment.

hoch commented 1 year ago

2023 TPAC Audio WG Discussion:

The WG believes that supporting multi-threaded rendering for the non-AudioWorklet case would make the project almost infeasible due to the significant complexity of the implementation.

Also, similarly to https://github.com/WebAudio/web-audio-api/issues/2500#issuecomment-1719728519, using a Web Worker with configurable QoS would be a more general and scalable solution for the worklet-powered use case. The WG will not be pursuing this capability as an active project.