Open hoch opened 5 years ago
In the teleconference today, we agreed that multi-input/output aggregation should be provided by the OS. The group is in favor of option 1 from Problem 1. The DJ app use case was brought up as a counterexample, but developers can use multiple outputs to separate audience outs and monitor outs.
On configurability, the collective view was that there should be some controls, but we have not agreed on their degree or scope.
Let me also add that we discussed exposing the resampler (if needed) so that the developer can trade off quality against latency. No decision was made on this, but it is something we might want to think about.
Also want to give the rationale for doing option 1 from Problem 1:
Please correct me if I got these things wrong.
There are several use cases in which an audio device client would be created with only a single input or output device, but not the other. For these use cases, having to pay for the additional latency of clock synchronization without reaping any benefits would be unfortunate.
Any sound-generating application that is sensitive to latency (like a game) will have this issue. These apps rarely need audio input, and if they do, they do not require clock synchronization. They would sooner create two separate ADC contexts, one for input and one for output, if doing so would bypass clock synchronization. Mixing input and generated audio would eventually be done via SharedArrayBuffers and Atomics instead, and only when needed (e.g. when the player enables voice chat in a multiplayer match).
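To make the SharedArrayBuffer and Atomics approach concrete, here is a minimal sketch of a single-producer/single-consumer ring buffer that an input-only context could write into and an output-only context could read from. The names `SharedRingBuffer` and `mixInput` are hypothetical and are not part of any ADC proposal; this only illustrates the idea under the assumption that both contexts can see the same SharedArrayBuffer.

```typescript
// Hypothetical helper, not part of any ADC proposal.
// Layout of the shared buffer: two Int32 indices (8 bytes), then Float32 samples.
class SharedRingBuffer {
  private readonly indices: Int32Array; // [0] = read index, [1] = write index
  private readonly samples: Float32Array;
  private readonly capacity: number;

  constructor(sab: SharedArrayBuffer, capacity: number) {
    this.indices = new Int32Array(sab, 0, 2);
    this.samples = new Float32Array(sab, 8, capacity);
    this.capacity = capacity;
  }

  // Allocates enough shared memory for the indices plus `capacity` samples.
  static allocate(capacity: number): SharedArrayBuffer {
    return new SharedArrayBuffer(8 + capacity * 4);
  }

  // Called by the input side only. Returns how many samples were written.
  push(input: Float32Array): number {
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const used = (write - read + this.capacity) % this.capacity;
    const free = this.capacity - 1 - used; // one slot stays empty to tell full from empty
    const n = Math.min(free, input.length);
    for (let i = 0; i < n; i++) {
      this.samples[(write + i) % this.capacity] = input[i];
    }
    Atomics.store(this.indices, 1, (write + n) % this.capacity);
    return n;
  }

  // Called by the output side only. Returns how many samples were read.
  pop(output: Float32Array): number {
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const available = (write - read + this.capacity) % this.capacity;
    const n = Math.min(available, output.length);
    for (let i = 0; i < n; i++) {
      output[i] = this.samples[(read + i) % this.capacity];
    }
    Atomics.store(this.indices, 0, (read + n) % this.capacity);
    return n;
  }
}

// Adds whatever input is currently available on top of the generated audio.
// `scratch` must be at least as long as `output`.
function mixInput(ring: SharedRingBuffer, output: Float32Array, scratch: Float32Array): void {
  const n = ring.pop(scratch.subarray(0, output.length));
  for (let i = 0; i < n; i++) {
    output[i] += scratch[i];
  }
}
```

Note that nothing here synchronizes the two device clocks: if the input runs slightly faster or slower than the output, the buffer will slowly fill up or drain, which is exactly the drift that aggregation with re-clocking would otherwise absorb.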
Or perhaps what I'm describing is the 'raw mode' briefly mentioned in the code example? It is not entirely clear to me what this feature does.
It sounds like you want to use two AudioDeviceClients, one for input, one for output.
One important thing that is not being talked about here is that browsers have to have another IPC boundary between the system audio input/output code and the "content" code that runs scripts, etc., to be able to properly sandbox "content" code. This is in contrast to native programs, which do the audio IO directly.
Aggregating the input and output streams, re-clocking in the process that does the audio IO, and doing only a single IPC transaction to the content process is far superior to doing multiple context switches and buffering. Doing so allows smaller buffer sizes, not larger ones: more threads mean more real-time threads and more context switches, which increases scheduling hazards and scheduler pressure, and leads to needing bigger buffers to get solid audio.
The high-level nature of AudioContext and MediaStreams makes this easy to implement today: for example, round-trip latency in Firefox on OSX is limited by the fact that the Web Audio API requires block processing with 128-frame buffers. We're currently sub-10ms round trip on OSX without special hardware, but the limit is arbitrary.
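For a sense of scale, the duration of one 128-frame block can be computed directly from the sample rate. The helper name `blockDurationMs` and the sample rates below are illustrative assumptions; a real round trip also includes device and IPC buffering, so this is only the contribution of a single block.

```typescript
// Duration of one processing block in milliseconds.
function blockDurationMs(frames: number, sampleRate: number): number {
  return (frames / sampleRate) * 1000;
}

// One 128-frame block at two common sample rates (assumed for illustration).
const at48k = blockDurationMs(128, 48000);  // about 2.67 ms
const at44k1 = blockDurationMs(128, 44100); // about 2.90 ms
```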
One of the key advantages of ADC is a single callback function for input and output. This is made possible by combining the input and output streams and serving them to the user together. As shown in the example, the user can specify two different IDs for input and output respectively.
It is common for the two devices to be physically separate (i.e. different clocks, sample rates and threads). To serve these isolated streams, the system needs to re-clock/resample the audio data before sending it to a callback function. This is what ADC calls "device aggregation".
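As a rough illustration of the resampling half of that step, the sketch below converts a block of samples from one rate to another with linear interpolation. The function name `resampleLinear` is hypothetical, and linear interpolation is used here only because it is short; an actual aggregation layer would presumably use a higher-quality filter and keep adjusting the ratio as it measures drift between the two device clocks.

```typescript
// Hypothetical helper: resamples one mono block from inputRate to outputRate
// using linear interpolation between neighbouring input samples.
function resampleLinear(
  input: Float32Array,
  inputRate: number,
  outputRate: number
): Float32Array {
  if (input.length === 0) {
    return new Float32Array(0);
  }
  // How far to advance in the input for each output sample.
  const step = inputRate / outputRate;
  const outLength = Math.floor((input.length - 1) / step) + 1;
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the last sample
    const frac = pos - i0;
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}
```

Re-clocking can be thought of as the same operation with a ratio very close to 1: two devices that both nominally run at 48 kHz still drift apart slightly, so one stream is stretched or squeezed by a tiny amount to stay aligned with the other.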
Problem 1. The scope of aggregation
For option 2 (which is quite similar to macOS's aggregate device), we can think of something like this:
Problem 2. The configurability of aggregation layer
Aggregation by the system involves many parameters: resampling quality, re-clocker options, speed/quality trade-offs, etc. Should ADC expose these options at all? Or should we just say this is up to the UA? Or should this be somewhere in the middle?
NOTE: @padenot mentioned at TPAC 2018 that Firefox uses this "re-clocking" mechanism to aggregate and align audio data from multiple devices.