PortAudio / portaudio

PortAudio is a cross-platform, open-source C language library for real-time audio input and output.

make audio data format conversion policy crystal clear #825

Open RossBencina opened 1 year ago

RossBencina commented 1 year ago

The task is to specify and document exactly when clients can expect PortAudio to convert between different audio sample data formats.

The question here is when format conversion should occur, not how it should be performed.

Background

Since before the version 19 API, PortAudio has provided facilities to automatically convert sample data to/from an appropriate format that is supported by the native audio API. For example, if the user supplies 32-bit integers, and the native device only accepts 16-bit integers, PortAudio performs an appropriate conversion. The principle here is that PortAudio clients can always pass data in any of the PortAudio formats to/from PortAudio.
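As an illustration of the 32-bit to 16-bit example above, here is a minimal sketch of such a conversion. The function name is hypothetical and this is not PortAudio's converter code. It truncates by keeping the top 16 bits, applies no dither, and assumes the compiler implements right shift of negative values as an arithmetic shift (true of mainstream compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reduce 32-bit signed samples to 16-bit signed samples
 * by discarding the low 16 bits. No dither and no rounding are applied.
 * Assumes '>>' on a negative int32_t is an arithmetic shift. */
static void int32_to_int16(const int32_t *src, int16_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = (int16_t)(src[i] >> 16);
    }
}
```

Full-scale input maps to full-scale output in both directions: INT32_MAX becomes 32767 and INT32_MIN becomes -32768.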

Format conversions entail scaling and/or bit shifting, and may incorporate dither and/or clipping. The PortAudio API provides options for switching dithering and clipping on or off when performing these conversions. The choice of scaling and dithering algorithms is discussed in other tickets.

Floating-point to integer conversions have special status because most (all?) audio hardware uses (linear) integer sample formats. As discussed elsewhere, PortAudio chose the float-integer scaling factors to ensure that an amplitude +/- 1.0 sine is not clipped under float -> int conversion. This is (arguably) an important API contract. Historically, native audio APIs accepted only integer formats and passed them through to the driver/hardware. Consequently, PortAudio always took responsibility for float-integer conversions, and clients could rely on PortAudio providing specific scaling behavior. More recently, some (not all) native APIs accept floats, and use floats internally for DSP and mixing operations. In such cases it may be desirable to pass floats through unaltered rather than converting to an integer format.

Maintaining integrity of audio data is very important to any audio API. The question in 2023 is to what degree PortAudio should attempt to impose its own conversions (i.e. to provide predictable, consistent conversion behavior across all platforms) and to what degree it should get out of the way and let the OS do the work (i.e. to provide native behavior, which for example might result in different float-integer conversions when using different host APIs, but on the other hand, might reduce the number of float-integer conversions). This is particularly important with float-integer conversions because the previously advertised policy of deterministic float-integer conversions on all platforms no longer looks like the correct choice for some common use-cases where the OS is likely to convert integers back to floats again.
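A rough sketch of why the extra float-integer hop matters when the OS converts back to float. All names here are hypothetical, and both factors are assumptions: the library side multiplies by 32767, and the OS side is assumed to divide by 32768. No specific OS is claimed to behave this way. The point is only that two independently chosen conventions need not invert each other.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed library-side conversion: float -> int16, scale 32767, truncating. */
static int16_t float_to_int16_x32767(float sample)
{
    assert(sample >= -1.0f && sample <= 1.0f);
    return (int16_t)(sample * 32767.0f);
}

/* Assumed OS-side conversion back to float, using a different divisor. */
static float int16_to_float_div32768(int16_t sample)
{
    return (float)sample / 32768.0f;
}

/* What the OS mixer would see after the extra float -> int -> float hop. */
static float round_trip(float sample)
{
    return int16_to_float_div32768(float_to_int16_x32767(sample));
}
```

Under these assumptions, a sample of 1.0 comes back as 32767/32768 (about 0.99997) and 0.5 comes back as 16383/32768, whereas passing the floats through untouched would deliver them exactly.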

related:

RossBencina commented 1 year ago

Some useful definitions:

Observation: In some sense, float-integer-float conversions are always hardware-required conversions. However, some native APIs (e.g. CoreAudio, Android) use floats as the default, internal, or only data representation. Should PortAudio perform the conversion itself in order to guarantee a well-specified conversion across platforms?

dechamps commented 1 year ago

Thanks for the write-up. Here are my thoughts. (Note that to make this less verbose I am going to assume the playback direction, i.e. output to a DAC - obviously, everything below applies to recording from an ADC as well.)

The question in 2023 is to what degree PortAudio should attempt to impose its own conversions (i.e. to provide predictable, consistent conversion behavior across all platforms) and to what degree it should get out of the way and let the OS do the work (i.e. to provide native behavior, which for example might result in different float-integer conversions when using different host APIs, but on the other hand, might reduce the number of float-integer conversions)

I've said this before, and I'll say it again: PortAudio should not surprise its users.

When PortAudio has the option of taking the audio data from the app and handing it off as-is to the OS, without any conversion, then that's what it should do, because that's what any reasonable person would expect to happen. It's simple, efficient, and it works. No reasonable person would expect PortAudio to just throw in extra unnecessary conversions out of the blue - that's just bizarre.

The only reason why we seem to be arguing about this is because you seem obsessed with some very niche use case along the lines of "clients could rely on PortAudio providing specific scaling behavior". I'd argue that very few users care about this. If I wanted to have this kind of complete control over sample bit patterns, then I would implement the conversion myself in my app - I would not use an audio I/O library such as PortAudio to do it, because that kind of extremely narrow use case doesn't fit the scope of a generalist audio I/O library.

Native APIs range from fully well-specified (e.g. ASIO, ALSA HW) through to no-guarantees (e.g. Android, where the HAL can do whatever audio manipulations it likes).

For Windows you can add "WASAPI Exclusive, WDM-KS" to your "well-specified" list, and "MME, DS, WASAPI Shared" to your "no-guarantees" list.

More specifically, on Windows Vista+, when using the shared audio pipeline (i.e. WASAPI Shared, which MME and DS redirect to internally), the Windows audio engine will automatically do sample rate conversion, sample type conversion, downmixing/upmixing, mixing, software volume control (if required), audio limiting (CAudioLimiter), and then there are APOs where audio device manufacturers (and, to a lesser extent, third parties) can decide to do literally whatever they want to the audio signal as it passes through the pipeline.

Given the above, the whole idea of trying to guarantee "well-specified conversion" when using WASAPI Shared, MME or DS is absurd on its face. In the vast majority of cases, downstream processing in the Windows audio engine will destroy your "well-specified conversion" many times over before it reaches the DAC. The battle is over before it started. It doesn't make sense to try to introduce concepts such as "delegated conversion" in this setup - we're way past that already.

And, to be clear, this is perfectly fine. Reasonable users will not expect perfectly accurate operation when using the shared Windows audio pipeline, because they know that it's not designed for it. The Windows audio engine is designed for convenience (automatic conversions, system-wide effects, etc.), not accuracy. If a user is chasing perfect accuracy (i.e. bit-perfectness and the like), and they configure PortAudio to use WASAPI Shared, MME or DS, then they're doing it wrong, and the PA docs should say so. Consistent with that philosophy, it makes sense for PortAudio to do the simple, efficient, obvious thing and just pass the application's audio data through as-is to the OS. There is no point in doing anything else.

Users who are after accuracy should use OS audio facilities that are designed for that use case. For Windows, that means WASAPI Exclusive, WDM-KS, or ASIO. These 3 APIs come with a reasonable expectation that the client will be given exclusive, direct access to the DAC's audio buffers with no automatic conversions.

Currently, when using one of the aforementioned "bit-perfect" host APIs in PortAudio, PortAudio will determine which formats are supported by the hardware, and automatically convert as necessary. I'd argue this is doing the user a disservice, because if they are explicitly configuring PortAudio to use one of these specialist host APIs, then chances are they care deeply about their samples making it to the DAC untouched, and they do not want PortAudio to mess with them in any way, even if the conversion is lossless - they'd prefer PortAudio to return an error instead. Thankfully the WASAPI host API provides a flag to that effect, but it really feels like that should be a general PA frontend option, not a host-API-specific one.
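To sketch what such a frontend option could look like as a decision rule: every type, constant, and function name below is hypothetical and made up for illustration, and none of it is existing PortAudio API. The idea is that the requested format is passed through when the device supports it, and otherwise the "strict" setting decides between an automatic conversion and an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format bits, one bit per sample format. */
enum {
    FMT_INT16   = 1u << 0,
    FMT_INT24   = 1u << 1,
    FMT_INT32   = 1u << 2,
    FMT_FLOAT32 = 1u << 3
};

typedef enum {
    ACTION_PASS_THROUGH, /* hand the app's samples to the host API as-is */
    ACTION_CONVERT,      /* library converts to a format the device accepts */
    ACTION_ERROR         /* refuse to open the stream */
} FormatAction;

/* 'requested' must be exactly one format bit; 'supported_mask' is the set of
 * formats the device accepts; 'strict' means "never convert on my behalf". */
static FormatAction choose_format_action(unsigned requested,
                                         unsigned supported_mask,
                                         bool strict)
{
    assert(requested != 0u && (requested & (requested - 1u)) == 0u);
    if (supported_mask & requested) {
        return ACTION_PASS_THROUGH;
    }
    if (strict || supported_mask == 0u) {
        return ACTION_ERROR;
    }
    return ACTION_CONVERT;
}
```

For example, requesting FMT_FLOAT32 on a device that only reports FMT_INT16 and FMT_INT24 would yield ACTION_ERROR in strict mode and ACTION_CONVERT otherwise.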

So, in conclusion, here's how I would like PortAudio to behave in an ideal world:

This would then map to the following use cases: