Support extended PCM format handling, like foobar's decode postprocessor service

kode54 commented 8 years ago

This is a feature request and planning ground for supporting a feature similar to foobar2000.

A plugin API implements a central function that searches for plug-ins that handle this service, similar to decoder plug-ins. In this case, however, the interface is only passed decoded PCM, one block at a time, along with its format information. The API also specifies a floating-point duration in seconds that indicates how much PCM data to pre-roll into the handler; the caller then discards that pre-rolled audio.

Currently, foobar2000 uses this interface from most internal and some external PCM decoding formats, such as WAV, AIFF, FLAC, WavPack, ALAC, and TAK. The decoder interface also includes a flag which may be passed in at decoder startup time, to indicate that the post processor service should not be used.

The existing post processor components are for HDCD and for DTS. The two are mutually exclusive by design: the API tells the plug-in implementing it whether a block of sample data has already been modified by another plug-in, and the plug-in returns a status code to indicate whether it modified the sample data. Presumably, the caller discards the block chain it passed in if it was unmodified.
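To make the contract above concrete, here is a minimal sketch in C of what such an interface could look like. Every name in it (`pcm_block_t`, `postprocessor_t`, `run_postprocessors`, the status codes) is hypothetical and invented for illustration; it is not the foobar2000 SDK interface and not an existing DeaDBeeF API.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; none come from DeaDBeeF or foobar2000. */
typedef enum { PP_UNMODIFIED = 0, PP_MODIFIED = 1 } pp_status_t;

/* One block of decoded PCM together with its format information. */
typedef struct {
    void *data;
    size_t frames;
    int samplerate;
    int channels;
    int bits_per_sample;
} pcm_block_t;

typedef struct postprocessor_s {
    double preroll_seconds; /* how much audio the handler wants pre-rolled */
    int blocks_claimed;     /* bookkeeping for this example only */
    pp_status_t (*process)(struct postprocessor_s *self, pcm_block_t *block,
                           int already_modified);
} postprocessor_t;

/* Example handler: declines any block that another handler already changed. */
pp_status_t claim_if_untouched(postprocessor_t *self, pcm_block_t *block,
                               int already_modified) {
    (void)block;
    if (already_modified) {
        return PP_UNMODIFIED;
    }
    self->blocks_claimed++;
    return PP_MODIFIED;
}

/* Pass one block through every handler; returns 1 if any handler modified it. */
int run_postprocessors(postprocessor_t **chain, size_t count, pcm_block_t *block) {
    assert(chain != NULL && block != NULL);
    int modified = 0;
    for (size_t i = 0; i < count; i++) {
        if (chain[i]->process(chain[i], block, modified) == PP_MODIFIED) {
            modified = 1;
        }
    }
    return modified;
}
```

With two handlers that both use `claim_if_untouched`, only the first one in the chain claims a given block, which is the mutual exclusion described above.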

The HDCD decoder scans up to 10 seconds of audio data for HDCD signatures, and if it finds two consecutive signatures, it turns itself on for the duration of the track. Otherwise, it disables itself until discarded. It also performs a sample rate and bit depth check on startup.

The DTS decoder also scans the data for two consecutive DTS packets, which it validates. If it finds two, it turns itself on permanently, preceding the packets with silence according to however much silence padding there was before the first DTS packet. If it fails to find enough data, it disables itself as well. It performs a rudimentary sample rate and bit depth check first as well.
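Both detectors follow the same pattern: check the format, scan a bounded amount of audio, then latch on or off. A rough sketch of that state machine in C follows. The names are hypothetical, the 44.1 kHz / 16-bit check is an assumption based on CD audio (the actual checks are not spelled out here), and treating "consecutive" as hits reported in back-to-back calls is a simplification.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DET_SCANNING, DET_ENABLED, DET_DISABLED } det_state_t;

typedef struct {
    det_state_t state;
    long frames_scanned;
    long frames_limit;    /* scan window, in frames */
    int required_hits;    /* consecutive signatures needed to turn on */
    int consecutive_hits;
} detector_t;

void detector_init(detector_t *d, int samplerate, int bits,
                   double scan_seconds, int required_hits) {
    assert(d != NULL && scan_seconds >= 0.0 && required_hits > 0);
    d->frames_scanned = 0;
    d->consecutive_hits = 0;
    d->required_hits = required_hits;
    d->frames_limit = (long)(scan_seconds * samplerate);
    /* Assumption: only CD-style 44.1 kHz / 16-bit input is worth scanning. */
    d->state = (samplerate == 44100 && bits == 16) ? DET_SCANNING : DET_DISABLED;
}

/* Report one scanned block and whether a valid signature was seen in it. */
det_state_t detector_feed(detector_t *d, long frames, int hit) {
    assert(d != NULL && frames >= 0);
    if (d->state != DET_SCANNING) {
        return d->state; /* the decision is latched until the detector is discarded */
    }
    d->frames_scanned += frames;
    d->consecutive_hits = hit ? d->consecutive_hits + 1 : 0;
    if (d->consecutive_hits >= d->required_hits) {
        d->state = DET_ENABLED;
    } else if (d->frames_scanned >= d->frames_limit) {
        d->state = DET_DISABLED;
    }
    return d->state;
}
```

A caller would initialise it with a 10-second window and `required_hits` of 2 for the behaviour described above, and stop scanning as soon as the state leaves `DET_SCANNING`.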

This service would provide a mechanism for DTS CDs, or HDCDs, to be decoded from WAV files or from losslessly compressed PCM formats that could contain them. The DTS processor would be implemented by the same plug-in which handles the raw file formats, unless that happens to be the FFmpeg plug-in, since I think it would require different API usage to pass raw packets to its DTS decoder.

I may look into implementing this myself, but I am looking for ideas for the correct way to implement this as an API in DeaDBeeF's architecture.

Oleksiy-Yakovenko commented 8 years ago

It sounds a lot like DSP plugin API, but with a few important differences.

With the current architecture, adding this would require a new plugin type, I think.

Otherwise it would need extending the plugin API to add querying any plugin for "services".

So, we need to decide which route to go -- extending the API with a "query plugin service" feature; or adding a new plugin type.

kode54 commented 8 years ago

New plugin type would be the simplest, but would prevent bundling processors that could also be force instantiated as DSPs into the same plugin as the DSP.

But in actuality, I can only think of two types of converters where this "postprocessor" service is useful:

1) AC3/DTS bitstream decoder, for CDs in that format which have been ripped to FLAC or similar
2) HDCD decoding, also for FLAC rips.

Both of those are best served by an automatic parser which may optionally be turned off if the user wishes to convert the original data unmangled to another lossless format, or turned off if one of them somehow triggers a false positive, which shouldn't happen at all if the services are designed properly.

AC3/DTS will be designed properly to check for 3 or more consecutive packets of compressed data, with a uniform amount of padding, presumably signaled by the compressed format.

HDCD will be designed to verify a continuous use of signaling bits, and uniform configuration between both stereo channels, to prevent the known cases of some false positives occurring only in one speaker due to improperly mastered content, or perhaps intentionally mastered that way to trip up non-spec decoders. (Spec calls for sustained status codes in both channels to match.)

So yeah, not much use for a DSP, except to test it with decoders which do not have the processor service implemented into their chain.

Now, should this need to be implemented by the decoder? The way foobar2000 does this, it has a special decoder template class (decoder_postprocessed) that wraps the post processor onto the decoder at the SDK level, inside any given component. Said template also accepts a newer flag to the decode_initialize function which instructs the decoder to bypass post-processing, since the converter and other services may request unprocessed audio data.

Or it may be implemented in the core itself, as a wrapper to the decoder. But then the core still needs to know which formats should be processed at all. In foobar2000, only lossless PCM formats are wrapped in the decoder_postprocessed template, but it's up to the developer of third party components to perform this wrapping.
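Either way, the wrapping itself is small. The sketch below shows the general shape in C: a wrapper that forwards reads to the real decoder and runs the post processor unless a bypass flag is set. All names (`wrapped_decoder_t`, `WRAP_NO_POSTPROCESS`, the demo callbacks) are hypothetical; this is not the foobar2000 template and not existing DeaDBeeF code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: the caller wants unprocessed audio (e.g. a converter). */
enum { WRAP_NO_POSTPROCESS = 1 };

typedef size_t (*read_fn)(void *ctx, int16_t *out, size_t max_samples);
typedef void (*post_fn)(int16_t *buf, size_t samples);

typedef struct {
    read_fn read;   /* the real decoder's read function */
    void *ctx;      /* the real decoder's state */
    post_fn post;   /* post processor, may be NULL */
    unsigned flags;
} wrapped_decoder_t;

/* Decode a block, then post-process it unless the bypass flag is set. */
size_t wrapped_read(wrapped_decoder_t *w, int16_t *out, size_t max_samples) {
    assert(w != NULL && w->read != NULL);
    size_t n = w->read(w->ctx, out, max_samples);
    if (w->post != NULL && !(w->flags & WRAP_NO_POSTPROCESS)) {
        w->post(out, n);
    }
    return n;
}

/* Stand-in decoder for the example: fills the buffer with a constant. */
size_t demo_read(void *ctx, int16_t *out, size_t max_samples) {
    (void)ctx;
    for (size_t i = 0; i < max_samples; i++) {
        out[i] = 1000;
    }
    return max_samples;
}

/* Stand-in post processor for the example: halves every sample. */
void demo_post(int16_t *buf, size_t samples) {
    for (size_t i = 0; i < samples; i++) {
        buf[i] = (int16_t)(buf[i] / 2);
    }
}
```

The same wrapper could live in each decoder plugin or in the core; the open question of which formats get wrapped is unchanged by where the code sits.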

bp0 commented 8 years ago

I've implemented an HDCD decoder as a DSP plugin at deadbeef-hdcd. It works well enough, but it requires the plugin to be first in the chain so that the HDCD packets do not get destroyed by any other processing. It also requires converting from 32-bit float to signed 16-bit, and depending on how the audio was converted to 32-bit float to begin with, that may not always work exactly the same. (1.0 = 0x8000 or 0x7fff, or perhaps 1.0 = 0x7fff, but -1.0 = 0x8000).
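The scaling problem is easy to demonstrate. The sketch below uses two hypothetical helpers (not taken from deadbeef-hdcd or DeaDBeeF) that convert with a chosen full-scale value. When both directions use 32768 the round trip is bit-exact, but if the float was produced with one convention and converted back with the other, samples can change by one LSB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: int16 -> float using the given full-scale value. */
float s16_to_float(int16_t s, float full_scale) {
    assert(full_scale > 0.0f);
    return (float)s / full_scale;
}

/* Hypothetical helper: float -> int16 with rounding and clamping. */
int16_t float_to_s16(float x, double full_scale) {
    assert(full_scale > 0.0);
    double scaled = (double)x * full_scale;
    /* Round half away from zero, then clamp to the int16 range. */
    long v = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    if (v > 32767) {
        v = 32767;
    }
    if (v < -32768) {
        v = -32768;
    }
    return (int16_t)v;
}
```

For example, `float_to_s16(s16_to_float(32767, 32768.0f), 32768.0)` gives back 32767, while converting the same float with a full scale of 32767 gives 32766, so the output is no longer bit-identical to the original PCM.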

The foobar HDCD component (by kode54) attempts to detect HDCD in advance and enable or disable itself. This is because HDCD decoding applies -6dB to the audio to make room for peak extend. It can be quite a noticeable drop if HDCD is disabled and then is detected. This is why it needs to pre-roll audio into the component, so that the presence of HDCD is known at the first sample. I've gotten around this in the simple way of just always applying the -6dB (it is actually a shift right 1) even when HDCD is not present. This is a working method (and apparently the method of the PM Model Two), but maybe not ideal.
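As a small illustration of the "shift right 1" remark: halving a sample is a gain of 20·log10(0.5), roughly -6.02 dB. The sketch below assumes the 16-bit sample is first widened into a 32-bit word so that the halving loses no bits. The function name and the widening step are assumptions made for this example and are not taken from any particular HDCD decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a 16-bit sample to 32 bits, then apply -6 dB. */
int32_t attenuate_6db(int16_t s) {
    /* Place the sample in the upper half of the 32-bit word. */
    int32_t widened = (int32_t)s * 65536;
    /* widened is always even, so dividing by 2 is exact and matches an
       arithmetic shift right by 1 without relying on how the compiler
       shifts negative values. */
    assert(widened % 2 == 0);
    return widened / 2;
}
```

Applying this unconditionally keeps the output level the same whether or not HDCD is later detected, which is the workaround described above.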

Adding a parameter to the DSP plugin to specify some seconds of pre-roll (after a seek, let's say) that will be discarded seems like a good idea in general, for example for an echo or reverb plugin. It could also be used by the HDCD decoder so that the state is known at the first played sample.
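A rough sketch of how a caller might honour such a pre-roll parameter after a seek, in C. The names `plan_preroll` and `skip_preroll` are hypothetical and not part of the DeaDBeeF DSP API. The idea is to start decoding earlier than the seek target, run everything through the DSP, and drop the leading frames of output.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long start_frame;    /* where decoding should actually begin */
    long discard_frames; /* processed frames to drop before playback */
} preroll_plan_t;

/* Work out where to start decoding so the DSP sees the requested pre-roll. */
preroll_plan_t plan_preroll(long seek_frame, double preroll_seconds, int samplerate) {
    assert(seek_frame >= 0 && preroll_seconds >= 0.0 && samplerate > 0);
    long want = (long)(preroll_seconds * samplerate);
    preroll_plan_t plan;
    /* Near the start of the track there may be less audio than requested. */
    plan.start_frame = seek_frame > want ? seek_frame - want : 0;
    plan.discard_frames = seek_frame - plan.start_frame;
    return plan;
}

/* Returns how many leading frames of this processed block to drop. */
long skip_preroll(long *remaining, long block_frames) {
    assert(remaining != NULL && *remaining >= 0 && block_frames >= 0);
    long drop = *remaining < block_frames ? *remaining : block_frames;
    *remaining -= drop;
    return drop;
}
```

For a seek to frame 882000 at 44100 Hz with 10 seconds of pre-roll, the plan starts decoding at frame 441000 and discards the first 441000 processed frames.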

Also: I noticed a recent commit to deadbeef c49af61f505586966cae0901740a6243504f4da3 where ReplayGain is applied in the decoder plugin. If this was implemented for FLAC or WavPack, HDCD decoding via the DSP plugin would probably not work anymore. There is also the problem of most ReplayGain scanners not knowing about HDCD, so it would be nice to let the HDCD DSP plugin disable ReplayGain, because the values won't be right anyway, and will likely make the output very quiet.

Oleksiy-Yakovenko commented 8 years ago

@bp0 it's very important to apply replaygain/preamp before the clipping occurs in the lossy input plugins.

However, a mechanism will be added to get the raw float32 signal from them, as this is needed for e.g. the replaygain scanner, converter, etc.

Lossless formats, like FLAC and WavPack, don't need this, and will remain as is.

(I'm not sure about WavPack lossy, but that still wouldn't affect your use case)

kode54 commented 8 years ago

WavPack lossy still outputs integer PCM; it only throws away some precision in the least significant bit(s) depending on what compresses better. Any difference would only be audible as noise.

DeaDBeeF-Player / deadbeef

Support extended PCM format handling, like foobar's decode postprocessor service #1585