Describe the algorithm that DynamicsCompressorNode should use

olivierthereaux commented 11 years ago

Originally reported on W3C Bugzilla ISSUE-19885, Wed, 07 Nov 2012 00:51:17 GMT. Reported by Ehsan Akhgari [:ehsan].

Currently the spec doesn't provide much information on what the algorithm behind DynamicsCompressorNode should look like, which is not very helpful for implementers.

russellmcc commented 10 years ago

This just came up on the mailing list because at least some implementations use Automatic Make-up Gain, which is not a ubiquitous feature on real, professional compressors. After looking in the spec, it's not even specified whether the DynamicsCompressorNode should use auto make-up or not. This (make-up gain) should probably be a user option.

cwilso commented 9 years ago

@padenot weren't you working on this?

padenot commented 9 years ago

I've started to gather some notes, yes.

chrislo commented 9 years ago

Need some help? I can take a look.

cwilso commented 8 years ago

Once this is done, please ping me on #13 so I can incorporate.

joeberkovitz commented 8 years ago

The decision is to reverse engineer the code and document the current algorithm so that we're not wedged on further decision making, and can do a more disciplined compression/expansion algorithm in v.next

joeberkovitz commented 8 years ago

Noted: we're going to attempt to progressively describe the algorithm starting at a general level of detail of what components, states and procedures constitute the compression behavior. This in itself will be a big step over the current spec. Informative sections supplying suggested behavior can then be written, lifted from the current implementation, but we're not going to prescribe that implementations exactly mimic the current one in every last detail.

svgeesus commented 8 years ago

This paper seems approachable, clear and useful: "Digital Dynamic Range Compressor Design: A Tutorial and Analysis" https://www.eecs.qmul.ac.uk/~josh/documents/GiannoulisMassbergReiss-dynamicrangecompression-JAES2012.pdf

It seems clear from that analysis that our compressor is a feedforward type, with an RMS detector, and that both attack and release are exponential. So we could start by describing those blocks (and a diagram of how they fit together).
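As a rough illustration of the "RMS detector" block mentioned above, here is a minimal sketch of a running RMS level meter in dB. It is not taken from any browser implementation: the function name `rms_level_db`, the one-pole averaging, the 10 ms window and the -120 dB floor are all assumptions made for the example.

```python
import math


def rms_level_db(samples, sample_rate, window_s=0.01, floor_db=-120.0):
    """Running RMS level in dB (hypothetical helper, not from the spec).

    The mean square is tracked with a one-pole average whose time
    constant is `window_s` seconds (an assumed value).
    """
    coeff = math.exp(-1.0 / (window_s * sample_rate))
    mean_square = 0.0
    levels = []
    for x in samples:
        # Exponentially weighted average of the squared signal.
        mean_square = coeff * mean_square + (1.0 - coeff) * x * x
        if mean_square > 0.0:
            # 10*log10 of the mean square equals 20*log10 of the RMS value.
            level = max(10.0 * math.log10(mean_square), floor_db)
        else:
            level = floor_db
        levels.append(level)
    return levels
```

In a feedforward design, these levels would be computed from the input (control) signal and then fed to the compression curve, rather than being measured on the compressor's output.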

svgeesus commented 8 years ago

This is also helpful, but more mathematical and aimed at analog circuitry rather than digital implementation: "Attack and Release Time Constants in RMS-Based Compressors and Limiters" http://www.thatcorp.com/datashts/AES4054_Attack_and_Release_Time_Constants_II.pdf

padenot commented 8 years ago

Thanks, I'll be reading those for sure. I plan to start working on this after I'm finished with the AudioBufferSourceNode rendering spec bit (#95).

joeberkovitz commented 7 years ago

Adding @rtoy as discussed on yesterday's call - @padenot we need notes from you on how to proceed.

padenot commented 7 years ago

The current DSP chain of the compressor is the following:

The audio signal is the audio to be processed. The control signal is the audio that will determine how much compression should be applied to the audio signal. Those two signals can be the same, or different, when side-chaining.

- Pre-emphasis filter: four cascaded high-pass zero-pole filters at 15k, 7.5k, 3.75k, 1.875k, with a constant gain of 4.4dB, applied to the audio signal
- Pre-delay: a simple delay line of 0.006 seconds (fixed look-ahead) applied to the audio signal
- Target attenuation: based on the current control signal, process the signal with a compression curve. The compression curve's shape is defined by the parameters set on the compressor, but has a shape that looks like this: [image of the compression curve]. This is where most of the magic happens in the current implementations: the code is full of magic numbers that produce a smooth curve. I don't think the curve itself should be specced, but it's reasonable to loosely specify the characteristics of the curve.
- Power average: this is the target attenuation, smoothed over multiple audio frames. If the attenuation is more than the current average, we're releasing; otherwise, we're in the attack portion. This is done so that we can have an adaptive release: the harder the compressor compresses, the faster the release.
- Envelope: from the power estimation, and depending on whether we're in the attack portion or the release portion (we're in the attack portion if the current reduction is smaller than the target reduction, and vice versa), this is the speed at which the current compression level will move to the desired compression level computed before. This depends on the attack and release parameters of the compressor.
- Gain computation: this is 1.0 if the compressor is inactive, and less than 1.0 if some compression is to be applied. Depending on the envelope, the current gain exponentially approaches the target gain.
- Gain stage: simply apply the gain to the (delayed) audio signal
- Post de-emphasis filter: four cascaded low-pass zero-pole filters at 15k, 7.5k, 3.75k, 1.875k, with a constant gain of 4.4dB, applied to the audio signal
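To make the pre-delay and gain stage steps of the chain above concrete, here is a minimal sketch, assuming the per-sample gains have already been computed from the live (undelayed) control signal. The function name `apply_gain_with_lookahead` is hypothetical, and this is not the code of any existing implementation.

```python
from collections import deque


def apply_gain_with_lookahead(audio, gains, delay_samples):
    """Delay `audio` by `delay_samples`, then scale it by `gains`.

    `gains[i]` is assumed to come from the control signal at time i,
    so the gain "sees" the audio `delay_samples` ahead of what it scales.
    """
    # The delay line starts filled with silence.
    delay_line = deque([0.0] * delay_samples)
    out = []
    for x, g in zip(audio, gains):
        delay_line.append(x)
        delayed = delay_line.popleft()
        out.append(delayed * g)
    return out


# 0.006 seconds of look-ahead at an assumed 48 kHz sample rate.
LOOKAHEAD_SAMPLES = round(0.006 * 48000)
```

With a two-sample delay, `apply_gain_with_lookahead([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 0.5, 0.5], 2)` yields two samples of silence followed by the first two input samples scaled by 0.5.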

There is also a thing about pre-warping the power average that I don't understand yet.

Based on this, we can easily standardize the following:

- Pre-emphasis and post de-emphasis filters (if needed).
- Envelope (the speed at which the gain evolves; this is currently different for attack and release)
- Gain computation (the current gain value exponentially approaches the target, at the rate computed in the Envelope section)
- Gain stage (this is just a gain node)
- Pre-delay (this is just a delay node)
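The envelope and gain computation items can be sketched as a one-pole smoother with separate attack and release rates. This is only an illustration: treating `attack_s` and `release_s` as one-pole time constants is an assumption, the function name `smooth_gain` is hypothetical, and the adaptive release of the current implementation is deliberately left out.

```python
import math


def smooth_gain(target_gains, sample_rate, attack_s, release_s, start_gain=1.0):
    """Move the current gain exponentially towards each target gain.

    Gains are linear (1.0 means no compression). Both times must be
    greater than zero in this sketch.
    """
    attack_coeff = math.exp(-1.0 / (attack_s * sample_rate))
    release_coeff = math.exp(-1.0 / (release_s * sample_rate))
    gain = start_gain
    out = []
    for target in target_gains:
        # Attack: more reduction is needed than is currently applied.
        coeff = attack_coeff if target < gain else release_coeff
        # One-pole step: the remaining distance shrinks by `coeff`.
        gain = target + coeff * (gain - target)
        out.append(gain)
    return out
```

Under this interpretation, after one time constant the gain has covered about 63% of the distance to its target.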

The problematic part is the computation of the target attenuation. We could decide that implementations should have a curve that follows a hard-knee compression curve (which is easily speccable), but that having a soft knee is allowed, without too much restriction. We can say that the knee MUST be a monotonically increasing function that goes from the knee threshold (i.e. the minimum value that will trigger some compression) to the knee end (where the compression is linear again), without discontinuities with the regular compression curve. In other words, some function that joins the first and second parts of the compression curve, without discontinuities.
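One possible curve that meets these constraints is sketched below, working in decibels. The quadratic knee is an assumption chosen for the example (it is not the curve used by the current implementation), and the function name `compressed_level_db` is hypothetical. With `knee_db` set to 0 it reduces to the hard-knee curve.

```python
def compressed_level_db(level_db, threshold_db, knee_db, ratio):
    """Map an input level in dB to an output level in dB.

    Three regions: linear below the threshold, an assumed quadratic
    knee from threshold to threshold + knee, and a 1/ratio slope above.
    """
    if level_db <= threshold_db:
        return level_db
    knee_end_db = threshold_db + knee_db
    # Change of slope across the knee: from 1 down to 1/ratio.
    slope_change = 1.0 / ratio - 1.0
    if knee_db > 0.0 and level_db < knee_end_db:
        over = level_db - threshold_db
        # Slope is 1 at the threshold and 1/ratio at the knee end.
        return level_db + slope_change * over * over / (2.0 * knee_db)
    # Output level reached at the end of the knee.
    knee_end_out_db = knee_end_db + slope_change * knee_db / 2.0
    return knee_end_out_db + (level_db - knee_end_db) / ratio
```

Because the slope stays between 1/ratio and 1, the curve is monotonically increasing, and the value and slope match at both ends of the knee.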

Compressors are implemented with a variety of techniques (I've read a number of open-source implementations to get a sense of the topic), and specifying a compressor based on the algorithm seems quite challenging (but doable), as well as quite limiting. I'm wondering whether it would be clearer to describe a compressor with fewer features (for example, having a hard knee, no emphasis, no adaptive release, no pre and post filtering), and to state that vendors are allowed to implement something a little bit different (a bit like in SVG, where the blur is specced as a Gaussian blur, but where it's stated that a triple-pass box blur is acceptable).

Thoughts?

hoch commented 7 years ago

FWIW, Chrome removed the pre-emphasis filter a while ago. We don't want any redundant coloration. The basic building block must be transparent and fast. The compressor node violates this rule in many aspects, but the pre-emphasis was the easiest one to remove.

> There is also a thing about pre-warping the power average that I don't understand yet.

Yeap, that's where I stopped. :)

A few thoughts:

- The static numbers in the code are quite arbitrary and I am not sure how we can make decisions on them. Are these numbers to be specified?
- The current implementation has an adaptive release (program-based release) that requires a large look-ahead. This is not ideal for individual track compression. I was told our compressor is sort of designed for 'master compression', and this is not specced anywhere either.

joeberkovitz commented 7 years ago

As per call today:

- removing pre/post emphasis filters as these can be done outside the node
- no other objections to approach.
- @padenot will take this spec forward from here

joeberkovitz commented 7 years ago

Also we need to add expander functionality so that we can at least approach a noise-gate-like feature.

joeberkovitz commented 7 years ago

Asking @rtoy to take the lead on this at this point. Input and assistance from @padenot would be appreciated of course!

joeberkovitz commented 7 years ago

@padenot is picking this up again, to have a PR ready prior to the F2F by his request.

joeberkovitz commented 7 years ago

Here are some notes that I've gleaned based on reading the code, as an independent source of material for the forthcoming writeup. Hopefully it's useful to have another pair of eyes on this. Like Paul, I'm trying to characterize what needs to be specced, not the exact details of what the code does.

Signal processing path:

- For a stereo node, a mono input is doubled up to be both L and R inputs.
- All output from the compressor is delayed by a nonconfigurable amount (either 256 samples or .006s, not sure which). The live signal is used to compute the gain applied to the delayed signal.
- A wet/dry mix is supported, but not documented. Since the default is 100% wet, we could just ignore this and not spec this feature.
- The release rate is adjusted to be roughly equal to the release parameter when the compressor is more fully adjusted to the current desired gain, but adaptively decreased (i.e. made faster) when the compressor is farther from the desired gain. Presumably this is so transient spikes don't cause a long, audible release: the release parameter is supposed to apply to the compressor after it's stabilized, not to what it does after a transient. This seems like a nice thing that impls can do but that will be very difficult to spec (or to reverse engineer), and there is no developer control over it. Maybe we can just say that the value of the release parameter is an asymptotic ceiling, and the impl can reduce it adaptively.
- The attack rate is designed to work off the largest "compression demand" seen during the current attack phase and to slew the gain towards this desired gain within the attack timeframe. I don't think this needs speccing, but I'm a bit unsure.

I don't have anything to add to @padenot's suggestion about the curve, except to clarify that there are three distinct regions of the curve: linear, knee, and ratio-driven. I think we could just say that the impl computes the knee to ensure a smooth transition up to the first derivative.

Finally, some comments on the existing spec language:

- The latency introduced by the node should be highlighted in its interface description, as for other nodes.
- The knee parameter is described as: 'A decibel value representing the range above the threshold where the curve smoothly transitions to the "ratio" portion. Its default value is 30.' This is a bit confusing; this parameter is actually the extent of a decibel range that starts at threshold and ends at threshold+knee, over which the knee portion of the curve cuts in.
- The meaning of reduction should be clarified. Currently the language suggests it might be some kind of relative gain, but from the code it looks like the value is the actual gain in dB being applied to the input signal (sometimes with smoothing).
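For reference, if reduction is read as "the gain in dB being applied to the input signal", it relates to the linear gain as in the sketch below. The function name `reduction_db` is hypothetical, and any smoothing the implementation applies before reporting the value is not modeled here.

```python
import math


def reduction_db(linear_gain):
    """Convert a linear gain (0 < gain <= 1) to a gain in dB.

    1.0 maps to 0 dB (no compression); smaller gains map to
    negative values.
    """
    return 20.0 * math.log10(linear_gain)
```

Under this reading, a linear gain of 0.5 corresponds to a reduction of about -6 dB.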

joeberkovitz commented 7 years ago

From F2F: We reviewed @padenot's draft changes for this description. To complete PR by July 6.

padenot commented 7 years ago

This is up for review in #1278.

joeberkovitz commented 7 years ago

Closed via #1278

WebAudio / web-audio-api

Describe the algorithm that DynamicsCompressorNode should use #10