Closed: olivierthereaux closed this issue 7 years ago.
This just came up on the mailing list because at least some implementations use automatic make-up gain, which is not a ubiquitous feature on real, professional compressors. Looking at the spec, it's not even specified whether the DynamicsCompressorNode should use automatic make-up gain or not. This (make-up gain) should probably be a user option.
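Automatic make-up gain usually means boosting the output by the amount of gain reduction the static curve applies at full scale (0 dBFS), so that a full-scale input comes back out near full scale. The sketch below illustrates that idea under the assumption of a hard-knee curve. The function names and the `exponent` argument (a hypothetical knob for applying only part of the boost) are invented for illustration and are not taken from any browser's code.

```python
import math


def hard_knee_curve_db(x_db: float, threshold_db: float, ratio: float) -> float:
    """Static hard-knee compression curve in the dB domain (assumed form)."""
    if x_db <= threshold_db:
        return x_db  # below threshold: unity gain
    return threshold_db + (x_db - threshold_db) / ratio


def auto_makeup_gain(threshold_db: float, ratio: float, exponent: float = 1.0) -> float:
    """Linear make-up gain that undoes the gain reduction applied at 0 dBFS.

    `exponent` is hypothetical: 1.0 restores full scale exactly,
    values below 1.0 apply only part of the boost.
    """
    full_range_out_db = hard_knee_curve_db(0.0, threshold_db, ratio)
    full_range_gain = 10.0 ** (full_range_out_db / 20.0)  # linear gain at 0 dBFS
    return (1.0 / full_range_gain) ** exponent


# threshold -24 dB, ratio 12: 0 dBFS maps to -22 dB, so the make-up gain is +22 dB.
g = auto_makeup_gain(-24.0, 12.0)
print(round(20.0 * math.log10(g), 2))  # 22.0
```

Making this a user option would amount to letting developers choose whether (and how much of) this boost is applied.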
@padenot weren't you working on this?
I've started to gather some notes, yes.
Need some help? I can take a look. (Email reply to Paul ADENOT's comment of 30 Oct 2014, 22:08.)
Once this is done, please ping me on #13 so I can incorporate it.
The decision is to reverse-engineer the code and document the current algorithm, so that we're not wedged on further decision-making and can do a more disciplined compression/expansion algorithm in v.next.
Noted: we're going to attempt to describe the algorithm progressively, starting at a general level of detail: which components, states and procedures make up the compression behavior. This in itself will be a big step up from the current spec. Informative sections supplying suggested behavior, lifted from the current implementation, can then be written, but we're not going to require implementations to mimic the current one exactly in every last detail.
This paper seems approachable, clear and useful: "Digital Dynamic Range Compressor Design: A Tutorial and Analysis" (https://www.eecs.qmul.ac.uk/~josh/documents/GiannoulisMassbergReiss-dynamicrangecompression-JAES2012.pdf). It seems clear from that analysis that our compressor is of the feedforward type, with an RMS detector, and that both attack and release are exponential. So we could start by describing those blocks (and a diagram of how they fit together).
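As a rough illustration of the blocks named above (feedforward topology, RMS detector, exponential attack and release), here is a minimal Python sketch loosely following the structure described in the Giannoulis, Massberg and Reiss paper. It is not the current browser implementation: it assumes a hard knee, a one-pole RMS detector whose time constant (`rms_time_s`) is a hypothetical parameter, and a branching one-pole smoother applied to the gain reduction in dB. The threshold, ratio, attack and release defaults mirror DynamicsCompressorNode's defaults, but here attack and release are treated as plain one-pole time constants, which is not necessarily how the node defines them. The separate `control` and `audio` arguments show where side-chaining would plug in.

```python
import math
from typing import List


def one_pole_coeff(time_s: float, sample_rate: float) -> float:
    """Coefficient of a one-pole smoother with time constant `time_s` seconds."""
    if time_s <= 0.0:
        return 0.0  # zero time constant: follow the input instantly
    return math.exp(-1.0 / (time_s * sample_rate))


def compress(control: List[float], audio: List[float], sample_rate: float,
             threshold_db: float = -24.0, ratio: float = 12.0,
             attack_s: float = 0.003, release_s: float = 0.25,
             rms_time_s: float = 0.01) -> List[float]:
    """Feedforward sketch: RMS detector -> hard-knee gain computer ->
    branching attack/release smoother on the gain reduction (dB) -> gain."""
    a_rms = one_pole_coeff(rms_time_s, sample_rate)
    a_att = one_pole_coeff(attack_s, sample_rate)
    a_rel = one_pole_coeff(release_s, sample_rate)
    power = 0.0      # running mean square of the control signal
    reduction = 0.0  # smoothed gain reduction in dB (0 = no compression)
    out: List[float] = []
    for c, x in zip(control, audio):
        # 1. RMS detector: exponential average of the squared control signal.
        power = a_rms * power + (1.0 - a_rms) * c * c
        level_db = 10.0 * math.log10(max(power, 1e-20))
        # 2. Hard-knee gain computer: dB of reduction wanted at this level.
        over = level_db - threshold_db
        target = over - over / ratio if over > 0.0 else 0.0
        # 3. Exponential smoothing: attack coefficient while reduction grows,
        #    release coefficient while it shrinks.
        a = a_att if target > reduction else a_rel
        reduction = a * reduction + (1.0 - a) * target
        # 4. Apply the gain to the audio signal (may differ from control).
        out.append(x * 10.0 ** (-reduction / 20.0))
    return out


sr = 48000.0
loud = [1.0] * 48000  # one second of a 0 dBFS DC signal
y = compress(loud, loud, sr)
print(round(20.0 * math.log10(y[-1]), 1))  # -22.0
```

Smoothing the gain reduction in dB, rather than the detector output, keeps attack and release independent of the curve shape, which is one of the configurations the paper compares.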
This is also helpful, but more mathematical and aimed at analog circuitry rather than digital implementation: "Attack and Release Time Constants in RMS-Based Compressors and Limiters" (http://www.thatcorp.com/datashts/AES4054_Attack_and_Release_Time_Constants_II.pdf).
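To make "time constant" concrete for the exponential attack/release and RMS averaging these papers discuss: a one-pole smoother with coefficient exp(-1/(tau * fs)) reaches about 63.2% (1 - 1/e) of a step after tau seconds. The sketch below checks this numerically by feeding a unit step through the smoother; the function names are illustrative only.

```python
import math


def one_pole_coeff(time_s: float, sample_rate: float) -> float:
    """One-pole coefficient for time constant `time_s` seconds (time_s > 0)."""
    return math.exp(-1.0 / (time_s * sample_rate))


def samples_to_reach(fraction: float, time_s: float, sample_rate: float) -> int:
    """Feed a unit step into the smoother; count samples until the output
    reaches `fraction` of its final value (0 < fraction < 1)."""
    a = one_pole_coeff(time_s, sample_rate)
    y, n = 0.0, 0
    while y < fraction:
        y = a * y + (1.0 - a)  # y[n] = a*y[n-1] + (1-a)*1
        n += 1
    return n


sr, tau = 48000.0, 0.01  # 10 ms time constant
print(samples_to_reach(1.0 - 1.0 / math.e, tau, sr))  # close to tau * sr = 480
print(samples_to_reach(0.9, tau, sr))  # close to ln(10) * tau * sr, about 1105
```

The second line shows why "attack time" needs a precise definition in a spec: the same smoother takes roughly 2.3 times longer to reach 90% than to reach 63.2%.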
Thanks, I'll be reading those for sure. I plan to start working on this after I'm finished with the AudioBufferSourceNode rendering spec bit (#95).
Adding @rtoy as discussed on yesterday's call. @padenot, we need notes from you on how to proceed.
The current DSP chain of the compressor is the following:
The audio signal is the audio to be processed. The control signal is the audio that will determine how much compression should be applied to the audio signal. Those two signals can be the same, or different, when side-chaining.
There is also a thing about pre-warping the power average that I don't understand yet.
Based on this, we can easily standardize the following:
The problematic part is the computation of the target attenuation. We could decide that implementations should have a curve that follows a hard-knee compression curve (which is easily speccable), but that a soft knee is allowed, without too much restriction. We can say that the knee MUST be a monotonically increasing function that goes from the knee threshold (i.e. the minimum value that will trigger some compression) to the knee end (where the compression curve is linear again), without discontinuities with the regular compression curve. In other words, some function that joins the first and second parts of the compression curve without discontinuities.
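One admissible knee under these constraints, sketched in Python: a quadratic segment over [threshold, threshold + knee] whose slope goes from 1 down to 1/ratio, so the curve is continuous, monotonically increasing, and continuous in its first derivative. This quadratic is an assumption chosen because it is easy to check; it is not claimed to be the knee shape the current implementation uses, and the function name is hypothetical.

```python
def compression_curve_db(x_db: float, threshold_db: float, knee_db: float,
                         ratio: float) -> float:
    """Static curve with three regions: linear, knee, ratio (all in dB).

    The knee spans [threshold, threshold + knee]. Inside it, a quadratic
    bends the slope from 1 to 1/ratio, so the curve and its slope are
    continuous at both ends. knee_db == 0 gives a hard knee.
    """
    t, k, r = threshold_db, knee_db, ratio
    if x_db <= t:
        return x_db  # linear region (slope 1)
    if k > 0.0 and x_db <= t + k:
        d = x_db - t
        return x_db + (1.0 / r - 1.0) * d * d / (2.0 * k)  # knee region
    # Output level where the knee ends (equals t for a hard knee).
    knee_end_out = t + k + (1.0 / r - 1.0) * k / 2.0 if k > 0.0 else t
    return knee_end_out + (x_db - (t + k)) / r  # ratio region (slope 1/ratio)


# threshold -24 dB, knee 30 dB, ratio 12: the knee ends at +6 dB input.
print(round(compression_curve_db(6.0, -24.0, 30.0, 12.0), 2))  # -7.75
```

Because the knee's slope stays between 1/ratio and 1, monotonicity holds for any ratio of at least 1; a spec could state the requirement this way and leave the exact knee function to implementations.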
Compressors are implemented with a variety of techniques (I've read a number of open-source implementations to get a sense of the topic), and specifying a compressor by its algorithm seems quite challenging (but doable), as well as quite limiting. I'm wondering whether it would be clearer to describe a compressor with fewer features (for example, a hard knee, no emphasis, no adaptive release, no pre- and post-filtering), and to state that vendors are allowed to implement something a little different (a bit like SVG, where the blur is specced as a Gaussian blur but the spec states that a triple-pass box blur is acceptable).
Thoughts?
FWIW, Chrome removed the pre-emphasis filter a while ago. We don't want any redundant coloration. The basic building block must be transparent and fast. The compressor node violates this rule in many aspects, but the pre-emphasis was the easiest one to remove.
Re "There is also a thing about pre-warping the power average that I don't understand yet": yep, that's where I stopped. :)
A few thoughts:
As per call today:
Also we need to add expander functionality so that we can at least approach a noise-gate-like feature.
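For the expander idea, a minimal static curve for downward expansion (the mirror image of compression: it acts below the threshold instead of above it) might look like the sketch below. The function and its parameters are hypothetical; DynamicsCompressorNode has no expander parameters today.

```python
def expander_curve_db(x_db: float, threshold_db: float, ratio: float) -> float:
    """Downward expander (dB in, dB out).

    At or above the threshold the signal passes unchanged. Below it, each dB
    the input drops becomes `ratio` dB of output drop, pushing quiet material
    (e.g. noise) further down. A very large ratio approximates a noise gate.
    """
    if x_db >= threshold_db:
        return x_db
    return threshold_db + (x_db - threshold_db) * ratio


# threshold -50 dB, ratio 4: a -60 dB input comes out at -90 dB (30 dB of attenuation).
print(expander_curve_db(-60.0, -50.0, 4.0))  # -90.0
```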
Asking @rtoy to take the lead on this at this point. Input and assistance from @padenot would be appreciated of course!
@padenot is picking this up again, at his request, to have a PR ready before the F2F.
Here are some notes I've gleaned from reading the code, as an independent source of material for the forthcoming write-up. Hopefully it's useful to have another pair of eyes on this. Like Paul, I'm trying to characterize what needs to be specced, not the exact details of what the code does.
Signal processing path:
Release: the release time is set by the release parameter when the compressor is more fully adjusted to the current desired gain, but is adaptively decreased (i.e. made faster) when the compressor is farther from the desired gain. Presumably this is so that transient spikes don't cause a long, audible release: the release parameter is supposed to govern the compressor once it has stabilized, not what it does after a transient. This seems like a nice thing for implementations to do, but it will be very difficult to spec (or to reverse-engineer), and developers have no control over it. Maybe we can just say that the value of the release parameter is an asymptotic ceiling, and that the implementation may reduce it adaptively.

Attack: [...] timeframe. I don't think this needs specing, but I'm a bit unsure.

I don't have anything to add to @padenot's suggestion for the curve, except to clarify that the curve has three distinct regions: linear, knee, and ratio-driven. I think we could just say that the implementation computes the knee so that the transition is smooth up to the first derivative (i.e. both the curve and its slope are continuous).
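The "asymptotic ceiling" wording for release can be illustrated with a hypothetical scaling rule: the user's release value is used as-is when the compressor sits at the desired gain, and gets shorter as the distance grows. The rule, the function name and `scale_db` are invented for illustration; the current implementation's adaptive release works differently and is not reproduced here.

```python
def effective_release_s(release_s: float, distance_db: float,
                        scale_db: float = 12.0) -> float:
    """Hypothetical adaptive release.

    `release_s` (the user's release parameter) acts as a ceiling: it is used
    unchanged when the current gain matches the desired gain (distance 0 dB)
    and shrinks, i.e. gets faster, as `distance_db` grows. `scale_db` is an
    invented tuning constant.
    """
    return release_s / (1.0 + abs(distance_db) / scale_db)


for d in (0.0, 6.0, 12.0, 24.0):
    print(d, round(effective_release_s(0.25, d), 4))
```

Any rule with these two properties (never above the parameter, approaching it as the distance goes to zero) would satisfy the proposed wording, which is what makes it attractive as spec text.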
Finally, some comments on the existing spec language:
The latency introduced by the node should be highlighted in its interface description, as it is for other nodes.
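Assuming the latency comes from a lookahead delay (which is how lookahead compressors typically introduce it), the amount to document is simply the delay length divided by the sample rate. In the sketch below only the audio path is delayed, while the detector would read the undelayed input, so gain changes can start before a transient reaches the output. The delay length is a hypothetical value, not the one any browser uses.

```python
from collections import deque
from typing import List


def lookahead_delay(audio: List[float], delay_samples: int) -> List[float]:
    """Delay the audio path by `delay_samples` (zero-filled at the start).

    In a lookahead compressor only this path is delayed; the detector reads
    the undelayed input, so gain changes land before transients do.
    """
    line = deque([0.0] * delay_samples)
    out: List[float] = []
    for x in audio:
        line.append(x)
        out.append(line.popleft())
    return out


sample_rate = 48000.0
delay_samples = 240  # hypothetical 5 ms lookahead at 48 kHz
print(delay_samples / sample_rate)  # reported latency in seconds: 0.005
print(lookahead_delay([1.0, 0.0, 0.0, 0.0], 2))  # [0.0, 0.0, 1.0, 0.0]
```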
The knee parameter is described as "A decibel value representing the range above the threshold where the curve smoothly transitions to the 'ratio' portion. Its default value is 30." This is a bit confusing: the parameter is actually the extent of a decibel range that starts at threshold and ends at threshold + knee, over which the knee portion of the curve cuts in.
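As a worked example of this reading, with the node's default threshold of -24 dB and the default knee of 30 dB quoted above, the knee covers the following input range:

```latex
[\,\mathit{threshold},\ \mathit{threshold} + \mathit{knee}\,]
  = [-24\ \mathrm{dB},\ -24\ \mathrm{dB} + 30\ \mathrm{dB}]
  = [-24\ \mathrm{dB},\ +6\ \mathrm{dB}]
```

Since +6 dB lies above full scale (0 dBFS), with default settings any input at or below full scale stays in the linear or knee regions and never reaches the ratio-driven region, which is worth making explicit in the parameter description.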
The meaning of reduction should be clarified. Currently the language suggests it might be some kind of relative gain, but from the code it looks like the value here is actually the gain, in dB, being applied to the input signal (sometimes with smoothing).
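Under the reading above, reduction is just the currently applied gain expressed in decibels: 0 when no compression is happening and negative otherwise. A one-line sketch of that conversion follows; the function name is hypothetical.

```python
import math


def reduction_db(applied_gain: float) -> float:
    """Linear gain currently applied to the input, expressed in dB.

    0 dB means no compression; compression gives negative values.
    """
    return 20.0 * math.log10(applied_gain)


print(round(reduction_db(0.5), 2))  # -6.02
```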
From F2F: we reviewed @padenot's draft changes for this description. The PR is to be completed by July 6.
This is up for review in #1278.
Closed via #1278
Currently the spec doesn't provide much information on what the algorithm behind DynamicsCompressorNode should look like, which is not very helpful for implementers.