we probably want to use `astats` instead of `volumedetect` to calculate what gain to apply (more accurate values, and it works better with float input), so: `-af astats=measure_perchannel=none:measure_overall=none+Peak_level+RMS_level`
value mapping:

| volumedetect | astats |
|---|---|
| `max_volume` | `Peak level dB` |
| `mean_volume` | `RMS level dB` |
make sure to include the `astats` arguments, otherwise it will print those measurements for each channel (`Channel: 1`, `Channel: 2`, ...) in addition to the ones we actually want (`Overall`)
also make sure that `astats` is the first filter in the chain when analyzing, and likewise that `volume` is the first filter when applying the gain: newer FFmpeg versions will try to repair clipping in the input by interpolating samples they think got clipped, and many other filters will cast to int16 and discard that info
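A minimal sketch of the ordering rule, assuming the filtergraphs get assembled as strings in Python; `analysis_filtergraph`, `apply_filtergraph` and the `extra_filters` parameter are hypothetical names, not existing code. All it encodes is that `astats` (analysis) and `volume` (transcode) always come first in the chain:

```python
from collections.abc import Sequence

# astats arguments from above: overall Peak/RMS only, no per-channel output.
ASTATS = "astats=measure_perchannel=none:measure_overall=none+Peak_level+RMS_level"


def analysis_filtergraph(extra_filters: Sequence[str] = ()) -> str:
    """-af value for the analysis pass; astats always sees the raw decoded input."""
    return ",".join([ASTATS, *extra_filters])


def apply_filtergraph(gain_db: float, extra_filters: Sequence[str] = ()) -> str:
    """-af value for the transcode pass; volume runs before anything that may cast to int16."""
    return ",".join([f"volume={gain_db:.3f}dB", *extra_filters])


print(apply_filtergraph(-0.386587, ["aresample=44100"]))  # volume=-0.387dB,aresample=44100
```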
store the measurement output in the db as-is; however, when calculating the gain to apply we probably want to use `min(0, Peak_level_dB)`, since analysis output above zero comes from the interpolated samples and is mostly a don't-care
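A sketch of that gain calculation, assuming the gain is the target-minus-RMS difference capped so that the clamped peak does not go above 0 dB; `gain_to_apply` and its parameters are hypothetical names. Only the calculation uses `min(0, peak)`, the stored measurement stays unclamped:

```python
def gain_to_apply(target_rms_db: float, rms_db: float, peak_db: float) -> float:
    """Gain in dB that moves RMS towards the target without pushing the peak past 0 dB."""
    # Above-zero peaks are treated as reconstructed samples and ignored.
    effective_peak_db = min(0.0, peak_db)
    wanted_db = target_rms_db - rms_db  # gain needed to hit the target exactly
    headroom_db = -effective_peak_db    # largest gain that keeps the peak <= 0 dB
    return min(wanted_db, headroom_db)


# astats values from the Shangri-La example, target -11 dB
print(round(gain_to_apply(-11.0, -10.613413, 2.497428), 3))  # -0.387
```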
full analysis example with output:
ffmpeg -hide_banner -nostdin -i 'Angela - Shangri-La.mp3' -af astats=measure_perchannel=none:measure_overall=none+Peak_level+RMS_level -c:a pcm_s16le -f null - 2>&1 | grep -E '^\[Parsed_astats_0 @ .* dB:'
[Parsed_astats_0 @ 0x5636d42b1cc0] Peak level dB: 2.497428
[Parsed_astats_0 @ 0x5636d42b1cc0] RMS level dB: -10.613413
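A minimal Python sketch of the same analysis without `grep`, assuming the values are collected from FFmpeg's stderr (where the log lines above go, hence the `2>&1`) and that the `[Parsed_astats_0 @ ...]` line format stays as shown; `astats_argv`, `parse_astats` and `measure` are hypothetical helpers:

```python
import re
import subprocess

ASTATS = "astats=measure_perchannel=none:measure_overall=none+Peak_level+RMS_level"

# Matches e.g. "[Parsed_astats_0 @ 0x5636d42b1cc0] Peak level dB: 2.497428"
_ASTATS_LINE = re.compile(r"^\[Parsed_astats_0 @ [^\]]*\] (Peak level dB|RMS level dB): (\S+)$")


def astats_argv(path: str) -> list[str]:
    """argv for the analysis command above; the decoded audio is discarded."""
    return ["ffmpeg", "-hide_banner", "-nostdin", "-i", path,
            "-af", ASTATS, "-c:a", "pcm_s16le", "-f", "null", "-"]


def parse_astats(stderr: str) -> dict[str, float]:
    """Pull the overall Peak/RMS values out of FFmpeg's log output."""
    values: dict[str, float] = {}
    for line in stderr.splitlines():
        m = _ASTATS_LINE.match(line.strip())
        if m:
            values[m.group(1)] = float(m.group(2))  # float() also accepts "-inf"
    return values


def measure(path: str) -> dict[str, float]:
    """Run the analysis; needs an ffmpeg binary on PATH."""
    proc = subprocess.run(astats_argv(path), capture_output=True, text=True, check=True)
    return parse_astats(proc.stderr)
```

`measure("Angela - Shangri-La.mp3")` should then return the two values shown above, keyed by their log labels, ready to be stored as-is.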
`volumedetect` on the same track, which shows the original values without the clip repair:
[Parsed_volumedetect_0 @ 0x557c4398fc00] mean_volume: -10.6 dB
[Parsed_volumedetect_0 @ 0x557c4398fc00] max_volume: 0.0 dB
so with a target amplitude of -11 dB (too loud, but just for this example) it would be safe to set `-af volume=-0.387dB`, since the above-zero peak comes from reconstructed samples, making the new "real" peak -0.387 dB; the actual output will still peak at 0 dB, though, since it keeps the reconstructed samples and then reclips them
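The same arithmetic written out, assuming the gain is the target-minus-RMS difference capped so that the clamped peak stays at or below 0 dB ($T$ = target, $R$ = astats RMS, $P$ = astats peak, all in dB):

$$
g = \min\bigl(T - R,\ -\min(0, P)\bigr) = \min\bigl(-11 + 10.613413,\ -\min(0, 2.497428)\bigr) = \min(-0.386587,\ 0) \approx -0.387\ \text{dB}
$$

The clamped peak then lands at $0 + g \approx -0.387$ dB, while the reconstructed peak lands at $2.497428 - 0.386587 = 2.110841$ dB and gets reclipped to 0 dB.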
the ebur128 variant has been running for a while now and I haven't noticed any particular artifacts while using it; dunno if @icxes has noticed anything while listening on his test setup. Otherwise I think it would be fine to keep it enabled, and we can see if it causes issues once we go live
should consider adding a gain while transcoding so all songs hit the same perceived volume
the analysis step would run once for each song, storing the measurements in the db
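One way the "store as-is" part could look, sketched with Python's built-in `sqlite3`; the `loudness` table, its columns and the helper names are hypothetical. The peak is stored unclamped, so the `min(0, ...)` decision can be revisited later without re-analyzing every song:

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical schema: one row per song, raw astats values.
SCHEMA = """
CREATE TABLE IF NOT EXISTS loudness (
    song_id TEXT PRIMARY KEY,
    peak_db REAL NOT NULL,  -- astats "Peak level dB", unclamped
    rms_db  REAL NOT NULL   -- astats "RMS level dB"
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def store_measurement(db: sqlite3.Connection, song_id: str, peak_db: float, rms_db: float) -> None:
    # INSERT OR REPLACE so re-running the analysis for a song overwrites the old row.
    db.execute(
        "INSERT OR REPLACE INTO loudness (song_id, peak_db, rms_db) VALUES (?, ?, ?)",
        (song_id, peak_db, rms_db),
    )
    db.commit()


def load_measurement(db: sqlite3.Connection, song_id: str) -> Optional[Tuple[float, float]]:
    row = db.execute(
        "SELECT peak_db, rms_db FROM loudness WHERE song_id = ?", (song_id,)
    ).fetchone()
    return None if row is None else (row[0], row[1])
```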
need to choose between two normalization approaches:
- rms normalization hits a given mean volume if the max peak allows for it; some songs will come out way lower due to their dynamics (or just a single loud sample), but the dynamic range is fully preserved and it's fast
- ebur128 normalization (broadcasting standard) scales the volume dynamically across the song if necessary, trying harder to hit the target volume at the cost of modifying the dynamics; worst case it causes some intense ducks after a short attack (which sounds hella bad ngl), and also crazy slides like https://i.fiery.me/EprKj.png
rms
rms normalization requires knowing the max and mean volume beforehand, obtainable with FFmpeg's `volumedetect` filter (outputs `max_volume` and `mean_volume`). The gain is the target minus `mean_volume`, as long as `max_volume` plus the gain stays below zero (clipping otherwise): assuming our target `mean_volume` is -14 dB and a given song produces max=-3 and mean=-16, the gain would be 2 dB, so that mean=-14 and max=-1
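A sketch of the rms variant in Python. The argv is an assumption (a plain `volumedetect` pass into the `null` muxer); the parser follows the `mean_volume`/`max_volume` log lines quoted earlier, and `rms_gain` encodes the "target minus mean, capped by the max" rule. All names are hypothetical:

```python
import re


def volumedetect_argv(path: str) -> list[str]:
    # Assumed analysis command; run it with subprocess and parse proc.stderr.
    return ["ffmpeg", "-hide_banner", "-nostdin", "-i", path,
            "-af", "volumedetect", "-f", "null", "-"]


# Matches e.g. "[Parsed_volumedetect_0 @ 0x557c4398fc00] mean_volume: -10.6 dB"
_VOLUMEDETECT_LINE = re.compile(r"\] (max_volume|mean_volume): (\S+) dB$")


def parse_volumedetect(stderr: str) -> dict[str, float]:
    values: dict[str, float] = {}
    for line in stderr.splitlines():
        m = _VOLUMEDETECT_LINE.search(line.strip())
        if m:
            values[m.group(1)] = float(m.group(2))
    return values


def rms_gain(max_db: float, mean_db: float, target_db: float) -> float:
    """target - mean, but never more than the headroom above max_volume."""
    return min(target_db - mean_db, -max_db)


print(rms_gain(max_db=-3.0, mean_db=-16.0, target_db=-14.0))  # 2.0
```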
that's it for rms normalization (which is probably what we want) but including ebur128 too just in case
ebur128
ebur128 normalization requires knowing `I`, `TP`, `LRA`, `thresh` and `offset`, obtained from a first analysis pass; then append the measured values to `$cfg` and normalize in a second pass, with `-ar 44100` (very much needed there, I think)
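- `I` = perceived volume
- `TP` = max permitted peak
- `LRA` = loudness range (roughly the spread between the quiet and loud parts across the output, not a strict max-min)
- not sure about `thresh` and `offset`

looks like `I=-22` and `TP=-3` are common in radio studios, with `LRA=18` or anywhere closer to 0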
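A sketch of the two-pass flow, assuming the "ebur128 normalization" here is FFmpeg's `loudnorm` filter (its second pass takes exactly `measured_I`, `measured_TP`, `measured_LRA`, `measured_thresh` and `offset`) and that `$cfg` holds the targets, e.g. `I=-22:TP=-3:LRA=18`. The first pass would run with `print_format=json` into the `null` muxer; the helper names are hypothetical:

```python
import json

# Assumed contents of $cfg, using the radio-style targets mentioned above.
CFG = "I=-22:TP=-3:LRA=18"


def first_pass_filter(cfg: str) -> str:
    # Measurement pass: loudnorm prints its stats as a JSON block on stderr.
    return f"loudnorm={cfg}:print_format=json"


def extract_json(stderr: str) -> str:
    # The stats block is the last {...} in the log (a flat object, no nested braces).
    start, end = stderr.rfind("{"), stderr.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON block found")
    return stderr[start:end + 1]


def second_pass_filter(cfg: str, stats_json: str) -> str:
    """Append the first-pass measurements to cfg for the normalization pass."""
    s = json.loads(stats_json)
    return (
        f"loudnorm={cfg}"
        f":measured_I={s['input_i']}:measured_TP={s['input_tp']}"
        f":measured_LRA={s['input_lra']}:measured_thresh={s['input_thresh']}"
        f":offset={s['target_offset']}"
    )
```

The second pass would then pass that string to `-af` together with `-ar 44100`; `loudnorm` typically upsamples to 192 kHz internally, which would explain why the explicit output rate matters.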