streamer: replay gain

9001 commented 4 years ago

should consider adding a gain while transcoding so all songs hit the same perceived volume

the analysis step would run once for each song, storing the measurements in the db

need to choose between two normalization approaches:

rms normalization hits a given mean volume if the max peak allows for it; some songs will come out way lower due to dynamics (or just a single loud sample) but the dynamic range is fully preserved and it's fast
ebur128 normalization (broadcasting standard) scales the volume dynamically across the song if necessary, so it tries harder to hit the target volume at the cost of modifying the dynamics; worst case it causes some intense ducking after a short attack (which sounds hella bad ngl) and crazy slides like https://i.fiery.me/EprKj.png

rms

rms normalization requires knowing the max and mean volume beforehand, obtainable with the following FFmpeg command (outputs max_volume and mean_volume):

ffmpeg -hide_banner -nostdin -i some.mp3 -af volumedetect -c:a pcm_s16le -f null - 2>&1 | grep -E '^\[Parsed_volumedetect_0 @ '
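to get those two numbers into something storable, a rough sketch of a parser for the volumedetect lines; the function name parse_volumedetect is made up for this example, and it assumes the output format stays "mean_volume: X dB" / "max_volume: X dB":

```shell
# parse_volumedetect (hypothetical helper): reads ffmpeg's stderr on stdin
# and prints "MEAN MAX" in dB; exits non-zero if either value is missing
parse_volumedetect() {
    awk '
        # the value is the second-to-last field, right before the "dB" unit
        /Parsed_volumedetect/ && /mean_volume:/ { mean = $(NF-1) }
        /Parsed_volumedetect/ && /max_volume:/  { max  = $(NF-1) }
        END {
            if (mean == "" || max == "") exit 1
            print mean, max
        }
    '
}
```

usage would be something like piping the analysis command straight into it (ffmpeg ... -af volumedetect ... -f null - 2>&1 | parse_volumedetect) instead of grepping.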

assuming our target mean_volume is -14 dB and a given song produces max=-3 and mean=-16, gain would be 2dB (target minus mean) so that mean=-14 and max=-1, which stays below zero (clipping otherwise)

ffmpeg -hide_banner -nostdin -v warning -i some.mp3 -map 0:a:0 -af volume=2dB -c:a libmp3lame -b:a 192k -compression_level 0 -ar 44100 rms.mp3
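the gain math from the example can be sketched as a tiny helper; rms_gain is a hypothetical name, and the cap at "0 minus max" is just one reading of "if the max peak allows for it":

```shell
# rms_gain TARGET MEAN MAX (hypothetical helper): prints the gain in dB,
# capped so that the peak never ends up above 0 dB
rms_gain() {
    awk -v target="$1" -v mean="$2" -v max="$3" 'BEGIN {
        wanted   = target - mean   # gain needed to hit the target mean volume
        headroom = 0 - max         # most gain we can add before clipping
        gain = (wanted < headroom) ? wanted : headroom
        printf "%.1f\n", gain
    }'
}
```

rms_gain -14 -16 -3 prints 2.0 like the example above; a quieter song with mean=-20 and the same max=-3 would be capped at 3.0 and come out lower than the target.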

that's it for rms normalization (which is probably what we want) but including ebur128 too just in case

ebur128

ebur128 normalization requires knowing I, TP, LRA, thresh, offset obtained like this:

cfg="I=-14:TP=0:LRA=11"

ffmpeg -hide_banner -nostdin -i some.mp3 -map 0:a:0 -af loudnorm=print_format=json:$cfg -c:a pcm_s16le -f null - 2>&1

# "input_i" : "-10.87",
# "input_tp" : "0.17",
# "input_lra" : "7.80",
# "input_thresh" : "-21.02",
# "target_offset" : "-0.83"

then append the measured values to $cfg and normalize:

cfg="$cfg:measured_i=-10.87:measured_tp=0.17:measured_lra=7.80:measured_thresh=-21.02:offset=-0.83"

ffmpeg -hide_banner -nostdin -i some.mp3 -map 0:a:0 -af loudnorm=print_format=summary:linear=true:$cfg -c:a libmp3lame -b:a 192k -compression_level 0 -ar 44100 ebur128.mp3
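copying the measured values by hand won't scale, so here is a minimal sketch that builds the measured part of $cfg from the first-pass json; loudnorm_measured is a made-up name, and it assumes the json keys look exactly like the ones shown above:

```shell
# loudnorm_measured (hypothetical helper): reads the first-pass ffmpeg output
# on stdin and prints the measured_*/offset options joined with ":"
loudnorm_measured() {
    # splitting on double quotes makes $2 the key and $4 the value
    awk -F'"' '
        $2 == "input_i"       { i   = $4 }
        $2 == "input_tp"      { tp  = $4 }
        $2 == "input_lra"     { lra = $4 }
        $2 == "input_thresh"  { th  = $4 }
        $2 == "target_offset" { off = $4 }
        END {
            if (i == "" || tp == "" || lra == "" || th == "" || off == "") exit 1
            printf "measured_i=%s:measured_tp=%s:measured_lra=%s:measured_thresh=%s:offset=%s\n", i, tp, lra, th, off
        }
    '
}
```

then the second pass could use cfg="$cfg:$(ffmpeg ... 2>&1 | loudnorm_measured)" with the analysis command in place of the dots.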

analysis runs at ~20x realtime on a macbook air
normalize is usually faster unless it has to scale the volume over time to hit the target
output is upsampled to 192 kHz so the -ar 44100 in the command above is very needed
loudness scale is different from RMS, so I=-14 is usually far lower than the RMS -14
set linear=true to avoid adjusting the volume over time, but this only takes effect if the track's dynamic range permits reaching the target like that
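to guess up front whether linear=true will actually stick for a track, something like the check below could work. big assumption: this mirrors how i understand loudnorm to decide (the static gain must keep the true peak under TP, and the measured LRA must not exceed the target LRA), and it should be verified against the FFmpeg source before relying on it; the function name is made up:

```shell
# loudnorm_is_linear I TP LRA MEASURED_I MEASURED_TP MEASURED_LRA
# (hypothetical helper, assumed criteria): exit 0 if a plain static gain
# looks sufficient, exit 1 if loudnorm would likely fall back to dynamic mode
loudnorm_is_linear() {
    awk -v i="$1" -v tp="$2" -v lra="$3" -v mi="$4" -v mtp="$5" -v mlra="$6" 'BEGIN {
        gain = i - mi        # static gain needed to land on the target loudness
        peak = mtp + gain    # where the true peak ends up after that gain
        ok = (peak <= tp + 0 && mlra + 0 <= lra + 0)
        exit(ok ? 0 : 1)
    }'
}
```

with the measurements from above, loudnorm_is_linear -14 0 11 -10.87 0.17 7.80 succeeds (gain -3.13, peak -2.96), while a quiet track that needs a lot of positive gain would fail the peak check.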

i think I = perceived volume, TP = max permitted peak, LRA = dynamic range (max-min across output), not sure about thresh and offset

looks like I=-22 and TP=-3 is common in radio studios, LRA=18 or anywhere closer to 0

9001 commented 4 years ago

we probably want to use astats instead of volumedetect to calculate what gain to apply (more accurate values and works better with float input) so -af astats=measure_perchannel=none:measure_overall=none+Peak_level+RMS_level

value mapping:

volumedetect	astats
max_volume	Peak level dB
mean_volume	RMS level dB

make sure to include the astats arguments, otherwise it will print those measurements for each channel (Channel: 1 , Channel: 2 , ...) in addition to the ones we actually want (Overall)

also make sure that astats is the first filter in the chain when analyzing, and likewise that volume is the first filter when applying the gain, because newer FFmpeg versions will try to repair clipping in the input by interpolating samples it thinks got btfo and many other filters will cast to int16 and discard that info

store the measurement output in the db as-is, however when calculating the gain to apply we probably want to min(0, Peak_level_dB) since analysis output above zero is the interpolated values and mostly dontcares

full analysis example with output:

ffmpeg -hide_banner -nostdin -i 'Angela - Shangri-La.mp3' -af astats=measure_perchannel=none:measure_overall=none+Peak_level+RMS_level -c:a pcm_s16le -f null - 2>&1 | grep -E '^\[Parsed_astats_0 @ .* dB:'
[Parsed_astats_0 @ 0x5636d42b1cc0] Peak level dB: 2.497428
[Parsed_astats_0 @ 0x5636d42b1cc0] RMS level dB: -10.613413

volumedetect on the same track, which shows the original values without the clip repair:

[Parsed_volumedetect_0 @ 0x557c4398fc00] mean_volume: -10.6 dB
[Parsed_volumedetect_0 @ 0x557c4398fc00] max_volume: 0.0 dB

so with a target amplitude of -11dB (too loud but just for this example) the gain is -11 - (-10.613) = -0.387dB, and it would be safe to set -af volume=-0.387dB since the above-zero peak is from reconstructed samples; that makes the new "real" peak -0.387dB, but the actual output will peak at 0dB (since it'll keep the reconstructed samples and then reclip w)
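putting the astats parsing and the min(0, Peak_level_dB) rule together, a rough sketch; both function names are made up, and capping the gain at the remaining headroom is an assumption carried over from the rms approach:

```shell
# parse_astats (hypothetical helper): reads ffmpeg's stderr on stdin and
# prints "RMS PEAK" (overall, in dB); exits non-zero if either is missing
parse_astats() {
    awk '
        /Parsed_astats/ && /Peak level dB:/ { peak = $NF }
        /Parsed_astats/ && /RMS level dB:/  { rms  = $NF }
        END {
            if (peak == "" || rms == "") exit 1
            print rms, peak
        }
    '
}

# astats_gain TARGET RMS PEAK (hypothetical helper): prints the gain in dB
astats_gain() {
    awk -v target="$1" -v rms="$2" -v peak="$3" 'BEGIN {
        peak = peak + 0
        if (peak > 0) peak = 0     # min(0, Peak_level_dB): ignore reconstructed peaks
        wanted   = target - rms    # gain needed to hit the target RMS level
        headroom = 0 - peak        # most gain we can add before the real peak clips
        gain = (wanted < headroom) ? wanted : headroom
        printf "%.3f\n", gain
    }'
}
```

astats_gain -11 -10.613413 2.497428 prints -0.387, matching the number worked out above.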

Wessie commented 6 months ago

the ebur128 variant has been running for a while now and I haven't noticed any particular artifacts while using it, dunno if @icxes has noticed anything while listening to his test setup.

But I think it would be fine to keep it enabled otherwise, and we can see if it causes issues once we go live

R-a-dio / valkyrie

streamer: replay gain #65
