Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser
https://joinpeertube.org/
GNU Affero General Public License v3.0

Loudness Normalisation #3651

Open unfa opened 3 years ago

unfa commented 3 years ago

Loudness Normalisation is a feature that's currently employed on pretty much all major streaming platforms (music- or video-centric alike).

If you have no idea what it is:

It's a way to accurately measure perceived loudness of an audio work, and to make sure all played back material is comparable in levels, so that users don't have to constantly play with the volume slider. It's like ReplayGain but for video streaming, and also better.

How YouTube implements this:

YouTube's reference level is close to -14 LUFS. Anything that measures louder than -14 LUFS gets turned down during playback so that it does not exceed an effective -14 LUFS Integrated.

This is a great feature for viewers, as they will not be startled by a suddenly loud video, but it also pushes creators to strive for better quality (more dynamic) audio mastering (especially for music) because being louder is no longer a way to stand out (it never was, but the music industry went insane over this).

Some platforms (and YouTube is one of them) do not turn up quieter videos, only turn down louder ones. I think this is good, as it's easier to implement, doesn't require processing the input material, and preserves the integrity of the uploaded material.
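To make the attenuation-only rule concrete, here is a minimal sketch. The function name `playback_gain_db` and the default target of -14 LUFS are assumptions taken from the description above; this is not YouTube's actual code.

```python
def playback_gain_db(measured_lufs: float, target_lufs: float = -14.0) -> float:
    """Gain in dB to apply at playback under an attenuation-only policy.

    Hypothetical helper: louder-than-target material is turned down to the
    target, quieter material is left untouched (gain of 0 dB).
    """
    return min(0.0, target_lufs - measured_lufs)


if __name__ == "__main__":
    # A loud master, an on-target upload, and a broadcast-level upload.
    for measured in (-9.0, -14.0, -23.0):
        print(f"{measured} LUFS -> {playback_gain_db(measured)} dB")
```

With this rule a video measured at -9 LUFS would be turned down by 5 dB, while one measured at -23 LUFS would play back unchanged.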

Some music streaming platforms use a limiter to raise the levels of tracks that fall below their reference level - we could do this as well. If PeerTube analyzes the video and finds out its loudness is, say, -23 LUFS Integrated (that's the European TV broadcast reference level, BTW), it could offer the user an option to boost the audio levels using a limiter.

I do not think PeerTube should apply any dynamic range compression or limiting without the user's explicit consent (an audio auto-correction could be implemented, but it's outside the scope of this proposal).

I think turning stuff up is complicated and is really low priority - YouTube doesn't do that at all for example.

Turning super-loud videos down however is a valuable feature, and an expected one nowadays.

There are already open-source tools available that could be used for this.

ebur128 is a command-line program that measures audio files against the EBU R128 specification, which defines how LU and LUFS units work and how the measurements need to be done.

What do you think?

rigelk commented 3 years ago

https://www.npmjs.com/package/ffmpeg-normalize sounds like an easier candidate for us to integrate, with no extra installation step for instance administrators, implementing your algorithm. Could you help us choose its parameters, and provide example fixtures on which to reproduce the error/fix?

unfa commented 3 years ago

I am not sure if ffmpeg-normalize will be suitable - from what I know it's used to alter an audio stream during conversion, isn't it? I think it'd be best to implement this like YouTube did - only analyzing the audio levels and applying the normalisation attenuation (if needed) during playback, by "offsetting" the volume slider. Applying it by re-encoding the audio stream is probably easier to implement, but it's destructive.

I can provide some example files that can be fed into this and checked against other ones to see if the normalization was successful.

For starters, I think these three are the parameters the normalizer needs that we should look at:

            input_i: -23,
            input_lra: 7.0,
            input_tp: -2.0

So the values I'd propose are:

            input_i: -14,
            input_lra: 7.0,
            input_tp: -1.0

I don't know if ffmpeg-normalize will boost the levels of the input, but I think it will - again, implementing this as an extra step during transcoding is not optimal, but it could be a proof of concept before implementing it non-destructively.

rigelk commented 3 years ago

Oh, then it should be much simpler. Forget ffmpeg-normalize: if we just need to analyze things and store this metadata, we only need bare ffmpeg, which bundles a few audio filters like loudnorm:

    ffmpeg -i <video> -af loudnorm=I=-16:TP=-1.0:LRA=7.0:print_format=json -vn -sn -dn -f null - 2>&1 | tail -n 12

yields, for instance:

{
    "input_i" : "-21.15",
    "input_tp" : "-0.54",
    "input_lra" : "18.40",
    "input_thresh" : "-32.44",
    "output_i" : "-16.65",
    "output_tp" : "-1.00",
    "output_lra" : "9.30",
    "output_thresh" : "-26.98",
    "normalization_type" : "dynamic",
    "target_offset" : "0.65"
}
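As a rough sketch of how this analysis step could be scripted, the snippet below runs the same ffmpeg command and parses the JSON that loudnorm prints at the end of its log. The function names `parse_loudnorm_stats` and `measure_loudness` are hypothetical, and it assumes ffmpeg is on the PATH and that the JSON object is the last brace-delimited block in stderr.

```python
import json
import subprocess


def parse_loudnorm_stats(stderr_text: str) -> dict:
    """Extract the loudnorm JSON object from the end of ffmpeg's log output."""
    # Assumption: the loudnorm report is the last {...} block in the log.
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    raw = json.loads(stderr_text[start:end])
    # ffmpeg prints every value as a string; convert the numeric ones.
    return {
        key: value if key == "normalization_type" else float(value)
        for key, value in raw.items()
    }


def measure_loudness(video_path: str) -> dict:
    """Run an analysis-only loudnorm pass (no output file is written)."""
    cmd = [
        "ffmpeg", "-hide_banner", "-i", video_path,
        "-af", "loudnorm=I=-16:TP=-1.0:LRA=7.0:print_format=json",
        "-vn", "-sn", "-dn", "-f", "null", "-",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_loudnorm_stats(result.stderr)
```

Calling `measure_loudness("some-video.mp4")` (a placeholder path) would return a dict such as `{"input_i": -21.15, ...}` that could be stored alongside the video's other metadata.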

Now the question is, how do we act on the player based on this information?

EDIT: I've looked at how other platforms do it, and it seems the output_i - input_i difference is used to modify the base volume of a video (unrelated to the user volume, which doesn't change).
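One possible way for a player to act on that difference is sketched below. The function `effective_volume` and its parameters are hypothetical; it assumes the player exposes a volume in the range 0.0 to 1.0 (as an HTML media element does), so a positive offset can only take effect when the user's slider is below its maximum.

```python
def effective_volume(user_volume: float, input_i: float, output_i: float,
                     enabled: bool = True) -> float:
    """Combine the user's slider position with a per-video base gain.

    Hypothetical sketch: the slider value itself is never modified, only the
    volume actually handed to the media element.
    """
    if not enabled:
        return user_volume
    offset_db = output_i - input_i       # difference reported by the analysis
    base_gain = 10 ** (offset_db / 20)   # dB -> linear amplitude factor
    return max(0.0, min(1.0, user_volume * base_gain))


if __name__ == "__main__":
    # A video measured at -10 LUFS with a -16 LUFS target is roughly halved.
    print(effective_volume(1.0, input_i=-10.0, output_i=-16.0))
```

The `enabled` flag stands in for a client-side setting that switches the correction off entirely.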

unfa commented 3 years ago

I am not sure how to tag this information, but it seems that some normalization is being performed regardless. It doesn't seem like just analysis to me.

I think I've found a good resource for this. It seems the EBU recommends -16 LUFS for streaming (instead of the -14 YouTube uses). http://k.ylo.ph/2016/04/04/loudnorm.html

rigelk commented 3 years ago

@unfa I'm positive this is just analysis, as a second pass is required with the returned parameters to perform normalization.
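For illustration, the second pass would feed the measured values back into loudnorm through its `measured_*` and `offset` options. The helper below only builds the filter string; `second_pass_filter` is a hypothetical name, and this is the destructive re-encoding path, shown only to make clear why the first pass alone changes nothing.

```python
def second_pass_filter(stats: dict, target_i: float = -16.0,
                       target_tp: float = -1.0, target_lra: float = 7.0) -> str:
    """Build a loudnorm filter string from first-pass measurements.

    `stats` is the JSON object printed by the first (analysis) pass.
    """
    return (
        f"loudnorm=I={target_i}:TP={target_tp}:LRA={target_lra}"
        f":measured_I={stats['input_i']}"
        f":measured_TP={stats['input_tp']}"
        f":measured_LRA={stats['input_lra']}"
        f":measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}"
        ":linear=true"
    )


if __name__ == "__main__":
    first_pass = {
        "input_i": "-21.15", "input_tp": "-0.54", "input_lra": "18.40",
        "input_thresh": "-32.44", "target_offset": "0.65",
    }
    print(second_pass_filter(first_pass))
```

The resulting string would be passed to `-af` in a second ffmpeg invocation that actually writes an output file.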

seniorm0ment commented 3 years ago

Personally, I'd be against this if it wasn't optional; I do not want loudness normalization. Uploaders can choose to apply normalization to their video in their production software, and as for watching videos as a user, a simple option (emphasis on option) to enable or disable loudness normalization across videos would be preferred. Having it forced on viewers is not desired.

rigelk commented 3 years ago

I think https://productionadvice.co.uk/stats-for-nerds/ details how YT does it, and gives context, if anyone is interested.

@grravity since the correction will be done client-side only, users will be given the possibility to disable it. Could you elaborate why you don't want loudness normalization though? It seems quite handy to counter loudness wars.

unfa commented 3 years ago

I agree that having a client-side option to disable it would be nice. But I think it should be on by default. It simply lends itself to a better, more consistent viewing experience with far fewer unpleasant volume changes. I believe music streaming services allow users to disable loudness normalization, but it's on by default.

I'd also love to know why you're not a fan of this feature, @grravity.

fenarinarsa commented 3 years ago

Loudness normalization has been standard in the broadcast industry for a few years now - for instance in France every TV show must be normalized to -23 LUFS. I think it initially came from the UK - as usual.

However, the norm is more complex than a single value. For instance, a dialog part in a movie can have a quieter sound. Also, it's not always good to normalize pure music, for instance a classical concert in several parts, one of which is softer.

The recommended value for internet media is indeed -16 LUFS.

I guess YouTube enforced -14 LUFS to avoid "jump scare" videos etc, while keeping a tolerance.

Also, it means that if you upload anything initially mixed for broadcast (-23 LUFS), you need to add 7 dB to reach -16 LUFS. It is also usually better to normalize quieter audio tracks; in most cases people just don't know what they're doing. Video captures can be too quiet, music clips can be incorrectly recorded, etc. This actually happens really often.

But it would be best to enable this feature by default, and allow it to be disabled by the uploader. Some people (sound engineers, video editors, etc.) already know what they're doing audio-wise when they upload media.

As a matter of fact, I normalize everything I upload to -16 LUFS in Media Encoder, but that's an additional thing to think of :D especially when uploading broadcast archives.