loudness normalization using EBU-r128

oplatek commented 2 years ago

Hi!

Have you considered adding EBU-r128 normalization?

For example, something similar to the implementation linked below, which, however, requires ffmpeg as a dependency: https://github.com/slhck/ffmpeg-normalize#ebu-r128-normalization

csteinmetz1 commented 1 year ago

Hi @oplatek,

EBU R128 uses ITU-R BS.1770 as its loudness measurement algorithm, so normalizing with pyloudnorm should produce results very similar to ffmpeg's.

Did you have a specific use case in mind? Currently, pyloudnorm only measures integrated loudness, but EBU R128 also includes short-term and momentary loudness. Was that what you were referring to?

oplatek commented 1 year ago

My use case is comparing relatively short Text-to-Speech (TTS) or Voice Conversion (VC) samples that were converted from a source speaker and recording condition to a clean target speaker voice.

The samples are typically 2-14 s long, with lengths normally distributed. I noticed that RMS is sensitive to background noise, e.g. for VC from noisy conditions to clean target conditions. Since I want to compare noisy and clean utterances side by side, I want them normalized to the same perceived loudness.

In general, I think that peak loudness normalization is the best. I asked about EBU R128 normalization because some other studies used it, and it also uses peak normalization.
