desbma / r128gain

Fast audio loudness scanner & tagger (ReplayGain v2 / R128)
GNU Lesser General Public License v2.1

Peak tag doesn't conform to RG or ITU standards #4

Closed graue closed 5 years ago

graue commented 5 years ago

First, thanks for this awesome tool.

I was surprised to see r128gain's peak tags show values higher than I expected, with some above 1.0 for lossless files, which seemed impossible. After doing some reading, I understand that this is the "true peak," or an estimation of where the peak would be after digital to analog conversion.

But in the ReplayGain 2.0 spec, peak amplitude reflects sample peak:

For uncompressed files simply, scanners store the maximum absolute sample value held in the file on any channel for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

The ITU BS.1770-4 spec does advocate using true peak, but in decibels:

Meters that follow these guidelines, and that use an oversampled sampling rate of at least 192 kHz, should indicate the result in the units of dB TP, having converted the result to a logarithmic scale. This can be achieved by calculating “20log[10]” of the attenuated, oversampled, filtered, absolute value, then adding 12.04dB. The “dB TP” Designation signifies decibels relative to 100% full scale, true-peak measurement.

So instead of replaygain_album_peak=1.25892541, we would expect something like replaygain_album_peak=2.0 dB TP.
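For reference, the relationship between the linear peak stored in the tag and a decibel figure can be sketched as follows. This is only an illustration of the arithmetic; the helper names `linear_to_db` and `db_to_linear` are hypothetical and are not part of r128gain.

```python
import math


def linear_to_db(peak: float) -> float:
    """Convert a linear peak (1.0 = digital full scale) to decibels."""
    return 20.0 * math.log10(peak)


def db_to_linear(db: float) -> float:
    """Convert a peak in decibels back to a linear value."""
    return 10.0 ** (db / 20.0)


# The tag value quoted above corresponds to about +2 dB.
print(f"{linear_to_db(1.25892541):.2f} dB")  # 2.00 dB

# A peak of +3.6 dB corresponds to a linear value of roughly 1.5.
print(f"{db_to_linear(3.6):.2f}")  # 1.51
```

A linear value of exactly 1.0 maps to 0 dB, so any tag above 1.0 indicates a peak above digital full scale.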

To be honest, the more I think about this, the more your behavior seems like the best combination of backwards compatibility and usefulness, but it does seem surprising and technically violate RGv2. Have you considered raising this with the people drafting the RGv2 spec and documenting the divergence in the meantime?

Thanks again for the software.

desbma commented 5 years ago

The peak is already displayed by r128gain in dBFS: File 'a.ogg': loudness = -25.8 dbFS, peak = -9.3 dbFS.

Now regarding the tag format itself, this is not something we can easily change, as many players have code to parse the current format. Backward compatibility is a big issue, and the format is identical to the RGv1 tag.

It is common to have files with a true peak slightly above 1.0; however, 1.25 seems unusual. Can you post a sample file?

graue commented 5 years ago

Now regarding the tag format itself, this is not something we can easily change, as many players have code to parse the current format. Backward compatibility is a big issue, and the format is identical to the RGv1 tag.

This is why I'm suggesting to just document the current behavior so it isn't a surprise, or maybe edit the spec since it's a wiki. What do you think?

The highest true peak I've seen so far is +3.6dB or ~1.5, which occurs between 3:30 and 3:45 in the track "In Your Heart (Vincent Clarke Remix)" by A Place to Bury Strangers.

snippet_with_high_true_peak.zip

desbma commented 5 years ago

I would first start a discussion in the HydrogenAudio forums before editing the wiki.

Honestly I won't lose sleep over that, because:

That file is interesting though, it may be triggering a bug in the FFmpeg code. Depending on how it is run, the replaygain filter reports the peak as 1.513202 or 1.000000.

I have opened an issue to be sure: https://trac.ffmpeg.org/ticket/7812 EDIT: and another one: https://trac.ffmpeg.org/ticket/7813

graue commented 5 years ago

I have a couple other lossless tracks with peaks of +3.2 and +3.4dB. Let me know if those are of interest. The common thread seems to be a lot of very high frequency energy.
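A minimal sketch of why high-frequency content can push the true peak well above the sample peak. It assumes a synthetic tone at one quarter of the sample rate whose samples all fall 45 degrees away from the crests, and it estimates the inter-sample peak with a 4x Hann-windowed sinc interpolation. This is not the BS.1770 reference filter and not the code r128gain or FFmpeg uses; the function names and parameters are made up for illustration.

```python
import math


def sample_peak(samples):
    """Largest absolute sample value (the ReplayGain-style sample peak)."""
    return max(abs(s) for s in samples)


def oversampled_peak(samples, factor=4, half_width=32):
    """Estimate the inter-sample peak by Hann-windowed sinc interpolation.

    Samples closer than half_width to either end are skipped so that the
    interpolation kernel never reads outside the list.
    """
    peak = 0.0
    for i in range(half_width, len(samples) - half_width):
        for k in range(factor):
            t = i + k / factor  # fractional sample position
            acc = 0.0
            for j in range(i - half_width + 1, i + half_width + 1):
                d = t - j
                if d == 0:
                    weight = 1.0
                else:
                    sinc = math.sin(math.pi * d) / (math.pi * d)
                    window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                    weight = sinc * window
                acc += samples[j] * weight
            peak = max(peak, abs(acc))
    return peak


# Tone at fs/4, scaled so that every sample has magnitude 1.0 (0 dBFS).
signal = [
    math.sin(math.pi / 2 * n + math.pi / 4) / math.sin(math.pi / 4)
    for n in range(256)
]

sp = sample_peak(signal)
tp = oversampled_peak(signal)
print(f"sample peak: {sp:.3f}")
print(f"estimated true peak: {tp:.3f} ({20 * math.log10(tp):+.1f} dB)")
```

In this constructed case the underlying waveform crests at about 1.41 (roughly +3 dB) even though no stored sample exceeds full scale, which is in the same range as the peaks reported above.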

desbma commented 5 years ago

So it turns out FFmpeg internally does both sample rate and sample format conversion before the R128 loudness analysis. This can lead to very confusing values, like a sample peak above 0 dBFS, and it made r128gain sometimes return incorrect true peak values.

This is now fixed. To retain compatibility with RGv1 and respect the RGv2 specification, I have also changed r128gain to use only the sample peak, not the true peak, so the peak tag should now always be <= 1.0.

All of this will be released in version 0.8.0 (coming soon).

graue commented 5 years ago

Great, thank you!

Will sample peak be measured before FFmpeg's sample rate conversion?

If it's measured after sample rate conversion, as seems to be suggested here, it'll still be inaccurate. A peak of 0.98852539 could become 1.00000000 from the 44.1 kHz to 48 kHz conversion.
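The effect can be illustrated with an idealized sketch. It does not run FFmpeg's resampler; it simply evaluates the same hypothetical 11025 Hz tone on a 44.1 kHz grid and on a 48 kHz grid, which is what a perfect sample rate conversion would produce. The tone, its amplitude, and the names `FREQ_HZ`, `AMPLITUDE`, and `sampled_peak` are assumptions chosen for the example.

```python
import math

FREQ_HZ = 11025.0  # hypothetical test tone, one quarter of 44.1 kHz
# Amplitude chosen so that the 44.1 kHz samples peak at 0.98852539.
AMPLITUDE = 0.98852539 / math.sin(math.pi / 4)


def sampled_peak(rate_hz, count=2000):
    """Sample peak of the tone when sampled at rate_hz."""
    return max(
        abs(AMPLITUDE * math.sin(2 * math.pi * FREQ_HZ * n / rate_hz + math.pi / 4))
        for n in range(count)
    )


peak_44k = sampled_peak(44100.0)
peak_48k = sampled_peak(48000.0)
print(f"sample peak at 44.1 kHz: {peak_44k:.8f}")
print(f"sample peak at 48 kHz:   {peak_48k:.8f}")
```

On the 48 kHz grid some samples land on the crests of the waveform, so the measured sample peak rises above 1.0 even though the source samples never exceed 0.98852539. This is why the peak has to be read from the original samples.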

desbma commented 5 years ago

If it's measured after sample rate conversion, as seems to be suggested here, it'll still be inaccurate. A peak of 0.98852539 could become 1.00000000 from the 44.1 kHz to 48 kHz conversion.

You are absolutely right. This is what the code does now, but I will fix it before releasing version 0.8.0.

desbma commented 5 years ago

That last issue should also be fixed now: the peak is calculated either before the R128 analysis (for single tracks), or in a separate filter chain (for album peak with multiple tracks).

Version 0.8.0 should be published on PyPI in a few hours.