asteroid-team / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
MIT License

Feature Request: Support for Target Level Normalization #176

Open LarocheC opened 7 months ago

LarocheC commented 7 months ago

Feature Description

I propose adding a target level normalization augmentation to the torch-audiomentations library. This feature would measure an audio signal's average power and adjust its gain so that the signal reaches a target RMS level.

Current Limitations

While the library currently supports peak normalization and gain adjustments, these methods do not directly allow setting a specific RMS level across audio signals. Gain adjustments without explicit RMS targeting can result in inaccurate loudness levels, especially for signals with large dynamic ranges.

Potential Implementation

An RMS normalization feature would provide a more accurate way to achieve a set level across different audio signals. Among the functions already available in the library, it would be most similar to the gain augmentation. The applied gain would be a function of the signal's current level and the target RMS value.
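To make the idea concrete, here is a minimal sketch of the gain computation in plain PyTorch. The function name `rms_normalize`, the `target_rms_db` parameter, and the `(batch, channels, samples)` input layout are assumptions for illustration only; this is not an existing torch-audiomentations API.

```python
import torch


def rms_normalize(
    samples: torch.Tensor, target_rms_db: float = -20.0, eps: float = 1e-8
) -> torch.Tensor:
    """Scale each example so that its RMS level matches target_rms_db (dBFS).

    Hypothetical helper. Assumes a float tensor shaped
    (batch, channels, samples) with full scale at +/-1.0.
    """
    # Current RMS of each example, measured over all channels and samples.
    rms = samples.pow(2).mean(dim=(1, 2), keepdim=True).sqrt()
    # Convert the target level from dB to linear amplitude.
    target_rms = 10.0 ** (target_rms_db / 20.0)
    # The gain depends on both the current level and the target level.
    # Clamping avoids a division by zero for silent examples.
    gain = target_rms / rms.clamp(min=eps)
    return samples * gain
```

Silent examples stay silent, because the clamped gain is multiplied by zeros. Measuring one RMS value per example (rather than per channel) preserves the balance between channels; a per-channel variant would be a different design choice.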

Considerations for Preventing Clipping

When the target level is high, clipping can occur if the signal's peak amplitude exceeds the system's maximum representable limit, leading to distortion. To prevent this issue, we could take into consideration the crest factor (the ratio of peak amplitude to RMS level), which differs between types of signals.
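One possible way to use the crest factor, sketched below under assumptions: compute the highest RMS level a signal can reach before its peak hits full scale, and cap the target at that level. The name `clip_safe_rms_gain` and its parameters are hypothetical, and capping the gain is only one option (a limiter would be another).

```python
import torch


def clip_safe_rms_gain(
    samples: torch.Tensor,
    target_rms_db: float = -20.0,
    max_peak: float = 1.0,
    eps: float = 1e-8,
) -> torch.Tensor:
    """Return a per-example linear gain that aims for target_rms_db (dBFS)
    but never pushes the peak above max_peak.

    Hypothetical helper. Assumes a float tensor shaped
    (batch, channels, samples).
    """
    rms = samples.pow(2).mean(dim=(1, 2), keepdim=True).sqrt().clamp(min=eps)
    peak = samples.abs().amax(dim=(1, 2), keepdim=True).clamp(min=eps)
    # Crest factor: how far the peak sits above the RMS level.
    crest_factor = peak / rms
    target_rms = 10.0 ** (target_rms_db / 20.0)
    # Highest RMS level this signal can reach without exceeding max_peak.
    max_safe_rms = max_peak / crest_factor
    # Use the target level when it is safe, otherwise the safe maximum.
    achievable_rms = max_safe_rms.clamp(max=target_rms)
    return achievable_rms / rms
```

With this approach, signals with a high crest factor (for example, sparse transients) may end up below the requested RMS level, so the trade-off between level accuracy and clipping would need to be documented.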

I believe this would enhance the library's utility for audio processing and production tasks. Thank you for considering this proposal. I look forward to discussing it further and am open to contributing to its implementation.

iver56 commented 7 months ago

Thanks for the feature request