Using pyln in streaming mode

kwanUm commented 3 years ago

Hi,

I'm training a neural network that regenerates user speech from several features extracted from the source speech. One of those features is loudness. I'm looking for a fast library to extract this feature in real time from small chunks of data (say 40ms), and then use the feature to regenerate the audio.

I'm wondering if anyone here has done something similar and has insights on this topic.

csteinmetz1 commented 3 years ago

Hi @kwanUm,

There is an inherent limitation in trying to measure loudness on short time scales like ~40ms. By default, the BS.1770 standard that pyloudnorm follows requires a window of at least 400ms to measure the loudness of a single frame. Even then, a loudness estimate from a single frame is likely not very accurate unless you take a running average over a number of frames (as the integrated loudness does).

Partly for this reason, pyloudnorm doesn't currently support streaming mode explicitly. That said, we are currently working on implementing the momentary and short-term loudness metrics, which would be more relevant to your application, although they would not address it exactly. If you need to operate on such small time scales, something like the RMS energy may be applicable, but I am not sure of the details of your use case.
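
To make the RMS suggestion concrete, here is a minimal sketch of a per-chunk RMS level in dB relative to full scale. It is not part of pyloudnorm and is not a BS.1770 loudness measurement (there is no K-weighting or gating). The names `rms_dbfs` and `chunk_levels` and the -120 dB floor are assumptions made for illustration, and samples are assumed to be floats in [-1.0, 1.0].

```python
import math


def rms_dbfs(chunk, floor_db=-120.0):
    """Return the RMS level of one chunk in dB relative to full scale (1.0).

    Hypothetical helper: plain RMS energy, no K-weighting, no gating.
    """
    if not chunk:
        return floor_db
    mean_square = sum(x * x for x in chunk) / len(chunk)
    if mean_square <= 0.0:
        # Digital silence: avoid log10(0) and report the floor instead.
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_square))


def chunk_levels(samples, rate, chunk_ms=40):
    """Split samples into consecutive chunks and return one level per full chunk."""
    chunk_len = int(rate * chunk_ms / 1000)
    return [
        rms_dbfs(samples[i:i + chunk_len])
        for i in range(0, len(samples) - chunk_len + 1, chunk_len)
    ]


rate = 16000
# 80 ms of a full-scale 1 kHz sine, i.e. two 40 ms chunks.
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate * 80 // 1000)]
levels = chunk_levels(tone, rate)
```

A full-scale sine has a mean square of 0.5, so each chunk should come out at roughly -3 dB. For real-time use the same arithmetic over NumPy arrays would likely be much faster than plain Python lists.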

kwanUm commented 3 years ago

Thanks for the fast reply!

If it helps, I can move over the data with a sliding window that advances 40ms at a time but also includes a historical context of my choice (say 360ms). I'm guessing that even in this case, some further calculation is needed to smooth the results.

I'm working with human speech data btw.

Please keep me up to date when you include the streaming mode feature in pyln!
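
The sliding-window idea above (a 40ms hop plus 360ms of history, giving a 400ms window) could be sketched roughly as follows. This is only an illustration under stated assumptions: `SlidingLevelMeter` is a hypothetical class and not a pyloudnorm API, it measures an unweighted RMS level rather than BS.1770 loudness, and it assumes every chunk has the same length.

```python
import math
from collections import deque


class SlidingLevelMeter:
    """Hypothetical streaming meter: RMS level over the last N equal-sized chunks."""

    def __init__(self, hop_ms=40, context_ms=360, floor_db=-120.0):
        # Number of hops covered by the current chunk plus its history
        # (40 ms + 360 ms -> 10 chunks of 40 ms).
        self.num_hops = (hop_ms + context_ms) // hop_ms
        self.floor_db = floor_db
        # Each entry is the mean square of one chunk; old entries drop off automatically.
        self.energies = deque(maxlen=self.num_hops)

    def push(self, chunk):
        """Add one chunk of float samples and return the level of the current window in dB."""
        self.energies.append(sum(x * x for x in chunk) / len(chunk))
        # Averaging per-chunk mean squares equals the mean square of the whole
        # window only because all chunks are assumed to have equal length.
        mean_square = sum(self.energies) / len(self.energies)
        if mean_square <= 0.0:
            return self.floor_db
        return max(self.floor_db, 10.0 * math.log10(mean_square))


meter = SlidingLevelMeter()
silent_chunk = [0.0] * 640   # 40 ms at 16 kHz
loud_chunk = [1.0] * 640     # constant full-scale signal, mean square 1.0

# Nine silent chunks followed by one loud chunk: the window mean square is 1/10.
for _ in range(9):
    silent_level = meter.push(silent_chunk)
mixed_level = meter.push(loud_chunk)

# Nine more loud chunks push all the silence out of the window.
for _ in range(9):
    full_level = meter.push(loud_chunk)
```

Because the window holds ten chunks, a single loud chunk after silence only raises the reading to about -10 dB, which is the smoothing effect the history provides. Until ten chunks have arrived, the reading is based on however many are available.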

happyTonakai commented 7 months ago

Hi, did you find a solution? I'm interested in real-time loudness normalization as well.

csteinmetz1 / pyloudnorm

Using pyln in streaming mode #32