csteinmetz1 / pyloudnorm

Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm
https://www.christiansteinmetz.com/projects-blog/pyloudnorm
MIT License

Process hangs, out of memory? #40

Open rawrbw opened 2 years ago

rawrbw commented 2 years ago

Hi!

I've been trying out pyloudnorm and it works like a charm, except I'm having issues with larger WAV files. The bigger the WAV file, the more memory the Python process consumes, and if the file is big enough (on my MacBook with 8 GB RAM) the system becomes sluggish, the process is reported as not responding, and memory consumption is nearly maxed out. The program usually succeeds after some time, but the system is unusable in the meantime. Is there any way you can help with this? It would be nice if there was some limit on memory usage so the system can do other things.

The system becomes sluggish specifically while measuring loudness and applying loudness normalisation (those steps in the code). Writing and reading the WAV file are fine for the system.

csteinmetz1 commented 2 years ago

Hi @rawrbw, thanks for opening this issue.

What you describe makes sense for very large files. I am curious to know how large your files are, both in duration and size.

One solution is to simply cut the file into smaller chunks once you load it, then process each chunk with pyloudnorm one by one, averaging the measurements together. It would be possible to implement this directly in pyloudnorm, but it is likely best to leave this up to the user.

In my experience, the majority of use cases operate on files from a few seconds to a few minutes, which should not cause any memory consumption issues. However, with files that are, say, over 1 hour long, I could see this becoming an issue.

Here is a basic example to show the chunking method. In a few tests I did find that this does not give exactly the same result as measuring loudness on the whole file directly, but the errors were ~0.1 LUFS, which is not perceptually significant. Using a larger chunk size brings the results closer but requires more memory.

import numpy as np
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("/path/to/long_file.wav")  # load audio (with shape (samples, channels))
meter = pyln.Meter(rate)  # create BS.1770 meter

chunk_lufs = 0.0
chunk_size_s = 240  # 240 second chunks
chunk_size_samp = int(chunk_size_s * rate)
# Split signal into non-overlapping chunks (round up so the tail is included,
# without producing an extra empty chunk when the length is an exact multiple)
N = int(np.ceil(data.shape[0] / chunk_size_samp))

for n in range(N):
    start_idx = n * chunk_size_samp
    stop_idx = start_idx + chunk_size_samp
    chunk = data[start_idx:stop_idx, :]  # final chunk may be shorter than chunk_size_samp
    loudness = meter.integrated_loudness(chunk)  # measure loudness of this chunk
    chunk_lufs += loudness * (chunk.shape[0] / data.shape[0])  # duration-weighted average

overall_lufs = meter.integrated_loudness(data)  # for comparison: measure the whole file at once

print(f"Overall: {overall_lufs}")
print(f"Chunk: {chunk_lufs}")

It should be possible to handle this chunking internally in pyloudnorm so that memory cost is constant, but this would require re-writing the core algorithm, which is likely not a priority at the moment. If others raise this issue further we may consider this implementation. However, for the moment I recommend using this chunking method and see if that will work for your use case.
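Not a fix for the internals, but as a stopgap the same chunk-averaging can be done without ever loading the whole file, by reading blocks straight from disk with soundfile's streaming reader. This is only a sketch under the same assumptions as above (the path and the 240 second block size are placeholders), and it carries the same ~0.1 LUFS approximation error:

import soundfile as sf
import pyloudnorm as pyln

path = "/path/to/long_file.wav"  # placeholder
info = sf.info(path)
meter = pyln.Meter(info.samplerate)  # create BS.1770 meter

block_size = 240 * info.samplerate  # read 240 second blocks from disk
approx_lufs = 0.0
for block in sf.blocks(path, blocksize=block_size, always_2d=True):
    # weight each block's loudness by its share of the total duration
    approx_lufs += meter.integrated_loudness(block) * (block.shape[0] / info.frames)

print(f"Approximate integrated loudness: {approx_lufs:.2f} LUFS")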

rawrbw commented 2 years ago

Thank you for this in-depth answer! Also, thanks for sharing the code for the chunking approach you suggest; I will try it out and see if it stays within the allowed tolerance of the loudness spec for broadcast etc., which is usually ±0.2 LU. The files I was testing, where the system would freeze during analysis and processing, are:

20min.wav - 345 MB
30min.wav - 518 MB
45min.wav - 791 MB
1h 9min.wav - 1.2 GB

10 min files would hang every so often, but it was not noticeable when using the computer. All files are 24-bit, 48 kHz, stereo.

csteinmetz1 commented 2 years ago

Thanks for these numbers. Certainly makes sense that you would hit this issue with files this large. I agree that this should be solved internally in how we compute the loudness, but we likely won't have it implemented soon as this could require a significant rework. Thanks for raising this issue.

rawrbw commented 2 years ago

Yes, that would be amazing of course! But meanwhile I have solved this using the chunking technique you outlined. I also had to chunk the big files when applying the LUFS normalisation, writing the individual chunks as WAVs that were later joined into the final file. All this means the overall stress on the machine is much lower and processing big files completes much faster.
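For anyone with the same constraint, here is a rough sketch of that kind of block-wise processing using soundfile's streaming reader and writer. The paths, target level, and block size are placeholders, and it assumes the integrated loudness of the whole file has already been measured (e.g. with the chunked method above); since loudness normalisation is just a constant gain, it can be applied one block at a time without holding the whole file in memory:

import soundfile as sf

in_path = "/path/to/long_file.wav"  # placeholder paths
out_path = "/path/to/normalized.wav"
measured_lufs = -18.7  # from the chunked measurement above (placeholder value)
target_lufs = -23.0  # e.g. an EBU R128 broadcast target

gain = 10.0 ** ((target_lufs - measured_lufs) / 20.0)  # constant linear gain

info = sf.info(in_path)
with sf.SoundFile(out_path, "w", samplerate=info.samplerate,
                  channels=info.channels, subtype=info.subtype) as outfile:
    # stream the input in 240 second blocks so the whole file is never in memory
    for block in sf.blocks(in_path, blocksize=240 * info.samplerate, always_2d=True):
        outfile.write(block * gain)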

iver56 commented 1 year ago

One of the reasons why pyloudnorm uses a lot of memory is that it creates a float64 copy of the audio internally for filtering, due to the way scipy works. In case anyone is looking for an optimized (in terms of speed and memory) implementation for large audio files, this open source C++ lib made by one of my colleagues may be a viable option: https://github.com/nomonosound/libloudness
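As a rough back-of-envelope for why the files listed above are so painful on an 8 GB machine: a float64 working copy of the 1 h 9 min, 48 kHz stereo file alone is over 3 GB, before counting any additional copies made during filtering.

frames = 69 * 60 * 48000  # ~69 minutes of audio at 48 kHz
channels = 2
bytes_per_sample = 8  # float64
print(f"{frames * channels * bytes_per_sample / 1e9:.1f} GB")  # ~3.2 GB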