PyAV-Org / PyAV

Pythonic bindings for FFmpeg's libraries.
https://pyav.basswood-io.com/
BSD 3-Clause "New" or "Revised" License

`Resampler` object still occupies memory after deletion #1429

Open MahmoudAshraf97 opened 3 weeks ago

MahmoudAshraf97 commented 3 weeks ago

Overview

I have a class that accepts audio inputs and resamples them as needed. The resampler parameters differ between calls, so I can't create the resampler once as a class member. However, after the function returns, the resampler still leaves residual memory allocated, even when it is deleted explicitly.

Expected behavior

Memory usage should stay constant without having to run the GC manually.

Actual behavior

Memory usage increases unless the GC is run manually, even if the resampler object is deleted.

Investigation

I ran the reproduction code to test three different cases:

Baseline: [memory-usage plot]

delete=True, collect=False: [memory-usage plot]

collect=True: [memory-usage plot]

Reproduction

import psutil
import gc
import av
import matplotlib.pyplot as plt

audio_path = "/mnt/e/Projects/whisper-diarization/096.mp3"
process = psutil.Process()

def minimal_example(audio_path, delete=False):
    # A new resampler is created on every call because the target
    # parameters can differ between calls.
    resampler = av.audio.resampler.AudioResampler(
        format="s16",
        layout="mono",
        rate=16000,
    )

    with av.open(audio_path, mode="r", metadata_errors="ignore") as container:
        for frame in container.decode(audio=0):
            # resample() returns a list of output frames; they are discarded here
            resampled_frames = resampler.resample(frame)

    if delete:
        # Drop the only reference to the resampler explicitly
        del resampler

def monitor_memory(audio, n=20, collect=False, delete=False):
    gc.collect()
    init_memory_usage = process.memory_info().rss
    memory_usage = []
    for _ in range(n):
        minimal_example(audio, delete=delete)
        if collect:
            gc.collect()
        # Store the RSS growth since the start, in MB
        memory_usage.append(
            (process.memory_info().rss - init_memory_usage) / 1_000_000
        )

    gc.collect()
    # Plot the memory usage per iteration
    plt.plot(memory_usage)
    plt.title("Memory Usage Over Time")
    plt.xlabel("Iteration")
    plt.ylabel("Memory Usage (MB)")
    plt.show()

if __name__ == "__main__":
    monitor_memory(audio_path, n=20, collect=False, delete=False)

Additional context

https://github.com/SYSTRAN/faster-whisper/issues/390 https://github.com/SYSTRAN/faster-whisper/pull/856/

moonsikpark commented 3 weeks ago

This happens because av.audio.resampler.AudioResampler uses av.filter.graph.Graph internally, and the graph inevitably contains circular references. The reference count of an object in a reference cycle never drops to zero, so reference counting alone cannot free it. As a result, the AudioResampler is deallocated when it goes out of scope, but the Graph lingers until the cyclic garbage collector runs.
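The mechanism described above can be reproduced in pure Python. This is a minimal sketch: Graph and Filter are hypothetical stand-ins and do not model PyAV's actual internals. With the cyclic collector disabled, a graph whose nodes point back at it survives `del` and is freed only by an explicit gc.collect().

```python
import gc
import weakref


class Graph:
    """Hypothetical stand-in for a filter graph (not PyAV's class)."""

    def __init__(self):
        self.filters = []


class Filter:
    """Hypothetical filter node that keeps a strong reference to its graph."""

    def __init__(self, graph):
        self.graph = graph          # node -> graph
        graph.filters.append(self)  # graph -> node: this closes the cycle


def cycle_survives_del():
    """Return (alive_after_del, alive_after_collect) for a Graph in a cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from running on its own
    try:
        graph = Graph()
        Filter(graph)               # kept alive only through graph.filters
        probe = weakref.ref(graph)  # observe the graph without keeping it alive
        del graph
        alive_after_del = probe() is not None  # refcount never reached zero
        gc.collect()                           # explicit collection still works
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


print(cycle_survives_del())  # (True, False)
```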

This is not a bug, because it does not affect the program's correctness; the author relied on CPython's cyclic garbage collector to clean up these resources. Eliminating the reference loop would still be a worthwhile performance improvement.
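One common way to eliminate such a loop is to make the back-reference weak. The sketch below uses hypothetical classes and is not the change made in the linked PR; it only illustrates the general technique. Once the node-to-graph pointer is a weakref.ref, reference counting frees the graph as soon as its last strong reference goes away, even with the cyclic collector disabled.

```python
import gc
import weakref


class Graph:
    """Hypothetical graph that owns its filter nodes (strong references)."""

    def __init__(self):
        self.filters = []


class Filter:
    """Hypothetical node holding only a weak reference back to its graph."""

    def __init__(self, graph):
        self._graph = weakref.ref(graph)  # node -> graph is weak: no cycle
        graph.filters.append(self)        # graph -> node stays strong

    @property
    def graph(self):
        g = self._graph()
        if g is None:
            raise RuntimeError("the owning graph has been freed")
        return g


def freed_without_gc():
    """Return True if the graph is freed by reference counting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # prove that no cyclic collection is needed
    try:
        graph = Graph()
        node = Filter(graph)  # node outlives the graph on purpose
        probe = weakref.ref(graph)
        del graph             # last strong reference to the graph
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

The trade-off is that a node must check whether its graph still exists, which is what the graph property does here.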

moonsikpark commented 3 weeks ago

@MahmoudAshraf97, please test again using https://github.com/PyAV-Org/PyAV/pull/1439/commits/851ff21b4dd607468bc544c6222980de17fdee01. It will not fully solve the issue, but it could reduce the memory footprint somewhat.

MahmoudAshraf97 commented 3 weeks ago

This is the result: [memory-usage plot]

I guess the problem still exists. Since repeating the same experiment doesn't reproduce exactly the same results, I can't verify whether it is partially solved or not.

moonsikpark commented 3 weeks ago

Using a large enough number of iterations will help make the result more deterministic.

[Screenshot: memory usage without the patch (left) and with the patch (right)]

On my machine (arm64 M2 Pro, Darwin 14.4.1), with delete=False, collect=False, n=200:

Without the patch (left), the GC periodically cleans up the circularly referenced objects, but some objects seem to be unrecoverable by the GC.

With the patch (right), memory usage increases steadily and shows no visible GC activity. This could indicate a real leak, though I haven't investigated further. Overall, the memory footprint appears significantly smaller.
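To tell periodic GC activity apart from steady growth, one option is to log cyclic-collector runs alongside the RSS measurements. A minimal sketch using the standard gc.callbacks hook; the GCActivityLog name is hypothetical. Note that it only counts Python-level objects, so memory allocated directly by FFmpeg is invisible to it.

```python
import gc


class GCActivityLog:
    """Count cyclic-GC runs and the objects they reclaim, via gc.callbacks."""

    def __init__(self):
        self.runs = 0
        self.collected = 0
        self.uncollectable = 0

    def __call__(self, phase, info):
        # CPython calls each callback with phase "start" or "stop";
        # on "stop", info reports what that collection reclaimed.
        if phase == "stop":
            self.runs += 1
            self.collected += info["collected"]
            self.uncollectable += info["uncollectable"]

    def __enter__(self):
        gc.callbacks.append(self)
        return self

    def __exit__(self, *exc):
        gc.callbacks.remove(self)
        return False  # do not suppress exceptions
```

For example, wrapping the loop in monitor_memory with `with GCActivityLog() as log:` and appending log.collected to a second list on each iteration would show whether drops in RSS coincide with collections.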

@MahmoudAshraf97, can you try my patch again with delete=False, collect=False, n=200 and compare it with v12.1.0 or main?

WyattBlue commented 3 weeks ago

@moonsikpark The Graph object probably has a memory leak, and it certainly has its own circular references.
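To check which object types take part in such cycles, one option is gc.DEBUG_SAVEALL, which makes the collector keep unreachable objects in gc.garbage instead of freeing them. A minimal diagnostic sketch; cyclic_garbage_types is a hypothetical helper, and it reports Python-level objects only, so it cannot reveal a leak inside FFmpeg's own allocations.

```python
import gc
from collections import Counter


def cyclic_garbage_types(workload):
    """Run workload() and count the types that only the cyclic GC could free."""
    gc.collect()               # clear out cycles created before the workload
    start = len(gc.garbage)    # ignore anything already saved in gc.garbage
    old_flags = gc.get_debug()
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)  # keep unreachable objects
    try:
        workload()
        gc.collect()
        return Counter(type(obj).__name__ for obj in gc.garbage[start:])
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[start:]  # release only the objects saved here
```

For example, `cyclic_garbage_types(lambda: minimal_example(audio_path))` would count the types found unreachable after one reproduction run; it has not been run against PyAV here, so no particular output is implied.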