Serial-ATA / lofty-rs

Audio metadata library
Apache License 2.0
185 stars 34 forks source link

Memory leak inside `lofty::Probe::read()` #210

Closed hinto-janai closed 1 year ago

hinto-janai commented 1 year ago

Reproducer

use std::fs::File;
use std::io::BufReader;
use walkdir::{WalkDir, DirEntry};
use std::path::PathBuf;

// Create PATHs.
let paths: Vec<PathBuf> = WalkDir::new("/your/music/path")
    .into_iter()
    .filter_map(Result::ok)
    .map(DirEntry::into_path)
    .collect();

// Get thread count.
let threads = std::thread::available_parallelism().unwrap().get();

// Enter thread scope.
std::thread::scope(|scope| {
    // This leak "magnification" occurs the more threads get spawned.
    for paths in paths.chunks(paths.len() / threads) {
        scope.spawn(|| {
            for path in paths.iter() {
                // Open path.
                let file = File::open(path).unwrap();
                let reader = BufReader::new(file);

                // Create options.
                let options = lofty::ParseOptions::new().parsing_mode(lofty::ParsingMode::Relaxed);

                // Create probe.
                let probe = lofty::Probe::new(reader).options(options);

                if let Ok(p) = probe.guess_file_type() {
                    // This leaks a TON of memory.
                    // Commenting this line out stops the leaking.
                    p.read();
                }
            }
        });
    }
}); // Everything allocated above should be freed at this point.

// A TON of memory is still being used here.
println!("done");
std::thread::park();

Summary

lofty::Probe::read() leaks memory. I've tested the mp3 and flac .read() so far.

It's not actually too bad, around 10,000 files .read() leaks around 8MB~ for me, but what is weird is that these leaks compound to an insane amount when doing the exact same operations, but chunked across scoped thread(s).

Expected behavior

Memory should be freed after the probe and tagged file are dropped.

Serial-ATA commented 1 year ago

Is this only in 0.14.0?

hinto-janai commented 1 year ago

No, same behavior starting from at least 0.9.0 (that's as far back as the example will compile).

I fixed the thread scope/chunk example, should work now.

Serial-ATA commented 1 year ago

I'm not able to replicate this.

I tested with 100, 500, and 1000 files and the only allocations that remain are the static ItemKey mappings. There is pretty high memory usage during execution as expected.

Are you able to produce a leak using the example above?

hinto-janai commented 1 year ago

Yes. This is the same as the example above but with std::thread::park() at the end.

lofty

Resident memory usage is 811MB, even after we exit scope. Both Windows/macOS have the same behavior as well.

This isn't cache that can be reused/given away either. After running the example in a loop, it'll continue to grow until the OS just kills the program when we run out of physical memory.

hinto-janai commented 1 year ago

Ok. I'm not so sure this actually has to do with lofty::Probe::read() anymore, although it does trigger it.

This seems to happen with symphonia's probe as well and really anything that allocates memory by reading a file, like:

let mut vec = vec![0_u8; 1_000_000];
file.read(&mut vec);
drop(vec);

This still leaks memory if inside scope + .chunks().

I can reproduce on multiple systems so I don't think I'm going crazy but I don't think this is a bug in lofty so, closing for now.

Serial-ATA commented 1 year ago

@hinto-janai Looking into it, this is working as intended:

https://stackoverflow.com/questions/45538993/why-dont-memory-allocators-actively-return-freed-memory-to-the-os/45539066#45539066

Lofty (and Symphonia) make a lot of allocations for any single file, so the memory isn't going to be returned to the OS, it'll remain available for any future allocations. I guess the issue here is since you're reading so many files at once, they're stepping on each other's toes and the memory can't be reused. It'll just go on to request more and more memory with each file.

The only thing that could be done on my end is to reduce the number of allocations somehow, but it will all end up at the same point given enough files.

hinto-janai commented 1 year ago

In these situations, the memory is freed at the end of scope:

So I'm not sure what's going on here. Something to do with threads and .chunks() maybe?

If it is the allocator not giving back, I'm not sure what triggers it. Allocators should be freeing cache if it needs to allocate new things under low memory conditions, but none of the objects created are new, so... I don't know.