aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0

Memory leak in group creation #137

Closed (jzrake closed this issue 3 years ago)

jzrake commented 3 years ago

The following is a reproducer for what seems like a fatal memory leak.

Rust crate version: 0.7.1
HDF5 version: 1.12.0
OS: macOS, Linux

With the Rust reproducer below, memory usage grows linearly from 10MB to 2GB over 100,000 iterations:

fn write_test_file(fname: &str, iteration: usize, num_groups: usize) {
    println!("iteration {}", iteration);
    let h5f = hdf5::File::create(fname).unwrap();
    for j in 0..num_groups {
        h5f.create_group(&format!("group-{}", j)).unwrap();
    }
}

fn main() {
    for i in 0..100000 {
        write_test_file("test.h5", i, 100)
    }
}

For comparison, with the equivalent C program memory usage is stable at 15MB:

#include <stdio.h>
#include <stdlib.h>
#include <hdf5.h>

void write_test_file(const char* fname, int iteration, int num_groups)
{
    printf("iteration %d\n", iteration);
    hid_t h5f = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    for (int j = 0; j < num_groups; ++j)
    {
        char gname[1024];
        snprintf(gname, 1024, "group-%d", j);
        hid_t h5g = H5Gcreate(h5f, gname, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Gclose(h5g);
    }
    H5Fclose(h5f);
}

int main()
{
    for (int i = 0; i < 100000; ++i)
    {
        write_test_file("test.h5", i, 100);
    }
    return 0;
}
mulimoen commented 3 years ago

Could be a duplicate of #76

aldanor commented 3 years ago

Yeah, on the face of it, it seems like the same thing; probably worth looking into again...

jzrake commented 3 years ago

#76 seems to deal with some complex edge cases. If this crate has issues with multi-threaded access to property creation lists or something, I would just avoid doing stuff like that. But the example above is very straightforward, and it turned up in an equally straightforward application which crashes on 8GB nodes after ~hours of runtime.

mulimoen commented 3 years ago

I tested the program with our Registry removed. Running it did not show any increase in memory use, and running valgrind shows that the only leak is from the lazy_statics.

aldanor commented 3 years ago

As discussed in #76, if you simply remove the Registry you'll definitely cause trouble on HDF5 1.8.*, since the reference-counting behaviour changed in 1.10: IDs used to be reused often, which would cause frequent clashes; now they are reused less frequently, so you have to try hard to make them clash, but it's still possible.
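
For illustration, a minimal sketch of that reuse behaviour using raw hdf5-sys calls (hypothetical example; error handling omitted):

use hdf5_sys::{
    h5f::{H5Fclose, H5Fcreate, H5F_ACC_TRUNC},
    h5g::{H5Gclose, H5Gcreate2},
    h5p::H5P_DEFAULT,
};
use std::ffi::CString;

fn main() {
    let fname = CString::new("reuse.h5").unwrap();
    let (a, b) = (CString::new("a").unwrap(), CString::new("b").unwrap());
    unsafe {
        let f = H5Fcreate(fname.as_ptr(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        let g1 = H5Gcreate2(f, a.as_ptr(), H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Gclose(g1); // g1 is now dead; the library is free to recycle its value
        let g2 = H5Gcreate2(f, b.as_ptr(), H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        // On 1.8.* this often prints the same value twice; a wrapper that
        // cached the old g1 would now silently point at a different object.
        println!("g1 = {}, g2 = {}, reused: {}", g1, g2, g1 == g2);
        H5Gclose(g2);
        H5Fclose(f);
    }
}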

It's still a bit weird that it grows to 2GB over 100k iterations. E.g., new group handles should go out of scope, get dropped, and be dec-refed back to zero? Any plists that are created and accessed in the process should be dropped and dec-refed as well? Maybe it's worth printing out the contents of the registry after these iterations to see how many IDs it holds and what those IDs represent? (Maybe they are all plists and we're missing a decref somewhere.) I think this would be a more useful approach than simply dropping the Registry.
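
Something like this, roughly: a hypothetical dump helper, assuming the registry map is shaped like HashMap<hid_t, Arc<RwLock<hid_t>>> (the name dump_registry is made up; the real Registry type is internal to the crate):

use hdf5_sys::h5i::{hid_t, H5Iget_ref, H5Iget_type, H5Iis_valid};
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

fn dump_registry(registry: &HashMap<hid_t, Arc<RwLock<hid_t>>>) {
    for (key, cell) in registry {
        let id = *cell.read().unwrap();
        if id < 0 || unsafe { H5Iis_valid(id) } <= 0 {
            println!("key {}: invalidated (id = {})", key, id);
        } else {
            // The ID type tells us whether the leftover is a plist, group,
            // file, etc.; the refcount shows whether a decref went missing.
            let (ty, rc) = unsafe { (H5Iget_type(id) as i32, H5Iget_ref(id)) };
            println!("key {}: id = {}, type = {}, refcount = {}", key, id, ty, rc);
        }
    }
}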

mulimoen commented 3 years ago

It works out to about 200 bytes leaked per group created (100 groups × 100k iterations), with two property lists created on every iteration. The registry is quite wasteful: it uses 8 (hid_t) + 8 (Arc) + 8 (RwLock) + 8 (hid_t) bytes for every entry. The RwLock could be removed by using the global mutex.
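
(Back-of-envelope check of those numbers, assuming the registry map is essentially HashMap<hid_t, Arc<RwLock<hid_t>>>; exact sizes are platform-dependent:)

use std::mem::size_of;
use std::sync::{Arc, RwLock};

#[allow(non_camel_case_types)]
type hid_t = i64; // 64-bit IDs as in HDF5 1.10+

fn main() {
    // Per entry: the key stored in the map...
    println!("key (hid_t): {}", size_of::<hid_t>());
    // ...plus the Arc pointer stored as the value...
    println!("value (Arc): {}", size_of::<Arc<RwLock<hid_t>>>());
    // ...plus the heap allocation behind the Arc: strong/weak counts,
    // the RwLock, and the shared hid_t it protects.
    println!("lock + id: {}", size_of::<RwLock<hid_t>>());
}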

Printing the registry results in a bunch of -1s. Do you intend for invalid entries to be removed from the registry?

aldanor commented 3 years ago

I wonder: as soon as we invalidate a certain hid (set it to -1 across all objects sharing it and update the value in the hashmap), can we then just remove it from the hashmap, and what would that cause? (The Arc<RwLock<hid_t>> will still exist, shared across those objects; it's just that the key will no longer be present in the hashmap.)
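
A toy model of that scenario (again assuming the registry is essentially a HashMap<hid_t, Arc<RwLock<hid_t>>>):

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[allow(non_camel_case_types)]
type hid_t = i64;

fn main() {
    let mut registry: HashMap<hid_t, Arc<RwLock<hid_t>>> = HashMap::new();
    let cell = Arc::new(RwLock::new(42));
    registry.insert(42, Arc::clone(&cell));

    // Some object elsewhere shares the same cell.
    let held_by_object = Arc::clone(&cell);

    // Invalidate the id, then drop the key from the map.
    *cell.write().unwrap() = -1;
    registry.remove(&42);

    // The object still sees the invalidated id; the shared allocation
    // lives on until the last Arc clone is dropped.
    assert_eq!(*held_by_object.read().unwrap(), -1);
}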