aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0

Call to `H5Pset_evict_on_close` cannot be prevented and panics in parallel builds #259

Closed: Tehforsch closed this issue 9 months ago

Tehforsch commented 1 year ago

I was experimenting with parallel HDF5 via the mpio feature gate. I wasn't sure exactly how to do this, but came up with the following program:

use hdf5::{
    file::LibraryVersion,
    plist::{self},
    FileBuilder,
};
use mpi::traits::{AsRaw, Communicator};

const NUM_ELEMENTS: usize = 10000000;
const FILE_NAME: &str = "out.hdf5";
const DATASET_NAME: &str = "data";

fn main() {
    let universe = mpi::initialize().unwrap();
    let rank = universe.world().rank() as usize;
    let num_ranks = universe.world().size() as usize;

    let comm = universe.world();
    let fapl = plist::FileAccess::build()
        .mpio(comm.as_raw(), None)
        .libver_bounds(LibraryVersion::V18, LibraryVersion::V110)
        .finish()
        .unwrap();
    let fcpl = plist::FileCreate::build().finish().unwrap();
    let f = FileBuilder::new()
        .set_access_plist(&fapl)
        .unwrap()
        .set_create_plist(&fcpl)
        .unwrap()
        .create(FILE_NAME)
        .unwrap();
    let dataset = f
        .new_dataset::<f64>()
        .shape(&[NUM_ELEMENTS * num_ranks])
        .create(DATASET_NAME)
        .unwrap();
    let values: Vec<_> = (0..NUM_ELEMENTS).map(|_| rank as f64).collect();
    dataset
        .write_slice(&values, rank * NUM_ELEMENTS..(rank + 1) * (NUM_ELEMENTS))
        .unwrap();
}

If I compile this using hdf5-openmpi and run it (with the current master branch on commit 26046fb), the call to FileBuilder::create fails with: H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5.

I initially thought I could just call FileAccess::build().evict_on_close(false) ... on the file access property list, but this doesn't work: the FileAccessBuilder then stores Some(false) instead of None, so H5Pset_evict_on_close is called either way in populate_plist due to the following code:

if let Some(v) = self.evict_on_close {
    h5try!(H5Pset_evict_on_close(id, hbool_t::from(v)));
}

The FileAccessBuilder is initially created with evict_on_close: None but during FileAccessBuilder::from_plist, evict_on_close is set to Some(...):

            builder.evict_on_close(plist.get_evict_on_close()?);

I didn't see any way to reset evict_on_close to None before file creation, so I always end up with this panic.
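
The round trip described above can be reproduced without HDF5 at all. The following is a minimal sketch using hypothetical stand-in types (FakePlist and FakeBuilder are not part of the hdf5 crate). It models only the Option<bool> bookkeeping and assumes, as the error message suggests, that the setter fails in a parallel build whatever value is passed:

```rust
// Hypothetical stand-in for a file access property list in a parallel build.
struct FakePlist {
    evict_on_close: bool,
}

impl FakePlist {
    fn get_evict_on_close(&self) -> bool {
        self.evict_on_close
    }

    // Assumption: models the reported behaviour, where the call is rejected
    // regardless of the value passed.
    fn set_evict_on_close(&mut self, _v: bool) -> Result<(), String> {
        Err("evict on close is currently not supported in parallel HDF5".to_string())
    }
}

// Hypothetical stand-in for FileAccessBuilder, reduced to a single field.
#[derive(Default)]
struct FakeBuilder {
    evict_on_close: Option<bool>,
}

impl FakeBuilder {
    // Mirrors the from_plist line quoted above: the field always becomes Some(..).
    fn from_plist(plist: &FakePlist) -> Self {
        FakeBuilder {
            evict_on_close: Some(plist.get_evict_on_close()),
        }
    }

    // Mirrors the populate_plist snippet quoted above.
    fn populate_plist(&self, plist: &mut FakePlist) -> Result<(), String> {
        if let Some(v) = self.evict_on_close {
            plist.set_evict_on_close(v)?;
        }
        Ok(())
    }
}
```

In this model a freshly defaulted builder (field is None) never touches the setter, while any builder that went through from_plist carries Some(false) and fails in populate_plist.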

Now I managed to get this program running by using a local version of hdf5 in which I commented out the line above. Is there any way that this could be fixed officially?

mulimoen commented 1 year ago

It seems one cannot call H5Pset_evict_on_close in parallel builds, even when the argument is false. The value is taken from the plist, which returns Some(false). I think we need to check whether the requested eviction policy differs from the one in the plist it was copied from, and only set it if it differs. I will make a PR for this this evening.
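
One possible reading of this proposal, as a rough sketch only and not the actual PR: compare the requested value with what the plist already reports and skip the setter when they match. SketchPlist and apply_evict_on_close are hypothetical names, and the parallel flag is an assumption used to imitate an MPI-enabled build:

```rust
// Hypothetical stand-in for a property list; `parallel` imitates an MPI-enabled build.
struct SketchPlist {
    evict_on_close: bool,
    parallel: bool,
}

impl SketchPlist {
    fn get_evict_on_close(&self) -> bool {
        self.evict_on_close
    }

    // Assumption: in a parallel build the setter fails for any value.
    fn set_evict_on_close(&mut self, v: bool) -> Result<(), String> {
        if self.parallel {
            return Err("evict on close is currently not supported in parallel HDF5".to_string());
        }
        self.evict_on_close = v;
        Ok(())
    }
}

// Call the setter only when the requested policy differs from the current one.
fn apply_evict_on_close(plist: &mut SketchPlist, requested: Option<bool>) -> Result<(), String> {
    match requested {
        Some(v) if v != plist.get_evict_on_close() => plist.set_evict_on_close(v),
        _ => Ok(()),
    }
}
```

With this guard, Some(false) copied from an unmodified plist becomes a no-op, while explicitly requesting true in a parallel build still surfaces the library error.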