Open edisj opened 3 years ago
Do we have a way to examine the `h5py.File` object and know that it has a driver and/or a comm set? If so, then we could do what you suggest and only do a seek to the beginning instead of close/open if we know that we don't have enough information to reopen in the same way.
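A minimal sketch of that idea, using a hypothetical `ToyReader` rather than the real `H5MDReader`: the reader reopens the file only when it has everything needed to reopen it the same way, and otherwise just rewinds. The class, its attribute names, and the `_can_reopen_identically` helper are assumptions made for illustration and do not exist in MDAnalysis.

```python
class ToyReader:
    """Hypothetical stand-in for a trajectory reader (not the real H5MDReader)."""

    def __init__(self, filename, driver=None, comm=None):
        self.filename = filename
        self._driver = driver   # e.g. "mpio" for parallel access
        self._comm = comm       # communicator used to open the file, if known
        self.n_opens = 0        # counts how often the "file" was opened
        self.frame = -1         # -1 means "positioned before the first frame"
        self._open()

    def _open(self):
        # Placeholder for the real open call; here we only count and rewind.
        self.n_opens += 1
        self.frame = -1

    def _can_reopen_identically(self):
        # Serial files need no extra information; a parallel file can only be
        # reopened the same way if the communicator was kept around.
        return self._driver is None or self._comm is not None

    def _reopen(self):
        if self._can_reopen_identically():
            self._open()        # safe to close and open again
        else:
            self.frame = -1     # not enough information: rewind only
```

With this toy class, a serial reader is opened a second time on `_reopen()`, while a reader created with `driver="mpio"` but no stored communicator is only rewound.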
We almost certainly also need to think about this in the context of @yuxuanzhuang's picklable/serializable readers, see PR #2723. (At the moment I don't know if it will be necessary to serialize a reader that's already using an MPI communicator. Normally we would launch multiple copies of the same script with `mpirun`, and we would not require a serialization mechanism unless we want to mix, say, Dask with MPI via dask-mpi.)
That's a good idea. I'll be able to check once I play around with mpi4py. I've managed to build parallel hdf5 and have parallel h5py and mpi4py installed on the workstation. Just trying to copy over my branch's mdanalysis so it should be up and running soon.
So from what I can find so far, there are a couple of ways to see if the file has been opened with the parallel driver.

The nice way is to do something like

```python
import h5py

f = h5py.File('filename.h5md', 'r')
f.driver  # name of the driver the file was opened with
```

which will spit out `'mpio'` if the file was opened with the parallel driver. I think this is a convenient way to check. I think all files opened with `h5py.File` have a `driver` attribute. (The h5py documentation lists the available drivers.)

The other way is to do `f.atomic`, which raises an error if the driver isn't `'mpio'` (I'm not sure how it works with other drivers though). But in any case I don't think we'd use that to check.

I think just checking `f.driver` should work nicely. What do you think?
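As a rough sketch, the `f.driver` check could be wrapped in a small helper. The function name `opened_with_mpio` is hypothetical, and the example uses plain stand-in objects with a `driver` attribute instead of real `h5py.File` objects so that it runs without parallel HDF5; `'sec2'` is used here only as an example of a serial driver name.

```python
from types import SimpleNamespace


def opened_with_mpio(h5file):
    """Return True if the object reports the parallel 'mpio' driver.

    Works on anything exposing a ``driver`` attribute, which is how an
    open ``h5py.File`` reports its driver; objects without the attribute
    are treated as not parallel.
    """
    return getattr(h5file, "driver", None) == "mpio"


# Stand-ins for open files (assumption: real code would pass h5py.File objects).
serial_like = SimpleNamespace(driver="sec2")
parallel_like = SimpleNamespace(driver="mpio")

print(opened_with_mpio(serial_like))    # False
print(opened_with_mpio(parallel_like))  # True
```

Using `getattr` with a default keeps the helper from raising on objects that have no `driver` attribute at all, which may or may not be the behaviour wanted in the reader.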
We will want to check if one can serialize a parallel `H5MDReader`, i.e., if `MPI.Comm` can be serialized. This will determine how we can use parallel reading. See also #2890.
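One possible workaround, if the communicator itself turns out not to be picklable, is to pickle a name for it and look the communicator up again on unpickling. The sketch below illustrates only that pattern: `FakeComm`, `COMM_REGISTRY`, and `ParallelFileHandle` are hypothetical stand-ins, not mpi4py or MDAnalysis classes, and whether a real `MPI.Comm` needs this treatment is exactly the open question.

```python
import pickle


class FakeComm:
    """Stand-in for a communicator that refuses to be pickled (assumption)."""

    def __init__(self, name):
        self.name = name

    def __reduce__(self):
        raise TypeError("this stand-in communicator cannot be pickled")


# Hypothetical registry; with mpi4py it might map "COMM_WORLD" to MPI.COMM_WORLD.
COMM_REGISTRY = {"COMM_WORLD": FakeComm("COMM_WORLD")}


class ParallelFileHandle:
    """Toy file handle that remembers how it was opened."""

    def __init__(self, filename, driver=None, comm_name=None):
        self.filename = filename
        self.driver = driver
        self.comm_name = comm_name
        self.comm = COMM_REGISTRY[comm_name] if comm_name else None

    def __getstate__(self):
        # Drop the live communicator; keep only its name.
        state = self.__dict__.copy()
        del state["comm"]
        return state

    def __setstate__(self, state):
        # Restore attributes, then look the communicator up by name.
        self.__dict__.update(state)
        name = state["comm_name"]
        self.comm = COMM_REGISTRY[name] if name else None
```

With these definitions, `pickle.loads(pickle.dumps(handle))` should give back a handle whose `comm` is the registry entry again. This only covers communicators that can be named in advance; communicators created at run time would need a different mechanism.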
Is your feature request related to a problem?
To use h5py's parallel features, you need to pass `driver` and `comm` arguments when you open a file, e.g. `h5py.File(filename, 'r', driver='mpio', comm=MPI.COMM_WORLD)`. We'd like to add the ability to use these arguments with the `H5MDReader` (see PR #2787), but there are some methods (below) that could be a problem due to the stream not being reopened in the same way with `driver` and `comm`.
Updated: Following up on issue #2890, to pickle `H5MD` files opened with `driver="mpio"` and `comm=MPI.COMM_WORLD`, we need a way to store the `MPI.Comm` object used to open the file.
Describe the solution you'd like
I would pull the keyword arguments, `driver` and `comm`, out of the `mda.Universe` arguments and store them as `self.driver` and `self.comm` prior to this line: https://github.com/MDAnalysis/mdanalysis/blob/618f7647d3c6c4657945416490191d87adc10fe5/package/MDAnalysis/coordinates/H5MD.py#L316
Then, perhaps `_reopen` and `open_trajectory` could check whether these arguments were passed and, if so, rewind the file to the first frame instead of closing it.
Updated: Store `comm` as an argument in an `__init__()` method when calling `H5PYPicklable`, and use some sort of function that can pickle the `MPI.Comm` object, similar to https://bitbucket.org/mpi4py/mpi4py/issues/104/pickling-of-mpi-comm
Describe alternatives you've considered
Can't think of any other way at the moment
Additional context
H5MD format, pyh5md package, h5py documentation
EDIT: Updated issue text after issue #2890