Closed 1uc closed 10 months ago
Alternatively we could inject not just options but an entire Hdf5Reader
which would have (the next closest thing to) a polymorphic template:
template<class T>
virtual std::vector<T> readSelection(const HighFive::DataSet& dset, const Selection& selection) const = 0;
and a polymorphic function for opening the HDF5 file.
The rule inside libsonata would be to call Hdf5Reader::readSelection collectively. Inside the method, however, each implementation is free to do whatever it sees fit, including short-circuiting logic for empty edge cases (which breaks collective semantics). In a different reader we can implement a variant that's compatible with MPI-IO. Likewise, we can inject alternative readers that perform aggregation for GPFS without MPI-IO, while retaining an implementation that performs no aggregation in order to reduce the number of bytes read, e.g. for SSDs.
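C++ does not allow virtual member templates, so one way to approximate the "polymorphic template" is a virtual overload per supported element type, with a thin templated front end that dispatches to them. The following is a minimal sketch under stated assumptions: DataSet and Selection are stand-ins for HighFive::DataSet and the libsonata Selection, and Hdf5ReaderInterface, Hdf5ReaderAdapter and PlainReader are hypothetical names that do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-ins for HighFive::DataSet and libsonata's Selection (assumptions).
struct DataSet {
    std::vector<double> values;
};
using Range = std::pair<std::size_t, std::size_t>;  // half-open [begin, end)
using Selection = std::vector<Range>;

// One virtual overload per supported element type, since a member
// template cannot be virtual.
class Hdf5ReaderInterface {
  public:
    virtual ~Hdf5ReaderInterface() = default;
    virtual void readSelection(std::vector<double>& out,
                               const DataSet& dset,
                               const Selection& sel) const = 0;
    virtual void readSelection(std::vector<std::uint64_t>& out,
                               const DataSet& dset,
                               const Selection& sel) const = 0;
};

// Generates all overloads from a single templated implementation.
template <class Impl>
class Hdf5ReaderAdapter: public Hdf5ReaderInterface {
  public:
    explicit Hdf5ReaderAdapter(Impl impl = Impl{}) : impl_(std::move(impl)) {}

    void readSelection(std::vector<double>& out,
                       const DataSet& dset,
                       const Selection& sel) const override {
        out = impl_.template readSelection<double>(dset, sel);
    }
    void readSelection(std::vector<std::uint64_t>& out,
                       const DataSet& dset,
                       const Selection& sel) const override {
        out = impl_.template readSelection<std::uint64_t>(dset, sel);
    }

  private:
    Impl impl_;
};

// Templated front end that library code would call.
class Hdf5Reader {
  public:
    explicit Hdf5Reader(std::shared_ptr<const Hdf5ReaderInterface> impl)
        : impl_(std::move(impl)) {}

    template <class T>
    std::vector<T> readSelection(const DataSet& dset, const Selection& sel) const {
        std::vector<T> out;
        impl_->readSelection(out, dset, sel);  // overload chosen by the type of `out`
        return out;
    }

  private:
    std::shared_ptr<const Hdf5ReaderInterface> impl_;
};

// Non-aggregating reader: copies exactly the selected ranges, one by one.
struct PlainReader {
    template <class T>
    std::vector<T> readSelection(const DataSet& dset, const Selection& sel) const {
        std::vector<T> out;
        for (const auto& range: sel) {
            assert(range.first <= range.second && range.second <= dset.values.size());
            for (std::size_t i = range.first; i < range.second; ++i) {
                out.push_back(static_cast<T>(dset.values[i]));
            }
        }
        return out;
    }
};
```

A reader built as Hdf5Reader(std::make_shared<Hdf5ReaderAdapter<PlainReader>>()) can then be called as reader.readSelection<double>(dset, selection). An MPI-IO or aggregating variant would be another Impl type plugged into the same adapter, so the calling code would not need to change.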
This PR consists of two parts. The first introduces a new API that allows developers to inject HDF5 properties (via File Access and Data Transfer Property Lists) for reading. This could be used to select a page buffer for paged HDF5 files, tweak metadata cache sizes, etc.
It can also be used to inject collective MPI-IO into libsonata without making it dependent on MPI. This is sketched out in 0a770925fd3df8b0c1e61ac3795311c4230c0d9c. The entirety of that commit would be moved to a separate repository, libsonata-mpi, and there would be little to no mention of MPI in libsonata.
The way this works is that there are two objects, FileAccessOpts and DataTransferOpts, each of which contains a shared_ptr to a polymorphic *Impl class. The polymorphic classes each have a method (one for file access and one for data transfer, respectively) which sets the desired properties in the respective Property List.
While we can avoid the dependency on MPI, we do require changes to libsonata that make it compatible with MPI-IO. One of them is sketched out in collective::_readSelection. With the new aggregated reading, we can also make ?fferent_edges collective.
There's a difficulty in packaging. We require that mpi4py is a build-time dependency. The issue is that build-time dependencies are both static and mandatory. It is easy to implement a version of libsonata-mpi which doesn't have MPI support. This allows user code which expects MPI-IO to work without changes even if libsonata-mpi doesn't have MPI-IO support, by simply not performing MPI-IO. The difficulty is that we can't write a pyproject.toml or setup.py for libsonata-mpi that can build either the MPI-enabled or the MPI-disabled version by simply passing flags or detecting mpi4py. What can be done is to break build isolation and install all requirements manually, then install libsonata-mpi; depending on whether mpi4py is present, it'll build with or without MPI support. Using Spack we can work around this limitation.
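On the C++ side, the "same user code, with or without MPI" behaviour could be sketched with a compile-time switch. This is an assumption-laden illustration, not the PR's implementation: the macro LIBSONATA_MPI_HAS_MPI, the TransferSettings struct and both function names are hypothetical, and the build script would be responsible for defining the macro when MPI is detected.

```cpp
// Hypothetical macro: a build script would pass -DLIBSONATA_MPI_HAS_MPI=1
// when MPI (e.g. via mpi4py) was found at build time.
#ifndef LIBSONATA_MPI_HAS_MPI
#define LIBSONATA_MPI_HAS_MPI 0
#endif

// Stand-in for the data transfer settings a reader would consult (assumption).
struct TransferSettings {
    bool collective = false;
};

// Lets callers query whether this build can actually perform MPI-IO.
constexpr bool has_mpi_support() {
    return LIBSONATA_MPI_HAS_MPI != 0;
}

// Same entry point in both builds. Without MPI it returns independent
// (non-collective) settings, so user code keeps working unchanged.
inline TransferSettings make_collective_transfer_settings() {
    TransferSettings settings;
#if LIBSONATA_MPI_HAS_MPI
    settings.collective = true;
#endif
    return settings;
}
```

User code calls make_collective_transfer_settings() unconditionally; in an MPI-disabled build the call degrades to ordinary independent reads instead of failing.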