BlueBrain / libsonata

A python and C++ interface to the SONATA format
https://libsonata.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0

API for configuring HDF5 properties. #305

Closed 1uc closed 10 months ago

1uc commented 10 months ago

This PR consists of two parts. The first introduces a new API that allows developers to inject HDF5 properties (via File Access and Data Transfer Property Lists) for reading. This could be used to select a page buffer for paged HDF5 files, tweak metadata cache sizes, etc.

It can also be used to inject collective MPI-IO into libsonata without making it dependent on MPI. This is sketched out in 0a770925fd3df8b0c1e61ac3795311c4230c0d9c. The entirety of that commit would be moved to a separate repository, libsonata-mpi, and there would be little to no mention of MPI in libsonata.

This works through two objects, FileAccessOpts and DataTransferOpts, each of which holds a shared_ptr to a polymorphic *Impl class. The polymorphic classes have the methods:

void apply(HighFive::FileAccessProps& fapl);
void apply(HighFive::DataTransferProps& dxpl);

(respectively), which set the desired properties in the respective Property List.
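As a rough illustration of the handle-plus-Impl pattern described above, here is a minimal sketch. It assumes a stand-in `FakeFileAccessProps` struct in place of `HighFive::FileAccessProps` so that it compiles without HDF5; `PageBufferOpts` and `configured_page_buffer` are hypothetical names invented for the example, and `apply` is marked `const` here purely as a choice of the sketch. It is not the code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Stand-in for HighFive::FileAccessProps (hypothetical) so the sketch
// compiles without HDF5. The real class wraps an HDF5 property list.
struct FakeFileAccessProps {
    std::size_t page_buffer_size = 0;
};

// Polymorphic implementation: subclasses decide which properties to set.
class FileAccessOptsImpl {
  public:
    virtual ~FileAccessOptsImpl() = default;
    virtual void apply(FakeFileAccessProps& fapl) const = 0;
};

// Cheap-to-copy handle that the core library passes around. The core
// library only ever sees this type, never the concrete implementation.
class FileAccessOpts {
  public:
    FileAccessOpts() = default;  // no impl: leave the property list untouched
    explicit FileAccessOpts(std::shared_ptr<const FileAccessOptsImpl> impl)
        : impl_(std::move(impl)) {}

    void apply(FakeFileAccessProps& fapl) const {
        if (impl_) {
            impl_->apply(fapl);
        }
    }

  private:
    std::shared_ptr<const FileAccessOptsImpl> impl_;
};

// Hypothetical extension that could live outside the core library.
class PageBufferOpts: public FileAccessOptsImpl {
  public:
    explicit PageBufferOpts(std::size_t bytes)
        : bytes_(bytes) {
        assert(bytes > 0);  // precondition chosen for this sketch only
    }

    void apply(FakeFileAccessProps& fapl) const override {
        fapl.page_buffer_size = bytes_;
    }

  private:
    std::size_t bytes_;
};

// Mimics what the library would do just before opening a file.
inline std::size_t configured_page_buffer(const FileAccessOpts& opts) {
    FakeFileAccessProps fapl;
    opts.apply(fapl);
    return fapl.page_buffer_size;
}
```

With this shape, an MPI-aware implementation could be supplied by a separate package in the same way as `PageBufferOpts`, without the core library ever including an MPI header.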

While we can avoid the dependency on MPI, we do require changes to libsonata that make it compatible with MPI-IO. One of them is sketched out in collective::_readSelection. With the new aggregated reading, we can also make ?fferent_edges collective.

There's a difficulty in packaging. We require mpi4py as a build-time dependency, and build-time dependencies are both static and mandatory. It is easy to implement a version of libsonata-mpi that doesn't have MPI support. This allows user code that expects MPI-IO to work without changes even if libsonata-mpi lacks MPI-IO support, by simply not performing MPI-IO. The difficulty is that we can't write a pyproject.toml or setup.py for libsonata-mpi that builds either the MPI-enabled or the MPI-disabled version by simply passing flags or detecting mpi4py. What can be done is to break build isolation and install all requirements manually; installing libsonata-mpi then builds with or without MPI support, depending on whether mpi4py is present. Using Spack we can work around this limitation.

1uc commented 10 months ago

Alternatively we could inject not just options but an entire Hdf5Reader which would have (the next closest thing to) a polymorphic template:

template<class T>
virtual std::vector<T> readSelection(const HighFive::DataSet& dset, const Selection& selection) const = 0;

and a polymorphic function for opening the HDF5 file.
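C++ does not allow a member function to be both `virtual` and a template, so the declaration above is pseudocode. One common way to approximate it, sketched below, is a non-virtual template entry point that forwards to one virtual overload per supported element type. `FakeDataSet`, `Ranges`, `readSelectionImpl` and `SerialReader` are hypothetical stand-ins invented for this example (in place of `HighFive::DataSet`, `Selection` and whatever the PR would actually use), and the set of supported types is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-ins (hypothetical) so the sketch compiles without HighFive.
struct FakeDataSet {
    std::vector<double> values;
};
// Half-open index ranges [begin, end).
using Ranges = std::vector<std::pair<std::size_t, std::size_t>>;

class Hdf5Reader {
  public:
    virtual ~Hdf5Reader() = default;

    // Non-virtual template: overload resolution picks the virtual
    // function matching T. Unsupported T fails at compile time.
    template <class T>
    std::vector<T> readSelection(const FakeDataSet& dset, const Ranges& sel) const {
        std::vector<T> out;
        readSelectionImpl(dset, sel, out);
        return out;
    }

  protected:
    // One virtual overload per supported element type.
    virtual void readSelectionImpl(const FakeDataSet& dset,
                                   const Ranges& sel,
                                   std::vector<double>& out) const = 0;
    virtual void readSelectionImpl(const FakeDataSet& dset,
                                   const Ranges& sel,
                                   std::vector<std::uint64_t>& out) const = 0;
};

// A plain reader without aggregation or collective I/O.
class SerialReader: public Hdf5Reader {
  protected:
    void readSelectionImpl(const FakeDataSet& dset,
                           const Ranges& sel,
                           std::vector<double>& out) const override {
        read(dset, sel, out);
    }
    void readSelectionImpl(const FakeDataSet& dset,
                           const Ranges& sel,
                           std::vector<std::uint64_t>& out) const override {
        read(dset, sel, out);
    }

  private:
    // Shared implementation; each virtual overload forwards here.
    template <class T>
    static void read(const FakeDataSet& dset, const Ranges& sel, std::vector<T>& out) {
        for (const auto& range : sel) {
            for (std::size_t i = range.first; i < range.second; ++i) {
                out.push_back(static_cast<T>(dset.values[i]));
            }
        }
    }
};
```

Callers keep the template syntax, e.g. `reader.readSelection<std::uint64_t>(dset, ranges)`, while each injected reader overrides only the virtual overloads.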

The rule inside libsonata would be to call Hdf5Reader::readSelection collectively. Inside the method, however, each implementation is free to do whatever it sees fit, including short-circuiting logic for empty edge cases (which breaks collective semantics). In a different reader we can implement a variant that's compatible with MPI-IO. Likewise, we can inject alternative readers that perform aggregation for GPFS without MPI-IO, while retaining an implementation that doesn't perform any aggregation, to reduce the number of bytes read, e.g. for SSDs.