darshan-hpc / darshan

Darshan I/O characterization tool

BUG: fixes for HDF5 module method for determining whether MPIO VFD is used #961

Closed shanedsnyder closed 1 year ago

shanedsnyder commented 1 year ago

Fixes #960

For some background, we added the weak symbol hack mentioned above in our last release to work around an issue where LD_PRELOADing Darshan built with HDF5 support led to loader errors like this:

symbol lookup error: /home/shane/software/darshan/darshan-dev/install/lib/libdarshan.so: undefined symbol: H5FD_mpio_init

While this hack works as intended for non-HDF5 apps, it does not work as intended for HDF5 apps -- the weak dummy function is being linked in rather than the real implementation of H5FD_mpio_init, causing errors as seen in #960.
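For readers unfamiliar with the pattern, here is a minimal sketch of a weak dummy definition. It is an illustration only, not Darshan's actual code: `hypothetical_driver_init` and `probe_driver` are made-up names standing in for `H5FD_mpio_init` and its caller, and the `__attribute__((weak))` syntax assumes GCC or Clang.

```c
#include <stdio.h>

/* Hypothetical stand-in for a library entry point such as H5FD_mpio_init.
 * The weak attribute lets the program link and load even when no library
 * supplies a real (strong) definition of the symbol. */
__attribute__((weak)) int hypothetical_driver_init(void)
{
    return -1; /* dummy result meaning "driver not available" */
}

/* Hypothetical caller: returns whatever definition the linker or loader
 * bound to the symbol. With no strong definition present, that is the
 * weak dummy above. */
int probe_driver(void)
{
    return hypothetical_driver_init();
}
```

Built on its own, `probe_driver()` returns the dummy value. The intent of the hack is that a real definition takes over when the library is present, but with shared libraries the definition that actually gets bound depends on the dynamic loader's symbol lookup order, so a weak dummy in a preloaded library is not guaranteed to yield to the real one.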

Rather than go down the rabbit hole to understand why our usage of H5FD_MPIO is causing the HDF5 library to need to be linked (this wasn't the case in HDF5 versions prior to 1.13), I've just moved away from using that macro entirely. H5Pget_fapl_mpio can provide the same functionality, with the caveat that we have to temporarily disable HDF5 error printing when probing, as it will print an error message if called on a file access property list that does not have the MPIO VFD enabled.

shanedsnyder commented 1 year ago

IIRC, there's no way to force the currently failing CI test (which uses the Cirrus service to test on Apple M1) to re-run without jumping through extra hoops.

The warning messages that led to the error there seem to have been resolved in PR #959 which is now merged. In any case, the changes here are obviously unrelated to the Python warnings for our log analysis code. Given that, I'll go ahead and force the merge through despite the failed test.