darshan-hpc / darshan

Darshan I/O characterization tool

BLD, BUG: can't use LD_PRELOAD with Intel or GNU builds of libdarshan when POSIX is disabled #771

Open tylerjereddy opened 2 years ago

tylerjereddy commented 2 years ago

On the LANL Snow supercomputer with `CC=mpicc`, where `mpicc` comes from an Intel or GNU compiler toolchain with a recent Open MPI (e.g., ..openmpi/4.1.1-intel-19.0.4/bin/mpicc), I can successfully configure, build, and install the darshan-runtime library. However, when POSIX monitoring is disabled at configure time, profiling the application fails with the error shown at the bottom of this ticket.

Roughly, working at hash 5a30254c8:

Then, while monitoring the application with the `LD_PRELOAD` approach, I see: `symbol lookup error: /path/to/darshan_install/lib/libdarshan.so: undefined symbol: __real_fileno`

If I repeat the above steps in a clean checkout, but with the POSIX monitoring turned back on, things work just fine again.

carns commented 2 years ago

This looks like an undocumented dependency between the stdio and posix modules. fileno() is one of those weird functions that spans both modules because the point of the function is to map between the two. The wrapper lives in the posix module because we count it as a posix open (subsequent functions could use the fd it produces, just like they might use an fd produced by open()).
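For readers unfamiliar with the pattern, here is a minimal sketch of how an `LD_PRELOAD`-style wrapper typically reaches the real libc function through a pointer resolved with `dlsym(RTLD_NEXT, ...)`. It is a simplified illustration, not Darshan's actual code; the names `real_fileno` and `wrapped_fileno` are hypothetical. If one module references such a pointer while the module that defines it is compiled out, the dynamic loader has nothing to bind to and reports an undefined symbol, as in the error above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical stand-in for the pointer that the owning module defines.
 * Other modules would only declare it `extern` and rely on this definition. */
static int (*real_fileno)(FILE *) = NULL;

/* Hypothetical wrapper: resolve the next definition of fileno() in the
 * library search order (normally libc), then forward the call to it. */
static int wrapped_fileno(FILE *stream)
{
    if (!real_fileno)
        real_fileno = (int (*)(FILE *))dlsym(RTLD_NEXT, "fileno");
    if (!real_fileno)
        return -1; /* lookup failed; a real tool would report this */

    /* An instrumentation tool would record the call here. */
    return real_fileno(stream);
}
```

Calling `wrapped_fileno(stdout)` should return the same descriptor as `fileno(stdout)`. On older glibc versions the program may need to be linked with `-ldl`.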

I'm not sure this is easily fixable; it might be better to just enforce that the posix module cannot be disabled while the stdio module is enabled.

tylerjereddy commented 2 years ago

Note that similar issues appear to exist with other permutations, including POSIX-only enabled via:


Then, when I test the app via `mpirun ..` with the Darshan `LD_PRELOAD` as usual:

/path/to/libdarshan.so: undefined symbol: __real_vfprintf

tylerjereddy commented 2 years ago

Even if we argue that POSIX and STDIO are too tightly coupled, and enable both of them while disabling the rest, I still end up with: libdarshan.so: undefined symbol: dxt_posix_runtime_initialize

So it seems to me that option orthogonality isn't particularly reliable right now. If you want to debug by "bisecting" on the configure-time enabled/disabled modules, there's a pretty high probability you'll run into coupling issues between modules.
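To make the couplings reported so far in this thread concrete, here is a small sketch that records them as a table and checks a candidate set of enabled modules against it. The `MOD_*` flags, the `coupling` struct, and `count_broken_couplings` are hypothetical names, not Darshan identifiers. The "user" column is an inference from the errors above, not something confirmed against the source.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical module flags; these are not Darshan identifiers. */
enum {
    MOD_POSIX = 1 << 0,
    MOD_STDIO = 1 << 1,
    MOD_DXT   = 1 << 2
};

struct coupling {
    unsigned user;      /* module assumed to reference the symbol */
    unsigned provider;  /* module assumed to define the symbol */
    const char *symbol; /* symbol named in the reported error */
};

/* The three undefined-symbol errors reported in this thread. */
static const struct coupling couplings[] = {
    { MOD_STDIO, MOD_POSIX, "__real_fileno" },
    { MOD_POSIX, MOD_STDIO, "__real_vfprintf" },
    { MOD_POSIX, MOD_DXT,   "dxt_posix_runtime_initialize" },
};

/* Count couplings whose user is enabled while its provider is disabled,
 * printing the symbol that would be left undefined for each one. */
static int count_broken_couplings(unsigned enabled)
{
    int broken = 0;
    size_t i;

    for (i = 0; i < sizeof couplings / sizeof couplings[0]; i++) {
        if ((enabled & couplings[i].user) && !(enabled & couplings[i].provider)) {
            printf("likely undefined symbol: %s\n", couplings[i].symbol);
            broken++;
        }
    }
    return broken;
}
```

For example, `count_broken_couplings(MOD_POSIX | MOD_STDIO)` flags only the DXT symbol, matching the last error reported above. A table like this would need to be extended as further couplings are found.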

tylerjereddy commented 2 years ago

This may warrant a mention in the runtime docs.

carns commented 2 years ago

Ugh, OK. There are probably some light fixes to make (that dxt thing should be fixable), and then maybe we should make it so that neither stdio nor posix can actually be disabled.