darshan-hpc / darshan

Darshan I/O characterization tool

ENH: wrap _exit in Darshan's shared library #978

Closed shanedsnyder closed 3 months ago

shanedsnyder commented 3 months ago

Python apps using the multiprocessing package can invoke the _exit() call directly. E.g., see this code.

Invoking _exit() directly bypasses Darshan's shared library destructor, which is what writes out its log file. So, the processes spawned by multiprocessing properly instrument I/O calls but never get an opportunity to write the instrumentation data to a log file.
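To illustrate the behavior described above, here is a minimal sketch in Python. It uses an `atexit` handler as a stand-in for Darshan's library destructor (an assumption for illustration only; the two mechanisms are not identical, but both are skipped by `_exit()`). The `CHILD` script and the `cleanup_ran` helper are hypothetical names, not part of Darshan.

```python
import subprocess
import sys

# Child script: registers a cleanup hook, then exits one of two ways.
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran", flush=True))
if sys.argv[1] == "hard":
    os._exit(0)   # immediate termination: exit-time cleanup hooks are skipped
sys.exit(0)       # normal termination: exit-time cleanup hooks run
"""

def cleanup_ran(mode):
    """Run the child with the given exit mode and report whether cleanup ran."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True, text=True, check=True,
    )
    return "cleanup ran" in out.stdout

normal_exit_cleanup = cleanup_ran("normal")
hard_exit_cleanup = cleanup_ran("hard")
print(normal_exit_cleanup, hard_exit_cleanup)
```

The child that calls `os._exit()` never runs its cleanup hook, which mirrors how a process exiting this way never reaches the code that would write a log.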

This PR adds a wrapper function for _exit() that Darshan uses to intercept _exit() calls and run the Darshan shutdown procedure before calling the real _exit() implementation. This change seems to fix some user-reported problems caused by this behavior (e.g., #872). Let's see if it can get through our CI without causing other issues.
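The control flow of such a wrapper can be sketched as follows. This is only a Python analogy, assuming a hypothetical `shutdown()` routine; the actual change lives in C inside Darshan's shared library and is not reproduced here. The sketch replaces `os._exit` in a child process with a wrapper that runs the shutdown step and then calls the saved real function.

```python
import subprocess
import sys

# Child script: wrap os._exit so a shutdown step runs before termination.
CHILD = """
import os

_real_exit = os._exit            # keep a handle to the real implementation

def shutdown():                  # hypothetical stand-in for a log-writing routine
    print("log written", flush=True)

def _exit_wrapper(status):
    shutdown()                   # run the shutdown procedure first
    _real_exit(status)           # then terminate via the real _exit

os._exit = _exit_wrapper
os._exit(3)
"""

result = subprocess.run(
    [sys.executable, "-c", CHILD], capture_output=True, text=True
)
wrapped_output = result.stdout.strip()
wrapped_status = result.returncode
print(wrapped_output, wrapped_status)
```

The wrapper preserves the caller's exit status, so code that depends on the original `_exit()` semantics sees the same result apart from the extra shutdown step.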

Note that this code is only included in our shared library, since it's intended for Darshan's non-MPI mode (which only works when dynamically linking).

Keeping this PR as a WIP at least until I add some autoconf support to disable this functionality entirely, in case it becomes problematic in some scenarios.

shanedsnyder commented 3 months ago

I'm going to go ahead and merge this then -- the only failing test is an LDMS thing that's unrelated to changes here.