darshan-hpc / darshan

Darshan I/O characterization tool

BUG: never disable "shared reductions" for heatmap mod #942

Closed shanedsnyder closed 11 months ago

shanedsnyder commented 1 year ago

The shared reduction callback for the heatmap module is used to reach a consistent heatmap format across all ranks at shutdown time, which our analysis tools expect.

Furthermore, the term "shared reduction" doesn't really apply to heatmap data, since it's already captured on every process (as opposed to a shared record on rank 0). Ignoring the disable shared reduction flag is therefore not an issue for the heatmap module, which makes this an easy fix.
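To illustrate what "reaching a consistent heatmap format across all ranks" could look like, here is a minimal, single-process sketch. It is not Darshan's actual code: the names `collapse` and `reduce_heatmaps`, the `(bin_width_ms, bins)` representation, and the assumption that bin widths differ only by powers of two are all hypothetical. A plain `max()` stands in for the collective that would agree on a common bin width across ranks.

```python
from typing import List, Tuple

# Hypothetical representation: (bin width in milliseconds, bytes moved per bin).
Heatmap = Tuple[int, List[int]]


def collapse(heatmap: Heatmap, target_width_ms: int) -> Heatmap:
    """Merge adjacent bins until the heatmap reaches the target bin width.

    Assumes the target width is the current width times a power of two.
    """
    width_ms, bins = heatmap
    while width_ms < target_width_ms:
        # Sum each pair of neighbouring bins; an unpaired last bin is kept as-is.
        bins = [sum(bins[i:i + 2]) for i in range(0, len(bins), 2)]
        width_ms *= 2
    return width_ms, bins


def reduce_heatmaps(per_rank: List[Heatmap]) -> List[Heatmap]:
    """Bring every rank's heatmap to the coarsest bin width seen on any rank."""
    # Stand-in for a MAX reduction over all ranks (an assumption of this sketch).
    common_width_ms = max(width_ms for width_ms, _ in per_rank)
    return [collapse(h, common_width_ms) for h in per_rank]


# Three simulated ranks whose heatmaps ended up with different bin widths.
ranks = [(100, [1, 2, 3, 4]), (200, [5, 6]), (400, [7])]
consistent = reduce_heatmaps(ranks)
```

In this sketch every rank keeps its own heatmap and only the bin width is agreed on, so skipping the step leaves per-rank heatmaps with mismatched widths, which is the kind of inconsistency described above.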

Fixes #941

shanedsnyder commented 1 year ago

I didn't add any testing yet, but this might warrant a regression test on our CI before merging.

shanedsnyder commented 1 year ago

We should also look at resolving, in the darshan-util code, the inconsistent heatmaps this bug produces. It may be a simple fix, and it would provide a way to resolve the problem for any affected logs.

shanedsnyder commented 11 months ago

I'm actually going to punt on a regression test for this. We now have a runtime library fix for this bug and a util library fix to correct older logs that exhibit the bug. This error condition won't be easy to trip, and with 2 layers of fixes in place, I'll just call it a day for now.

I'll go ahead and merge this.