tylerjereddy opened this issue 1 year ago.
Good catch. In cases where we have to up-convert old module records, we use -1 to denote that a particular counter wasn't instrumented properly (in this case, the log was generated with Darshan's old PnetCDF module, before we adopted much more detailed instrumentation, so most of the counters in the new PnetCDF module aren't supported). We need to make sure we zero those back out before plotting.
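A minimal sketch of zeroing the sentinel before plotting, assuming the counters sit in a pandas DataFrame with one column per counter. The helper name and the counter column names are hypothetical, and this is not necessarily how Darshan's plotting code handles it:

```python
import pandas as pd

# Sentinel written for counters that an up-converted (old-format) record
# could not populate.
NOT_INSTRUMENTED = -1


def zero_uninstrumented(counters: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with the -1 sentinel replaced by 0.

    Only the exact sentinel value is replaced; any other negative value is
    left alone so that it still surfaces as a bug.
    """
    return counters.replace(NOT_INSTRUMENTED, 0)


# Hypothetical counter columns for an up-converted PnetCDF record.
df = pd.DataFrame({"PNETCDF_VAR_OPENS": [3], "PNETCDF_VAR_GET_VARA": [-1]})
print(zero_uninstrumented(df))
```

Replacing only the sentinel, rather than clipping every negative value, keeps unrelated negative counts visible if they ever appear.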
I'll try to remind myself how we fixed this for other modules and apply a similar fix. Probably the most useful regression test to add would be one ensuring that no op count plot has negative values, since that should hold universally across modules.
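A sketch of the check such a regression test could run, assuming the op counts for each module have already been gathered into a plain `{module: {counter: value}}` dict. Both helper names are hypothetical, and how the counts are pulled out of a log is not shown:

```python
def find_negative_op_counts(op_counts: dict) -> dict:
    """Return the subset of {counter_name: value} pairs that are negative."""
    return {name: value for name, value in op_counts.items() if value < 0}


def check_no_negative_op_counts(op_counts_by_module: dict) -> None:
    """Raise AssertionError naming every module/counter with a negative count."""
    bad = {}
    for module, op_counts in op_counts_by_module.items():
        negatives = find_negative_op_counts(op_counts)
        if negatives:
            bad[module] = negatives
    assert not bad, f"negative op counts found: {bad}"


# Passes: all counts are non-negative.
check_no_negative_op_counts({"POSIX": {"Read": 4, "Write": 0}})
```

In a real test this would be parametrized over every module in every test log (e.g. e3sm_io_heatmap_only.darshan), so a regression in any one module fails loudly with the offending counter named.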
In #907, I've updated op count and access histogram plots to always have a ylim of 0, so we won't have to worry about the problem, at least visually, going forward.
As a final refinement, I think we could try to apply the following strategy (pulled from old issue #569):
Add checks such that counters are only included in the figure if they were enabled at runtime. It looks like MMap is the only one that is disabled by default, but I'll have to double-check.
That approach is nice in that it allows us to distinguish between operations that truly were never called (op count of 0) and those that Darshan could not instrument at runtime (e.g., POSIX_MMAPS if mmap instrumentation is disabled).
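A sketch of the filtering step, assuming the set of counters that were disabled at runtime is already known. The `disabled_counters` input and the helper name are hypothetical, and how that set would be recovered from a log is not shown:

```python
def counters_to_plot(op_counts: dict, disabled_counters: set) -> dict:
    """Hypothetical helper: keep only counters instrumented at runtime.

    A count of 0 for an enabled counter means the operation was never
    called; a disabled counter is dropped so it is not mistaken for 0.
    """
    return {name: count for name, count in op_counts.items()
            if name not in disabled_counters}


# Hypothetical run where mmap instrumentation was disabled.
counts = {"POSIX_READS": 10, "POSIX_WRITES": 0, "POSIX_MMAPS": 0}
print(counters_to_plot(counts, {"POSIX_MMAPS"}))
```

Here POSIX_WRITES stays in the plot as a genuine zero, while POSIX_MMAPS is omitted rather than shown as a misleading zero.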
In https://github.com/darshan-hpc/darshan/pull/907, I've updated op count and access histogram plots to always have a ylim of 0, so we won't have to worry about the problem, at least visually, going forward
I don't see any such change in gh-907.
For the rest of it, I think I like the idea of enforcing a bottom-out at 0 in the data structure, or of using a genuine missing-value sentinel instead of some kind of reduction on -1s or whatever is going on there. I'd need to think about how much the data structure in question gets used, though.
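The two options can be compared on a toy pandas Series; this is a sketch with made-up index labels, not Darshan's actual data structure:

```python
import pandas as pd

# Toy counts; -1 marks a counter that was not instrumented.
raw = pd.Series([5, 0, -1], index=["opens", "reads", "writes"])

# Option 1: bottom out at 0. Simple, but "not instrumented" (-1) now
# looks exactly like "never called" (0).
floored = raw.clip(lower=0)

# Option 2: a genuine missing-value sentinel. The nullable "Int64" dtype
# keeps the data integer-typed while marking the entry as missing.
with_missing = raw.astype("Int64")
with_missing.loc[raw == -1] = pd.NA

print(floored.tolist())
print(with_missing)
```

Clipping is a one-liner but discards information; the missing-value version keeps the distinction, and pandas reductions such as `sum()` skip missing entries by default.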
Oh sorry, I accidentally linked the wrong issue. I meant #910.
The sentinel should probably eventually be pd.NA, or whatever they are using these days, assuming that NaN is not being used because this is an integer type.
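A short illustration of the integer-type concern: NaN is a float, so a plain integer column that receives it gets upcast, whereas pandas' nullable integer dtype holds pd.NA and stays integer-typed. The values are made up:

```python
import numpy as np
import pandas as pd

# Building a Series from ints plus NaN yields float64, because NumPy
# integer arrays have no missing-value representation.
with_nan = pd.Series([3, np.nan])
print(with_nan.dtype)

# The nullable integer dtype stores pd.NA and stays integer-typed.
with_na = pd.Series([3, pd.NA], dtype="Int64")
print(with_na.dtype)
```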
On the latest main, you can get negative operation counts (see below) in the summary report for this log-repo file:
python -m darshan summary e3sm_io_heatmap_only.darshan
This probably needs investigation, a regression test, and a fix.