darshan-hpc / darshan

Darshan I/O characterization tool
Other
57 stars 28 forks source link

updated derived metrics interface #898

Closed shanedsnyder closed 1 year ago

shanedsnyder commented 1 year ago

Mostly for discussion for now, see commits for more details.

One thing I'd like to cleanup is add some sort of helper logic for mapping between module names and their C library module identifiers (integers). It's annoying to maintain that in Python, but otherwise we don't really have anyway to use the accumulator API unless we have an open Darshan log we can use to find the underlying module identifier. For now, the Python accumulator interface is ugly since it requires you to give the module name and ID.

shanedsnyder commented 1 year ago

I just pushed one more commit that adds some helper for mapping between module name to index and uses this logic internally in log_get_derived_metrics() so that callers don't have to ensure they hold onto both module names and log format indices. Really, we should probably not require users to know what a module index is and have them interact entirely with module names for simplicity.

I think I'm satisfied with the state of things here and am willing to merge things in assuming there are no other critical concerns to address? I'm not sure if it makes any difference for attribution purposes, but I could cherry-pick these commits back onto your original PR @tylerjereddy if it matters any to you? Otherwise, I just requested a review here and we can merge this in directly and close #839

tylerjereddy commented 1 year ago

The last two commits look fine to me.

I suppose we could debate about using a dictionary to retrieve the mod_idx to get O(1) vs. the O(n) search from list.index(), but the sequence is so small, and unlikely to balloon, that it isn't worth the effort either way.

Feel free to merge this PR if that is easiest, and if you want to squash the commits or whatever I don't mind.