PyDarshan job summary cleanup fixes

shanedsnyder commented 1 year ago

See #910 for more details.

This PR currently has fixes for:

Captions are hard to read (light colored text, also italicized)
Add "% runtime" y-axis label to right-side of I/O cost plot
Offset autolabeling text at 45 degrees in access size histogram plot and POSIX access pattern plot as is done in I/O op count graph
Don't autolabel 0 values to further help with cluttering of plots
add a forgotten space to our warning string when a log has no data

tylerjereddy commented 1 year ago

I'll try to take quick look tomorrow--I may try to draft a small testing class to enforce the changes that are easy to roundtrip-check from the produced html, and just label them as based on feedback from Argonne team.

shanedsnyder commented 1 year ago

It was noted in https://github.com/darshan-hpc/darshan/issues/910 that some of the issue related to "Data access by category" plots overlaps with stuff addressed elsewhere (https://github.com/darshan-hpc/darshan/issues/804, https://github.com/darshan-hpc/darshan/pull/812).

I attempted to revisit that existing PR (https://github.com/darshan-hpc/darshan/pull/812), ultimately cherry-picking it in here and building on it. From my commit messsage, here's what else I added:

all fonts changed to 18pt
drop scientific notation on byte values and instead use humanize module to convert to byte strings
slight tweaks to bar chart text labels:
- e.g., "# files written: (val)" -> "files written: val"
- get rid of special handling of 0 value
added explicit padding to left plot margin to accomodate category name annotations
right-justified annotations to the left of the y-axis (so that if annotations are too long we only see the end of them and they don't spill into plot region)
remove all logic for converting bytes to MiB as this is no longer necessary

Any feedback? I haven't made any attempt to modify tests yet, but will do so (building off @tylerjereddy suggestion above) if things look sensible.

Here's how things look if I hack in a longer mount point string:

Previously a longer mount point string like that just spills over into the plot, but not anymore with the right justification. That means you can clip part of the string off (as here), but that seems preferrable than letting spill over or trying to deal with complicated auto-scaling of font size based on string length.

carns commented 1 year ago

I like it.

The only other possibility would be to wrap the path to multiple lines (splitting on arbitrary characters) on the path name; we aren't lacking for vertical space. I don't know if that's possible or not. What you have is a good compromise.

shanedsnyder commented 1 year ago

Note the data access by category stuff is also related to #728 -- I might see if I can clean up items 2&3 from that issue, as well (1 is already taken care of).

shanedsnyder commented 1 year ago

The only other possibility would be to wrap the path to multiple lines (splitting on arbitrary characters) on the path name; we aren't lacking for vertical space. I don't know if that's possible or not. What you have is a good compromise.

I didn't try this, but after browsing around briefly it may be possible using matplotlib text (wrap option). We are using annotate for this currently, but I think we could switch to text without issue (we are actually using it already for the byte and file count strings in this plot).

I'll tinker with it and see how it goes.

shanedsnyder commented 1 year ago

The only other possibility would be to wrap the path to multiple lines (splitting on arbitrary characters) on the path name; we aren't lacking for vertical space. I don't know if that's possible or not. What you have is a good compromise.

I didn't try this, but after browsing around briefly it may be possible using matplotlib text (wrap option). We are using annotate for this currently, but I think we could switch to text without issue (we are actually using it already for the byte and file count strings in this plot).

I'll tinker with it and see how it goes.

Eh, it's a little more complicated. This wrap option seems to only work on text with whitespace, which you won't get in an FS mount point. Not sure we'll have a lot of options for wrapping given that -- I don't see any example solutions looking around.

I think I'm just going to go with what we have for now.

That said, we may want to revisit in the future. I'm imagining what we'd put there for something like DAOS, which has quite long identifiers for pools/containers. It might be easiest to just: a) put the category name text in a dedicated, fixed-size subplot to the left of the 2 data subplots; b) be more aggressive about shortening category strings so they convey information and aren't arbitrarily chopped (e.g., "DAOS cont=...09zb", just showing the last 4 digits of the ID or something like that). We can worry more about that when it becomes an issue, I think.

shanedsnyder commented 1 year ago

I just committed some testing changes (including suggestions above from Tyler).

I think I'd like to stop any work on this PR and merge it in, and address remaining changes from #910 elsewhere (they are a little more involved and might warrant some discussion).

Note there are now some weird testing failures here when simply upgrading pip...

tylerjereddy commented 1 year ago

Ah yeah, they yanked their package from PyPI, which caused quite a stir in the community--hopefully https://github.com/darshan-hpc/darshan/pull/921 fixes it for us.

tylerjereddy commented 1 year ago

The M1 failure probably persists here b/c Cirrus perhaps doesn't auto-merge into main before running the tests. I think we just ignore that then. If the rest of the CI passes I'll just do a quick scan through.

tylerjereddy commented 1 year ago

Ok, I read through the diff one more time, spot checked the HTML report produced with e3sm_io_heatmap_only.darshan, and didn't see any issues. It sounds like the Argonne folks are in favor of most of these aesthetic adjustments so in it goes.

darshan-hpc / darshan

PyDarshan job summary cleanup fixes #911