darshan-hpc / darshan

Darshan I/O characterization tool
Other
56 stars 27 forks source link

PyDarshan job summary cleanup #910

Closed shanedsnyder closed 1 year ago

shanedsnyder commented 1 year ago

From an in-person review of the current Darshan job summary reports at Argonne, it would be nice to try to address as much of this as we can before doing a release.

First some high-level things up for discussion:

Smaller things:

tylerjereddy commented 1 year ago

There is some duplication with i.e., gh-804 here as well.

shanedsnyder commented 1 year ago

I embarrassingly can't figure this one out:

Erroneous "type" row in File Count Summary table

Seems like it ought to be a one-liner to move the "type" index up to a column like other values, but I can't quite seem to pull it off.

Most of the other issues I've already addressed in #907

tylerjereddy commented 1 year ago

It wasn't an error--it was an intentional index label. You can remove it, but then you'll need to shim the test suite as well (which is pretty good to do even for aesthetic changes anyway). This is pretty straightforward, but it wasn't a bug (it was intentional) so I'd prefer not using this approach:

--- a/darshan-util/pydarshan/darshan/lib/accum.py
+++ b/darshan-util/pydarshan/darshan/lib/accum.py
@@ -71,7 +71,6 @@ def log_file_count_summary_table(derived_metrics,
                              "read/write files":3}
     df = pd.DataFrame.from_dict(darshan_file_category, orient="index")
     df.rename(columns={0:"index"}, inplace=True)
-    df.index.rename('type', inplace=True)
     df["number of files"] = np.zeros(4, dtype=int)
     df["avg. size"] = np.zeros(4, dtype=str)
     df["max size"] = np.zeros(4, dtype=str)
diff --git a/darshan-util/pydarshan/darshan/tests/test_lib_accum.py b/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
index 080264d1..3d23fb76 100644
--- a/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
+++ b/darshan-util/pydarshan/darshan/tests/test_lib_accum.py
@@ -203,7 +203,6 @@ def test_file_count_summary_table(log_name,
                          "read-only files",
                          "write-only files",
                          "read/write files"]
-    expected_df.index.rename('type', inplace=True)

     log_path = get_log_path(log_name)
     with darshan.DarshanReport(log_path, read_all=True) as report:

image

I'd prefer if you didn't mutate the data structures for the sole purpose of removing a row in the report, and instead do something like this (a small test may be warranted) so that end users get what they want from to_html but no need to inconvenience developers who like index labels:

--- a/darshan-util/pydarshan/darshan/lib/accum.py
+++ b/darshan-util/pydarshan/darshan/lib/accum.py
@@ -100,5 +100,6 @@ def log_file_count_summary_table(derived_metrics,
     df.drop(columns="index", inplace=True)
     ret = plot_common_access_table.DarshanReportTable(df,
                                                       col_space=200,
-                                                      justify="center")
+                                                      justify="center",
+                                                      index_names=False)
     return ret
shanedsnyder commented 1 year ago

Last bits of this closed via #927.