shanedsnyder opened this issue 3 years ago
This has a dependency on https://github.com/darshan-hpc/darshan/issues/642 to generalize some of the aggregate calculations in the darshan-util library.
This is blocked on what is partially described in #155 -- to support this fully, we need to maintain the slowest rank's read, write, and metadata time separately for shared files. For now, we can only provide a "worst case" cost for all I/O components (read, write, metadata) combined.
In GitLab by @carns on Mar 8, 2021, 13:36
The current darshan-job-summary pdf includes a graph like the attached example. It shows the average I/O cost per process, meaning the read, write, and metadata time summed across all ranks and divided by the number of ranks.
This is helpful in many cases, but it fails to highlight cases where the application is held back by one rank (or a few ranks) spending more time in I/O than the others. The worst case would be a job that does "rank 0 I/O" (i.e., all I/O from a single process of the job). On average this would look fine in this graph if there are enough ranks not doing I/O to amortize the cost.
We should add an additional graph that, at a minimum, shows the worst-case rank's I/O percentage alongside the average percentage.
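To make the problem concrete, here is a minimal sketch (using hypothetical per-rank timings, not the Darshan API) of how the averaged graph can hide a "rank 0 I/O" job, and what the proposed worst-case metric would show instead:

```python
# Hypothetical job: 8 ranks, 100 s wall time, rank 0 does ALL the I/O.
runtime = 100.0  # job wall time in seconds (assumed)
nprocs = 8

# Per-rank I/O time (read + write + metadata), seconds; values are made up.
io_time = [40.0] + [0.0] * (nprocs - 1)

# What the current graph reports: sum across ranks / number of ranks.
avg_pct = sum(io_time) / nprocs / runtime * 100

# Proposed addition: the worst-case rank's I/O percentage.
worst_pct = max(io_time) / runtime * 100

print(f"average I/O cost:  {avg_pct:.1f}% of runtime")   # 5.0%
print(f"worst-case rank:   {worst_pct:.1f}% of runtime")  # 40.0%
```

The average suggests I/O is a minor cost (5%), while the slowest rank, which gates the whole job, spends 40% of the runtime in I/O.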