nawtrey opened this issue 2 years ago
Wow, good find. Out of those options, IMHO the gouraud shading looks the most sensible. I would be worried about upping the DPI for general purpose use because it could presumably become a file size problem with enough ranks (and the points may be very tiny visually).
Binning ranks to some maximum y dimension doesn't seem like it would be too bad to me, but TBH I've forgotten the discussion that eliminated it previously. Regardless, gouraud still seems like a reasonable choice to me since it doesn't require any manual bin manipulation on our part.
At some point we are going to lose fidelity no matter what, and if someone wants the details they will need to look at the data hands on. We just need this to be a sensible first cut view in the summary report.
Here are the file sizes of the different figures:
```
1132275 --- 2000_dpi_numpy_hack.png
 935723 --- snyder_acme_DXT_POSIX_2000_dpi_original.png
 932812 --- snyder_acme_DXT_MPIIO_2000_dpi_original.png
 331491 --- snyder_acme_DXT_POSIX_600_dpi_gouraud.png
 304666 --- snyder_acme_DXT_MPIIO_600_dpi_gouraud.png
```
There is some variance in the file size based on how the bins are populated, so I added 2000_dpi_numpy_hack.png as a reference (it has a value of 1 in every bin). But it is still only 1.1 MB, which may not be too controversial.
As far as binning on the y-axis, I don't remember exactly what was said, I just remember not worrying about it after some discussion with the team.
Personally I think as long as we can resolve the bins reasonably we should avoid binning, but if we need to implement a solution for larger `nprocs` values we can. Regardless of the path we take, I think we will need to set a threshold for the max DPI we are willing to use (probably based on the resultant file size), because that would determine what our `nprocs` limit is before having to implement binning, gouraud, or something else.
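As a rough sketch of how a max-DPI threshold would translate into an `nprocs` limit (the function and the fixed axis height here are illustrative assumptions, not existing darshan code):

```python
def nprocs_limit(max_dpi: float, ax_height_inches: float) -> int:
    # each rank needs at least one vertical pixel row to be distinguishable,
    # so the limit is just the pixel height of the heatmap axis
    return int(max_dpi * ax_height_inches)

print(nprocs_limit(300, 4.5))   # 1350 ranks at a 4.5" axis and 300 DPI
print(nprocs_limit(2000, 4.5))  # 9000 ranks at 2000 DPI
```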
We could scale the figure size and leave the DPI constant; I think HTML supports frames with scroll bars along both axes if you want to contain the heatmap footprint in a div element of some size. Changing the resolution dynamically seems a bit more confusing than having a constant DPI but rank/time-adjusted dimensions. Hard to say if it's worth the effort. If someone wants to run the biggest simulation ever and then complain about file size, I'm not sure that's the driving design case we need to worry about.
Could also just add a prominent warning message at the threshold of the observable limit along a given axis and move on (i.e., "hey, your simulation is huge; you might want to inspect the data manually if finer details are missing in the map due to size/resolution limits").
I like @tylerjereddy's idea of adding an error message; I think it would be simple enough to write a function that checks for this and adds a flag to the report.
I tried this really quick:
```diff
diff --git a/darshan-util/pydarshan/darshan/experimental/plots/plot_dxt_heatmap.py b/darshan-util/pydarshan/darshan/experimental/plots/plot_dxt_heatmap.py
index 9407f8f..8eb3bec 100644
--- a/darshan-util/pydarshan/darshan/experimental/plots/plot_dxt_heatmap.py
+++ b/darshan-util/pydarshan/darshan/experimental/plots/plot_dxt_heatmap.py
@@ -289,6 +289,28 @@ def adjust_for_colorbar(jointgrid: Any, fig_right: float, cbar_x0: float):
     )
+def get_ax_canvas_height(fig, ax):
+    # get the height of the plot canvas in inches
+    ax_canvas_height = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted()).height
+    return ax_canvas_height
+
+
+def check_fig_dpi(fig, ax, nprocs):
+    ax_canvas_height = get_ax_canvas_height(fig=fig, ax=ax)
+    # maximum number of y-axis bins that can be resolved at the current DPI
+    max_ybins = int(np.floor(ax_canvas_height * fig.dpi))
+    # DPI required to give each of the nprocs ranks its own pixel row
+    required_dpi = int(np.ceil(nprocs / ax_canvas_height))
+
+    if nprocs > max_ybins:
+        warn_msg = (
+            "Too many MPI processes to resolve in DXT heatmap figure. \n"
+            f"Figure DPI is {fig.dpi} which supports nprocs <= {max_ybins} \n"
+            f"With {nprocs} processes, this figure requires dpi >= {required_dpi}"
+        )
+        print(warn_msg)
+
+
 def plot_heatmap(
     report: darshan.DarshanReport,
     mod: str = "DXT_POSIX",
@@ -345,6 +367,7 @@ def plot_heatmap(
     # build the joint plot with marginal histograms
     jgrid = sns.jointplot(kind="hist", bins=[xbins, nprocs], space=0.05)
+    jgrid.fig.set_dpi(300)
     # clear the x and y axis marginal graphs
     jgrid.ax_marg_x.cla()
     jgrid.ax_marg_y.cla()
@@ -427,6 +450,8 @@ def plot_heatmap(
     jgrid.ax_joint.set_xlabel("Time (s)")
     jgrid.ax_joint.set_ylabel("Rank")
+    check_fig_dpi(fig=jgrid.fig, ax=jgrid.ax_joint, nprocs=nprocs)
+
     plt.close()
     return jgrid
```
Here is the output for `snyder_acme.exe_id1253318_9-27-24239-1515303144625770178_2.darshan`:

```
Too many MPI processes to resolve in DXT heatmap figure.
Figure DPI is 300 which supports nprocs <= 842
With 8192 processes, this figure requires dpi >= 2916
```
Of course this just prints a message at the moment, but it could be leveraged to set a flag, raise a proper warning at run time, etc. Also, the plot canvas should be ~3" tall, which means we should get a recommended dpi of 2731, so this isn't quite right, but it's a starting point.
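For reference, a minimal sketch of what "a proper warning at run time" could look like; this mirrors the logic from the diff above but takes plain numbers instead of matplotlib objects, and the function signature is illustrative:

```python
import math
import warnings

def check_fig_dpi(fig_dpi: float, ax_canvas_height: float, nprocs: int) -> bool:
    """Warn (and return True) if the heatmap cannot give each rank a pixel row."""
    # maximum number of y-axis bins resolvable at the current DPI
    max_ybins = math.floor(ax_canvas_height * fig_dpi)
    if nprocs > max_ybins:
        # DPI needed so every rank gets at least one pixel row
        required_dpi = math.ceil(nprocs / ax_canvas_height)
        warnings.warn(
            f"Figure DPI {fig_dpi} supports nprocs <= {max_ybins}; "
            f"{nprocs} processes require dpi >= {required_dpi}",
            RuntimeWarning,
        )
        return True
    return False
```

Returning a bool makes it easy to register a report flag instead of (or in addition to) warning.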
I like the warning idea too. I think you could make it simpler for the purposes of the summary report tool. Maybe just a footnote or something added to the caption that says "Warning: sparse I/O access from individual ranks in jobs with more than 512 processes may not be visible at this resolution."
Maybe a command line option could be provided at some point that lets people bump the resolution in whatever way seems to make sense. If so, then that option could be suggested after the warning.
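If we went that route, the option could be as simple as an argparse flag; the flag name `--heatmap-dpi` is purely illustrative here, no such option exists yet:

```python
import argparse

parser = argparse.ArgumentParser(prog="darshan-summary")
parser.add_argument(
    "--heatmap-dpi", type=int, default=300,
    help="DPI to use for DXT heatmap figures (bump to resolve more ranks)",
)

# the user opts in to a higher resolution explicitly
args = parser.parse_args(["--heatmap-dpi", "600"])
print(args.heatmap_dpi)  # 600
```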
I still don't like automatically bumping resolution unless it is asked for. The problem isn't an individual user running this tool, but chances are good that somehow somewhere it will get included in an automated pipeline and inadvertently produce more data than expected. We have had automated systems that produced legacy darshan-job-summary.pl reports before that hit this problem.
That's a good point. Yeah that sounds like good enough reason to just add a message somewhere.
I think we can set a constant in the `ReportData` constructor to be used when the figures are registered, and later add a command line option that allows users to change it. It could be a scaling factor (default 1) to scale the figure dimensions (with some sort of scroll bar situation for larger figures), or it could store the DPI (default 300) to be used on select figures with this issue.
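As a sketch of that idea (this class and its attribute names are hypothetical stand-ins, not the actual pydarshan `ReportData`):

```python
class ReportData:
    # hypothetical sketch; the real pydarshan ReportData constructor differs
    DEFAULT_FIG_DPI = 300

    def __init__(self, fig_scale=1.0, fig_dpi=None):
        # fig_scale multiplies figure dimensions (default 1);
        # fig_dpi, if given, overrides the default DPI for affected figures
        self.fig_scale = fig_scale
        self.fig_dpi = fig_dpi if fig_dpi is not None else self.DEFAULT_FIG_DPI

rd = ReportData(fig_scale=2.0)
print(rd.fig_dpi, rd.fig_scale)  # 300 2.0
```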
We could then add a function for checking the resolution using `nprocs` and the figure plotting area height (like in the diff above), check a single DXT heatmap, and if it's triggered, register a flag in the report. I think we want to follow the format used by the partial data flags, with a simple warning sign and text.
As far as where to put the flag: I think we already have issues with redundant captions, so we would want to avoid adding redundant flags too. If one DXT heatmap has an error, the others will too. Maybe that encourages the addition of a flag/warning box at the top of the report to hold warnings like this, as has been discussed previously. I would probably need to see an example of what we want that to look like before diving in, though. I'm guessing a constant-sized box with a scrollbar would be good, with a default message like "No warnings/errors to report".
Since this issue focuses on the vertical bars, it is also worth noting that the horizontal bar issues still persist on `pydarshan-devel` after merging gh-622, e.g. for the snyder_acme file:
Thanks @tylerjereddy. I've updated the issue title to better reflect the more general nature of this issue.
Background
Noted in gh-622, the DXT heatmap figures generated by `plot_heatmap()` still do not reflect the input dataframe. Specifically, if we plot `snyder_acme.exe_id1253318_9-27-24239-1515303144625770178_2.darshan`, we see some instances where the vertical bar graph is non-zero but the heatmap shows no data. I've boxed in the section with the inconsistency.

**Note: the following graphs are all generated using branch `nawtrey_issue_575_update_ymax`.**

`DXT_POSIX`:

`DXT_MPIIO`:

I checked, and the values are certainly in the dataframe; for 7 ranks there are values of ~5e6, so these should show up orange-red like the values near them. This was checked by saving the dataframe as HTML and browsing through the values for the `DXT_MPIIO` figure. Here's an archive with the HTML file: hmap_data.tar.gz

If the input data is fine (and it seems to be), then it appears the heatmap is having issues representing the input data. I think this is because we are trying to shove 8000+ y-axis bins into a 4.5" tall figure. Even if the entire height was taken by the heatmap, at 300 DPI we are only going to see 1350 distinct y-axis bins. So ultimately I think this is really a resolution problem.
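The arithmetic behind that last claim, spelled out (the 4.5" height and 300 DPI are the figure defaults mentioned above):

```python
fig_height_inches = 4.5
dpi = 300
max_distinct_ybins = int(fig_height_inches * dpi)
print(max_distinct_ybins)         # 1350 distinct y-axis bins at best
print(8192 > max_distinct_ybins)  # True: 8000+ ranks cannot all be resolved
```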
Solutions
I think there are 3 ways we can handle this:

1. Bin the ranks on the y-axis: set some maximum number of y-axis bins, and if `nprocs` is greater than it, group the ranks together.
2. Increase the figure DPI enough to resolve `nprocs`.
3. Keep the DPI modest and add the `shading="gouraud"` argument to the `sns.heatmap()` call. This gets passed to the `matplotlib.pyplot.pcolormesh()` function (which is what makes the heatmap), and it will sort of "blend" the bins.

I think the team decided against 1. early on, so I haven't spent any time working on that solution.
For 2., here is the `DXT_MPIIO` figure saved with `dpi=2000`:

We get our data to appear, although it is pretty difficult to see without zooming in when the bins are 5e-4" tall.
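Where that bin-height estimate comes from, assuming the full 4.5" figure height is available to the heatmap:

```python
fig_height_inches = 4.5  # assumed heatmap height
nprocs = 8192
bin_height = fig_height_inches / nprocs
print(f"{bin_height:.1e} inches per rank bin")  # 5.5e-04, i.e. on the order of 5e-4"
```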
For 3., here is the `DXT_MPIIO` figure saved at `dpi=600` with `shading="gouraud"` set:

Here the data is a bit easier to see, but the horizontal bins are still pretty difficult to make out.