LLNL / hatchet

Graph-indexed Pandas DataFrames for analyzing hierarchical performance data
https://llnl-hatchet.readthedocs.io
MIT License

Support Printing Tree for MultiIndex Columns #120

Closed michaelmckinsey1 closed 5 months ago

michaelmckinsey1 commented 5 months ago

Summary

MultiIndex columns do not occur natively in Hatchet; however, Thicket has some cases where the columns of Thicket.statsframe.dataframe become MultiIndex. This is relevant because Thicket.statsframe is a Hatchet.GraphFrame, so supporting Thicket.statsframe.tree() requires a separate check for MultiIndex columns in Hatchet. This should not affect Hatchet users calling GraphFrame.tree(), since GraphFrame.dataframe won't have MultiIndex columns when Hatchet is used by itself.

Current Issue

The current code, dataframe.loc[df_index, self.name], returns a pandas Series rather than a string when the columns are MultiIndex, which looks unsightly in the tree printout. We can avoid this by slightly tweaking the format to index directly to the string value.
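A minimal sketch of the behavior described above, using plain pandas rather than Hatchet or Thicket. The table contents, the `("name", "")` column key, and the `cell_value` helper are all hypothetical and only illustrate the idea; they are not the code changed in this PR.

```python
import pandas as pd

# Hypothetical stats table with two-level columns. The ("name", "") key is an
# assumption about how a flat column might be lifted into a MultiIndex.
columns = pd.MultiIndex.from_tuples([("name", ""), ("time", "mean")])
df = pd.DataFrame(
    [["main", 1.5], ["solve", 0.7]],
    index=["n0", "n1"],
    columns=columns,
)

# Partial key: only the first column level is given, so pandas returns a
# one-element Series instead of the string itself.
partial = df.loc["n0", "name"]

# Full key: every column level is given, so the lookup yields the scalar.
full = df.loc["n0", ("name", "")]


def cell_value(dataframe, df_index, column):
    """Return a scalar for `column`, whether or not the columns are MultiIndex."""
    value = dataframe.loc[df_index, column]
    if isinstance(dataframe.columns, pd.MultiIndex) and isinstance(value, pd.Series):
        # Unwrap the first (here, only) entry left after partial indexing.
        return value.iloc[0]
    return value


print(type(partial).__name__)
print(cell_value(df, "n0", "name"))
```

With flat columns the helper falls through to the ordinary lookup, so the extra check only matters for frames whose columns are MultiIndex.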


Why hasn't this been an issue before?

Thicket.tree() also prints from the Thicket.statsframe.dataframe, but it handles this case appropriately by checking for MultiIndex columns, so the current Thicket.tree() function can be used to avoid this problem. However, once https://github.com/LLNL/thicket/pull/118 is merged, Thicket.tree() will print from the Thicket.dataframe instead of the statsframe. Therefore, this functionality is necessary to continue supporting both Thicket.tree() and Thicket.statsframe.tree().