Long story short:
https://github.com/dask-contrib/dask-histogram/blob/main/src/dask_histogram/core.py#L728-L766 and https://github.com/dask-contrib/dask-histogram/blob/main/src/dask_histogram/core.py#L1019-L1049
create MaterializedLayers because we're passing dicts to HighLevelGraph.from_collections, which are then immediately turned into MaterializedLayers. MaterializedLayers cause optimization to scale with the number of input partitions, which is untenable for realistic dataset chunk multiplicities.

I have tried to use https://github.com/dask/dask/blob/main/dask/layers.py#L1090 (DataFrameTreeReduction), but that's resulted in some fairly odd outcomes that I don't quite understand.
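For concreteness, here is a minimal sketch (not the dask-histogram code itself; the `fill-histogram-sketch` name and the per-chunk `sum` task are placeholders) of the pattern in question: building a dict with one task per input chunk and handing it to HighLevelGraph.from_collections yields a MaterializedLayer whose size, and therefore the work done by every optimization pass over it, grows with the number of input partitions.

```python
import dask.array as da
from dask.highlevelgraph import HighLevelGraph

# 1,000 input partitions stands in for a realistic chunk multiplicity.
x = da.ones(1_000_000, chunks=1_000)

name = "fill-histogram-sketch"
# One task per input chunk, materialized up front as a plain dict.
dsk = {(name, i): (sum, (x.name, i)) for i in range(x.numblocks[0])}

graph = HighLevelGraph.from_collections(name, dsk, dependencies=[x])
layer = graph.layers[name]
print(type(layer).__name__)  # MaterializedLayer
print(len(layer))            # 1000 -- scales with the number of input partitions
```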
This must be fixed before we even begin to think about telling people they should use this package for HEP analysis.