dask-contrib / dask-histogram

Histograms with task scheduling.
https://dask-histogram.readthedocs.io
BSD 3-Clause "New" or "Revised" License

feat: when adding dask_histogram.boost.Histograms delay creation of task graph and use multi-source tree reduction #126

Closed: lgray closed this 8 months ago

lgray commented 9 months ago

This is definitely a prototype, but it outlines a nice pattern that allows the interface to scale a little further.

This results in significantly simpler graphs for single histograms with lots of fills (i.e. over systematic-uncertainty categories). Now, instead of multiple layers of tree reduces for boost histograms, there is just one, aggregating over all the hist-on-block operations generated by each histogram fill; i.e. it can handle a tree reduce over multiple input collections.
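As a rough illustration of the aggregation pattern (a minimal plain-Python sketch, not the actual dask-histogram code; `partials` and `combine` are placeholder names):

from functools import reduce

# Minimal sketch of a multi-source tree reduction: partial histograms from
# every (block, fill) pair feed one balanced reduction, instead of one
# tree reduce per fill.
def tree_reduce(partials, combine, split_every=4):
    while len(partials) > 1:
        partials = [
            reduce(combine, partials[i : i + split_every])
            for i in range(0, len(partials), split_every)
        ]
    return partials[0]

# Hypothetical usage: boost-histogram objects support `+`, so
# total = tree_reduce(all_block_histograms, lambda a, b: a + b)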

Memory use is a bit lower, and the graph is pleasantly cleaner.

before: [task-graph visualization: starting_graph]

after: [task-graph visualization: with_new_agg]

It may not look like a big difference with a small graph, but it becomes very apparent as you increase the number of fills.

This PR also now implements multi-fill syntax. With a more concise implementation this appears not to be necessary.

lgray commented 9 months ago

@martindurant in any case, between this one and #125 we have some changes that improve the situation when filling lots of histograms.

I have a feeling this + #125 (the multi-fill part) will address most of the issues analysis users are running into, though clearly we should think about the best way to compose it all together.

This alteration definitely improves the structure of the resulting task graph, but it doesn't mitigate the time needed to construct many thousands of variations. The multi-fill change in the other draft PR takes care of that rather handily, though, and results in rather snappy performance.

Getting the interface right may be a challenge. If we can figure out how to turn multiple fills into something like the multi-fill interface behind the scenes, that would also work quite well.

lgray commented 8 months ago

Example of multi-fill syntax:

# Tuple-valued entries drive the multi-fill: here, multiple weights,
# one per systematic variation (see the note below).
axes_fill_info_dict = {
    dense_axis_name : dense_variables_array_with_cuts["lep_chan_lst"][sr_cat][dense_axis_name],
    "weight"        : tuple(masked_weights),  # one weight array per variation
    "process"       : histAxisName,
    "category"      : sr_cat,
    "systematic"    : tuple(wgt_var_lst),     # one variation name per weight
}
hout[dense_axis_name].fill(**axes_fill_info_dict)

Here we show a fill where we pass multiple weights corresponding to systematic variations.
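If I read the tuple-valued arguments correctly (the element-wise pairing is my assumption, not stated above), the single call expands to roughly one fill per (weight, variation) pair:

# Hypothetical expansion of the multi-fill call above, assuming the i-th
# weight in masked_weights belongs with the i-th name in wgt_var_lst.
dense_values = dense_variables_array_with_cuts["lep_chan_lst"][sr_cat][dense_axis_name]
for wgt, syst in zip(masked_weights, wgt_var_lst):
    hout[dense_axis_name].fill(
        **{dense_axis_name: dense_values},
        weight=wgt,
        process=histAxisName,
        category=sr_cat,
        systematic=syst,
    )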

lgray commented 8 months ago

After talking to some users, it seems another ~reasonable way to do this is to pass a function that specifies exactly the filling we would like to do, plus a list of the arguments for each fill call.
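To make the idea concrete, something like this (a hypothetical sketch; `fill_block` and `fill_argsets` are invented names, not an existing dask-histogram API):

# The user supplies one function that performs every fill on a block, plus a
# list of kwargs dicts, one per intended fill call; the library would then
# map it over the blocks in a single graph layer.
def fill_block(hist, block, fill_argsets):
    for kwargs in fill_argsets:
        hist.fill(block, **kwargs)
    return hist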

martindurant commented 8 months ago

After talking to some users, it seems another ~reasonable way to do this is to pass a function that specifies exactly the filling we would like to do, plus a list of the arguments for each fill call.

Of course you can do that, but it's circumventing our public API. Fine if this is really the exception... but dealing with the repeated fill (or repeated with_field) seems like it might be worthwhile.

lgray commented 8 months ago

Yeah, there are a few patterns that come out of this. The problem is that a typical high-energy physics analysis will use all of those patterns, and if the task graph is most efficiently made with the fewest total fill calls, then you need that leak in the API to do all those patterns at once.

It is a bit chicken and egg.

martindurant commented 8 months ago

if the task graph is most efficiently made with the fewest total fill calls

if... the question is, which is easier: forcing users to do their own batching, or building our own batching mechanism? I haven't reviewed the code in this PR yet, but the fact that it already exists suggests the latter might be the case. Are there other cases of repeatedly applying the same operation that this pattern would apply to? Or other cases where "bury these in a single map_partitions" is the best advice we can currently give?

lgray commented 8 months ago

The other-other choice is to write an optimization layer that fuses the histogram fills after the fact into something reasonable (and thus assembles the batched fills). But then you pay in the time spent calling awkward functions to do all the fills...

I think the pattern to search for is actually straightforward.
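For instance, a toy version of the search-and-fuse step might look like this (purely illustrative; real dask graphs use key/task tuples, which are simplified away here):

from collections import defaultdict

# Toy sketch of post-hoc fill fusion: group per-block fill tasks by target
# histogram and emit one batched task per group instead of one per fill.
def fuse_fill_tasks(fill_tasks):
    # fill_tasks: iterable of (hist_key, fill_kwargs) pairs
    groups = defaultdict(list)
    for hist_key, kwargs in fill_tasks:
        groups[hist_key].append(kwargs)
    return {key: ("batched-fill", key, argsets) for key, argsets in groups.items()}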

martindurant commented 8 months ago

write an optimization layer that fuses the histogram fills

This doesn't help with graph build time or memory use during build.

lgray commented 8 months ago

We need to dig into why you're not seeing the memory improvements I see with this PR. I think it might be macOS.

But yeah it won't remove the time-domain part of the problem unless the awkward array broadcast_and_apply time is improved.

martindurant commented 8 months ago

it won't remove the time-domain part

Batching with_field calls should, but not hists

lgray commented 8 months ago

Lemme give a simple with_fields impl a try. I need a break from other things.
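For reference, a minimal sketch of what a batched `with_fields` could look like (my own illustration built on awkward's `ak.with_field`; not necessarily what this impl will be):

import awkward as ak

# Do all of the eager ak.with_field calls inside one function, so mapping it
# over the partitions adds a single graph layer instead of one per field.
def with_fields(block, **fields):
    for name, value in fields.items():
        block = ak.with_field(block, value, where=name)
    return block

# Hypothetical usage with dask-awkward collections:
# new = dak.map_partitions(with_fields, arr, pt=pt_arr, eta=eta_arr)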

lgray commented 8 months ago

@martindurant this one is largely well-focused now. Please give it a try. It looks like with this we can avoid batching the histogram fills together, or at least do a much better job of it.

lgray commented 8 months ago

@martindurant is there anything we should add to this one? I think it's a clear improvement on what's there and doesn't rock the boat too much.

lgray commented 8 months ago

It also paves the way for pooling the fills when we generate the task graph (which still needs study).

lgray commented 8 months ago

@martindurant here's the real first try at multifill. It works for dask-awkward arrays.

Here's a recipe that should let you see the stalling issues and the recursion error: https://gist.github.com/lgray/8a28c5fcd707a2a6778f92cd598f0ca6