cms-nanoAOD / correctionlib

A generic correction library
https://cms-nanoaod.github.io/correctionlib/
BSD 3-Clause "New" or "Revised" License

feat: add dask_awkward wrapper to Correction and CompoundCorrection #219

Closed lgray closed 10 months ago

lgray commented 10 months ago

Also add awkward wrapper to CompoundCorrection.

This PR lets us pass a dask_awkward.Array directly into correctionlib corrections. Wrapping the correction in a delayed object and issuing the map_partitions call now both happen internally.

evaluate = sf.evaluate(
    dx,
    1.0,
)

This is significantly cleaner than the explicit map_partitions version.
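To make the contrast concrete, here is a minimal, library-free sketch of the dispatch pattern described above. It does not use correctionlib or dask_awkward: `ToyPartitioned`, `toy_map_partitions`, and `ToyCorrection` are hypothetical stand-ins, and the toy version runs eagerly where dask would build a lazy graph. It only illustrates how an `evaluate` method can hide the per-partition mapping from the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyPartitioned:
    """Hypothetical stand-in for a partitioned (dask_awkward-like) array."""

    partitions: List[List[float]]

    def compute(self) -> List[float]:
        # Concatenate all partitions into one flat list.
        return [x for part in self.partitions for x in part]


def toy_map_partitions(fn, arr: ToyPartitioned, *args) -> ToyPartitioned:
    """Apply fn to each partition independently (eager here, lazy in dask)."""
    return ToyPartitioned([fn(part, *args) for part in arr.partitions])


class ToyCorrection:
    """Hypothetical correction: multiplies each value by factor * weight."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def _evaluate_eager(self, values, weight):
        return [self.factor * weight * v for v in values]

    def evaluate(self, values, weight):
        # Partitioned input: do the per-partition mapping internally.
        if isinstance(values, ToyPartitioned):
            return toy_map_partitions(self._evaluate_eager, values, weight)
        # Plain input: evaluate directly.
        return self._evaluate_eager(values, weight)


sf = ToyCorrection(factor=2.0)
dx = ToyPartitioned([[1.0, 2.0], [3.0]])

# What the caller had to write before: map over partitions by hand.
explicit = toy_map_partitions(sf._evaluate_eager, dx, 1.0)

# What the caller writes when the wrapper handles it internally.
wrapped = sf.evaluate(dx, 1.0)
```

Both calls give the same result; the second simply moves the partition handling out of user code.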

lgray commented 10 months ago

@nsmith- please review when you have time, thanks!

lgray commented 10 months ago

Also, I just want to be sure you're fine with the dask.delayed object getting cached. This is to enforce re-use across multiple calls that wrap the correction; otherwise each call generates a new key in the graph and adds more payload.

It's mostly just an issue of thread safety, but I don't imagine people use correctionlib in Python threads (as opposed to processes) that much.
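The caching concern can be sketched without dask. In this minimal illustration, `toy_delayed` is a hypothetical stand-in for dask.delayed that mints a fresh key on every call, as the comment above describes. `ToyCorrection` and the lock are assumptions added for the example; they are not taken from the PR.

```python
import itertools
import threading

_key_counter = itertools.count()


def toy_delayed(obj):
    """Hypothetical stand-in for dask.delayed: every call mints a new key."""
    return (f"delayed-{next(_key_counter)}", obj)


class ToyCorrection:
    """Hypothetical correction object that can cache its delayed wrapper."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._delayed = None
        # Assumption: a lock guards the cache so that two threads
        # cannot both create a wrapper on first use.
        self._lock = threading.Lock()

    def delayed_uncached(self):
        # A new wrapper (and a new key) on every call.
        return toy_delayed(self)

    def delayed_cached(self):
        # Create the wrapper once, then hand back the same one.
        with self._lock:
            if self._delayed is None:
                self._delayed = toy_delayed(self)
            return self._delayed


sf = ToyCorrection("sf")
uncached_keys = {sf.delayed_uncached()[0] for _ in range(3)}
cached_keys = {sf.delayed_cached()[0] for _ in range(3)}
```

Here `uncached_keys` holds three distinct keys while `cached_keys` holds one, which is the re-use the cache is meant to enforce. The lock is one way to address the thread-safety point and costs little when threads are not in play.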

nsmith- commented 10 months ago

I'm much more scared of attempting to persist the dask.delayed object in the library code. Just having it wrapped is fine, I think.

lgray commented 10 months ago

Yeah, what I've implemented here is more or less what Martin suggested as far as dask usage patterns are concerned. There's no need to persist if you wrap it in the delayed object; it'll be handled by any scheduler that conforms to the spec. This is also what's being done over in coffea for corrections and ML models, following his suggestion.