TopEFT / topeft

15 stars 24 forks source link

HistEft to scikit hist #371

Closed btovar closed 1 year ago

btovar commented 1 year ago

Main changes are in the histEFT.py file. API changes:

All but two unit tests are working:

If you have any comments I can fix those, but probably the two failing tests need someone that understand the related physics.

btovar commented 1 year ago

Regarding rebining, currently hist does not support it with variable edges. Thus, there is no easy way to update this line:

https://github.com/TopEFT/topcoffea/blob/501fd5d655783c7862d09bd42f40e2b841acbd0c/topcoffea/modules/datacard_tools.py#L480C1-L480C81

    edge_arr = self.BINNING[km_dist] + [h.axis(km_dist).edges()[-1]]
    h = h.rebin(km_dist,Bin(km_dist,h.axis(km_dist).label,edge_arr))

As @kmohrman pointed out, it should be possible to create the histogram with the correct edges to begin with. I see the edges are created here:

https://github.com/TopEFT/topcoffea/blob/501fd5d655783c7862d09bd42f40e2b841acbd0c/analysis/topEFT/topeft.py#L58

Could I just create those axes with the edge values from the datacard_tools.py module?

klannon commented 1 year ago

@bryates @Andrew42 @sscruz Please see the comment from @btovar above. For some context, it is critically important that we migrate away from coffea histograms to the new scikit-hep based hist class. Unfortunately, this means at least temporarily migrating away from variable sized rebinning. I think it's critical that a one or more actual users looks at this and comes up with a solution. The solution proposed by @btovar and @kmohrman above would certainly work and would have the added advantage of reducing memory usage while filling the histograms, which is often the computational bottleneck. If you agree that this is a viable solution, can you either help @btovar implement this change in the analysis code or identify someone else who could help?

bryates commented 1 year ago

@bryates @Andrew42 @sscruz Please see the comment from @btovar above. For some context, it is critically important that we migrate away from coffea histograms to the new scikit-hep based hist class. Unfortunately, this means at least temporarily migrating away from variable sized rebinning. I think it's critical that a one or more actual users looks at this and comes up with a solution. The solution proposed by @btovar and @kmohrman above would certainly work and would have the added advantage of reducing memory usage while filling the histograms, which is often the computational bottleneck. If you agree that this is a viable solution, can you either help @btovar implement this change in the analysis code or identify someone else who could help?

@Andrew42 since we eventually hand the histogram off to uproot to make the root files, could we use another container that supports variable binning? Maybe we could do this by having the datacard maker convert to np.histogram first, and then do the rebinning??

btovar commented 1 year ago

See #384.