Division support for weighted storages

henryiii commented 2 years ago

Weighted storages do not have a division operator. @lgray requested this for doing a background estimate.

lgray commented 2 years ago

I guess this can be a little difficult when it comes to errors and sumw. I would suggest implementing the simplest rule within boost-histogram; if people need Clopper-Pearson or something else for division of correlated quantities, they can manipulate the values as needed.

HDembinski commented 2 years ago

Agreed. Division generally assumes the data sets are independent. If you compute an efficiency, the data sets are correlated. We cannot know what has to be done in this case so we can only implement the rule for the simplest assumption, independent data sets.
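For reference, a sketch of what the rule for independent data sets usually looks like. It assumes standard first-order error propagation, which the thread does not spell out. The symbols $a$, $b$ (the two values) and $V_a$, $V_b$ (their variance estimates) are introduced here for illustration:

$$
r = \frac{a}{b}, \qquad
V_r \approx \frac{V_a}{b^2} + \frac{a^2\, V_b}{b^4}
    = r^2 \left( \frac{V_a}{a^2} + \frac{V_b}{b^2} \right)
$$

For correlated inputs, such as the numerator and denominator of an efficiency, the same expansion picks up an extra term $-2a\,\mathrm{Cov}(a,b)/b^3$. A storage that only holds a value and a variance cannot supply that term.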

The best way to handle computing efficiencies with Boost Histogram is to use a special efficiency accumulator that still needs to be written, where you pass a boolean to indicate whether the event passed or not.
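A minimal Python sketch of what such an accumulator might look like, for discussion only. The class name `EfficiencyAccumulator`, its `fill` method, and the choice of the plain binomial variance estimate are all assumptions; none of this exists in Boost.Histogram.

```python
class EfficiencyAccumulator:
    """Hypothetical accumulator; not an existing Boost.Histogram type."""

    def __init__(self) -> None:
        self.passed = 0  # number of events that passed
        self.total = 0   # number of events seen

    def fill(self, passed: bool) -> None:
        # Each event is filled once, with a flag saying whether it passed.
        self.total += 1
        if passed:
            self.passed += 1

    @property
    def value(self) -> float:
        # Efficiency estimate k / n; undefined before the first fill.
        return self.passed / self.total if self.total else float("nan")

    @property
    def variance(self) -> float:
        # Assumed here: plain binomial variance estimate p (1 - p) / n.
        if self.total == 0:
            return float("nan")
        p = self.value
        return p * (1.0 - p) / self.total


acc = EfficiencyAccumulator()
for passed in (True, True, True, False):
    acc.fill(passed)

print(acc.value, acc.variance)
```

Because passed and total counts come from the same fills, the accumulator knows the correlation between numerator and denominator, which a division of two separate histograms does not.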

lgray commented 2 years ago

Yes - the latter thing you mention would be a really nice interface!

HDembinski commented 2 years ago

@lgray If you are interested in drafting such an accumulator, I am happy to review it.

See https://github.com/boostorg/histogram/issues/178

HDembinski commented 2 years ago

Some additional comments:

- The weighted storage only knows about the value and its variance estimate. Confidence intervals like Clopper-Pearson are something different from a variance estimate, so this cannot be supported by division anyway, and Clopper-Pearson is not applicable to sums of weights.
- Division by a constant already works correctly.
- What we want is the special rule for multiplication / division of a weighted_sum by another weighted_sum, that computes the variance correctly.
- Division of a weighted_sum by an ordinary storage (e.g. double) treats the count like a constant and not like a Poisson count with a Poisson variance. This is perhaps not what the user expects, but we have no guarantee that the contents of an ordinary histogram are counts and not something else, and so there is nothing we can do about it. We gave up on that guarantee early on in the design.
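To make the difference between these cases concrete, here is a small Python sketch. The `WeightedSum` class below is a hypothetical stand-in, not the library's `weighted_sum` accumulator, and the rule used for dividing two weighted sums is first-order error propagation for independent inputs, which is an assumption about what the eventual implementation would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedSum:
    """Hypothetical stand-in for a weighted-sum cell: value plus variance estimate."""

    value: float
    variance: float

    def __truediv__(self, other):
        if isinstance(other, WeightedSum):
            # Two independent estimates: first-order error propagation.
            ratio = self.value / other.value
            variance = (
                self.variance / other.value**2
                + self.value**2 * other.variance / other.value**4
            )
            return WeightedSum(ratio, variance)
        # Plain number: treated as an exact constant, so just rescale.
        c = float(other)
        return WeightedSum(self.value / c, self.variance / c**2)


signal = WeightedSum(value=50.0, variance=50.0)

# Denominator taken from an ordinary storage: a bare 100 with no uncertainty.
by_constant = signal / 100.0

# Same count, but explicitly carrying a Poisson variance equal to the count.
by_poisson = signal / WeightedSum(value=100.0, variance=100.0)

print(by_constant)
print(by_poisson)
```

Both results have the same ratio, but the second has a larger variance because the uncertainty of the denominator is propagated. A user who wants the Poisson treatment therefore has to put the denominator into a weighted storage first.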

Division support for weighted storages #345