douglasdavis closed this 4 years ago
Hi Doug, this is one of the many features in the upcoming boost-histogram Python package. It is a fast library as well: faster than plain NumPy, but slower than fast-histogram. I am optimistic that we can achieve the same speed eventually.
Hi Hans, yes I've been keeping an eye on the boost{::,-}histogram developments :)
Hi, @douglasdavis
This is a useful feature. However, is it really more beneficial here to use C instead of Python? NumPy's multiply does essentially the same thing, but you also add extra operations such as the weight shift when required.
Hi @vkhodygo, I'm not exactly sure what you mean. It's much faster to accumulate the sum of weights squared (the variance) on the C/C++ side, because the variance in each bin of a weighted histogram depends on the same logic that determines the bin. When I was thinking about this problem (quite a while ago) I wrote a blog post about how to do it in pure Python; that is a pretty slow solution compared to a C implementation. Maybe there's a better pure NumPy way that I did not know about at the time (and still do not)!
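For reference, here is a minimal sketch of one pure-NumPy way to get the per-bin sum of weights squared: call `np.histogram` twice, once with the weights and once with the squared weights. The helper name `weighted_hist_1d` and the sample data are hypothetical, chosen only for illustration; this is not the code from this PR or from any of the libraries mentioned in this thread, and it bins the data twice, so it is not expected to match a single-pass C implementation in speed.

```python
import numpy as np


def weighted_hist_1d(x, weights, bins=10, range=None):
    """Return (sum of weights, sum of weights squared, bin edges) per bin.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    # First pass: ordinary weighted histogram (sum of w in each bin).
    sumw, edges = np.histogram(x, bins=bins, range=range, weights=w)
    # Second pass: reuse the same edges, but accumulate w**2 instead.
    sumw2, _ = np.histogram(x, bins=edges, weights=w * w)
    return sumw, sumw2, edges


# Made-up sample data: two entries fall in each of two bins on [0, 1].
x = np.array([0.1, 0.2, 0.6, 0.9])
w = np.array([1.0, 2.0, 0.5, 3.0])
sumw, sumw2, edges = weighted_hist_1d(x, w, bins=2, range=(0.0, 1.0))
# The per-bin statistical uncertainty is the square root of sumw2.
err = np.sqrt(sumw2)
```

With this input the first bin collects weights 1.0 and 2.0, and the second collects 0.5 and 3.0, so `sumw2` holds 1 + 4 and 0.25 + 9 respectively.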
But anyway, this PR is very old and other solutions exist. I ended up writing a Python library called pygram11 to do these calculations. I still maintain it for fun, while the Scikit-HEP community (as mentioned above by Hans) has been developing a much more full-featured histogramming library, boost-histogram, which also includes a storage type that tracks the variance in each bin. Given those solutions, I'll go ahead and finally close this PR.
Hi,
I'd be interested in returning the sum of weights squared for histograms. I took a stab at implementing the 1D case. If this is something you're not interested in, no hard feelings; if you think the project has a place for this, I'd be happy to implement unit tests and a 2D version.
Thanks, Doug