Closed atrettin closed 3 years ago
Thank you for the PR!! I will review it shortly.
Alright, this is looking much better now! I've added the weighted version, added more input checks and automatic conversions and also added histogramdd
to the unit tests. It gives verifiably the same results as histogram1d
and histogram2d
.
Alright, now we also have tests for up to 10D against numpy.histogramdd
. I think that should be all we need. :)
Hey @astrofrog, it would be great if you could have a look at this, I hope that everything makes sense and I'd be happy to implement changes if need be.
Sorry for the delay in reviewing! I am only working part time at the moment so I am a bit slower than usual but I should get to it this week.
Done! :)
Thanks! I'll merge this and will try and do a release soon.
Thank you for merging! Just now looking at this I found an inconsistency between this package and numpy: When a data point is exactly at the upper bound of the largest bin, numpy includes it into the largest bin count, while we don't do that here. Furthermore, I found that in my own code this causes it to write one index farther into the count array, potentially risking a segfault... 😬 Sorry I just literally now realize that. I'll write a unit test demonstrating it and bring a fix, please hold on with the release until then!
Hi all,
I happen to have a (scientific) application where I need fast histograms, and I found this package very promising to be exactly what I need! If it would only support histograms in arbitrary dimensions, it would be perfect... and I think that I did in fact manage to pull it off. :)
This PR adds the function
histogramdd
to the package.It is a straight generalization of the 1D and 2D code, only that the input sample is given as a tuple. The code discovers automatically how many arrays there are in the tuple and generates the output dimensions accordingly. If the dimension is
D
and the number of samplesN
, then the tuple must containD
arrays withN
elements. Bins and ranges are passed as arrays as well with shapes(D,)
and(D, 2)
, respectively to pass the number of bins as well as the ranges for each dimension.The output of
histogram2d(x, y, range, bins)
should be the same ashistogramdd((x, y), range, bins)
. The speed of the generalized function is a bit slower, so it is still beneficial to use the dedicated 1D and 2D functions where appropriate.I'm also working on the weighted version but I thought I'd make a PR to get some feedback. This is still missing tests and documentation, and maybe someone with more C and numpy API experience could have a look if there is any possibility to make things a little bit faster still to match the performance of the 1D and 2D code.