Closed by Akninirith 10 years ago
Thanks for finding this!
It looks like `HLLUtil#largeEstimator` is returning `NaN`, which then turns into 0 when it gets cast to a `long`. The argument going into `largeEstimator` seems sane at first glance, but I'll have to take a deeper look later this week.
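To see why a `NaN` estimate silently reads as a cardinality of 0, here is a minimal sketch (the variable names are illustrative, not from the library): Java's narrowing conversion from `double` to `long` maps `NaN` to 0.

```java
public class NanCastDemo {
    public static void main(String[] args) {
        // A stand-in for an estimate that went NaN inside the estimator.
        double estimate = Double.NaN;

        // Per the JLS, narrowing a NaN double to long yields 0,
        // which is why the reported cardinality is 0 rather than an error.
        long cardinality = (long) estimate;
        System.out.println(cardinality); // prints 0
    }
}
```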
It doesn't look like this was a regression introduced in any of the recent changes to the estimator code. I checked out de5cc2dcef88603386ca9b77e1f041ad213cc874 and it also has the same issue.
We hit a `long` overflow because computing 2^L for L > 63 isn't possible with Java's 64-bit signed `long` (which has only 63 bits of magnitude). To see this, look here and note that for regWidth = 6, log2m = 13 we have:

(((1 << regWidth) - 1 - 1) + log2m) = 64 - 2 + 13 = 75

and 2^75 is greater than Long.MAX_VALUE.
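A quick sketch of the overflow (the exponent computation mirrors the expression above; everything else is illustrative). Note that Java masks a long shift distance to its low 6 bits, so `1L << 75` silently becomes `1L << 11` rather than failing:

```java
public class TwoToLOverflow {
    public static void main(String[] args) {
        int regWidth = 6;
        int log2m = 13;

        // Exponent from the expression above: 62 + 13 = 75.
        int exponent = ((1 << regWidth) - 1 - 1) + log2m;

        // Shift distances on longs are taken mod 64, so this is
        // really 1L << 11 = 2048 -- a silent wrap, not 2^75.
        long wrong = 1L << exponent;
        System.out.println(wrong); // prints 2048

        // As a double, 2^75 is representable exactly and exceeds Long.MAX_VALUE.
        double right = Math.pow(2.0, exponent);
        System.out.println(right > Long.MAX_VALUE); // prints true
    }
}
```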
I believe we can safely move `TWO_TO_L` to be a `double[]`, since all calculations that use it end up converting it to a double anyway.
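A minimal sketch of that fix: precompute the table as a `double[]` instead of a `long[]`. The name `TWO_TO_L` comes from the comment above, but the table size here is an assumption for illustration; doubles represent powers of two exactly up to very large exponents, so no precision is lost for these entries.

```java
public class TwoToLTable {
    // Assumed bound for illustration; the real table may be sized differently.
    private static final int MAX_EXPONENT = 128;

    static final double[] TWO_TO_L = new double[MAX_EXPONENT];
    static {
        for (int i = 0; i < MAX_EXPONENT; i++) {
            // Powers of two are exactly representable as doubles here.
            TWO_TO_L[i] = Math.pow(2.0, i);
        }
    }

    public static void main(String[] args) {
        // Entries beyond 2^63 now work where a long would have overflowed.
        System.out.println(TWO_TO_L[75] > Long.MAX_VALUE); // prints true
    }
}
```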
I'll push a new jar to Sonatype in a bit.
Sorry for the delay, but it should be in the central repo within a few hours.
There seems to be some kind of bug in HLL execution: when one instantiates an HLL with a regwidth of 6, the cardinality returned is consistently 0 at large sample sizes. The following test was constructed and run in FullHLLTest. Could this have something to do with cutoff measurements? Thanks!