addthis / stream-lib

Stream summarizer and cardinality estimator.
Apache License 2.0
2.26k stars 556 forks source link

HLL becomes incorrect if cardinality is over billions #107

Closed liyang-gmt8 closed 7 years ago

liyang-gmt8 commented 8 years ago
@Override
public long cardinality() {
    double registerSum = 0;
    int count = registerSet.count;
    double zeros = 0.0;
    for (int j = 0; j < registerSet.count; j++) {
        int val = registerSet.get(j);
        registerSum += 1.0 / (1 << val);        // SHOULD BE -- 1L << val
        if (val == 0) {
            zeros++;
        }
    }
szhem commented 7 years ago

@liyang-gmt8, I'm also interested in this issue, but unfortunately cannot reproduce it with the following example

public static void main(String[] args) {
    HyperLogLogPlus hllp = new HyperLogLogPlus(20, 25)

    long size = 10_000_000_000L
    for (long i = 0; i < size; i++) {
        hllp.offer(i)
    }

    System.out.printf("True Cardinality: %s%n", size)
    System.out.printf("Estimated Cardinality: %s%n", hllp.cardinality())
    System.out.printf("Estimated Error: %f%n", Math.abs((hllp.cardinality() - size) / (float) size))    
}

... and got the following results

True Cardinality: 10000000000
Estimated Cardinality: 10010661180
Estimated Error: 0.001066

Could you please help to find a sample input to reproduce this issue?

abramsm commented 7 years ago

closing. please re-open if you can provide a failing test case.