addthis / stream-lib

Stream summarizer and cardinality estimator.
Apache License 2.0
2.26k stars 556 forks source link

What does merge() in AdaptiveCounting do? #73

Closed medined closed 10 years ago

medined commented 10 years ago

I expected that merging two AdaptiveCounting objects would add the cardinalities but that did not happen. The output of the following was:

A: 1000 B: 1000 C: 1000

I expected that C would have a cardinality of 2000. I realize that I can simply add the two cardinalities (A and B) but I wanted to understand what merge does.

Here is the code that I am using:

    ICardinality estimatorA = new AdaptiveCounting(16);
    for (long i = 0; i < 1000; i++) {
        estimatorA.offer(i);
    }
    System.out.println("A: " + estimatorA.cardinality());

    ICardinality estimatorB = new AdaptiveCounting(16);
    for (long i = 0; i < 1000; i++) {
        estimatorB.offer(i);
    }
    System.out.println("B: " + estimatorB.cardinality());

    ICardinality estimatorC = estimatorA.merge(estimatorB);
    System.out.println("C: " + estimatorC.cardinality());

Thanks.

abramsm commented 10 years ago

David -

The results are what I would expect. You have two estimators, each with the same set of unique elements. So when you merge them it properly detects that no new additional unique elements have been added. If you change your code to this:

ICardinality estimatorA = new AdaptiveCounting(16); for (long i = 0; i < 1000; i++) { estimatorA.offer(i); } System.out.println("A: " + estimatorA.cardinality());

    ICardinality estimatorB = new AdaptiveCounting(16);
    for (long i = 1000; i < 2000; i++) {
        estimatorB.offer(i);
    }
    System.out.println("B: " + estimatorB.cardinality());

    ICardinality estimatorC = estimatorA.merge(estimatorB);
    System.out.println("C: " + estimatorC.cardinality());

Then the results will be more inline with what you originally expected.

Matt

On Tue, May 20, 2014 at 10:47 AM, David Medinets notifications@github.comwrote:

I expected that merging two AdaptiveCounting objects would add the cardinalities but that did not happen. The output of the following was:

A: 1000 B: 1000 C: 1000

I expected that C would have a cardinality of 2000. I realize that I can simply add the two cardinalities (A and B) but I wanted to understand what merge does.

Here is the code that I am using:

ICardinality estimatorA = new AdaptiveCounting(16);
for (long i = 0; i < 1000; i++) {
    estimatorA.offer(i);
}
System.out.println("A: " + estimatorA.cardinality());

ICardinality estimatorB = new AdaptiveCounting(16);
for (long i = 0; i < 1000; i++) {
    estimatorB.offer(i);
}
System.out.println("B: " + estimatorB.cardinality());

ICardinality estimatorC = estimatorA.merge(estimatorB);
System.out.println("C: " + estimatorC.cardinality());

Thanks.

— Reply to this email directly or view it on GitHubhttps://github.com/addthis/stream-lib/issues/73 .

medined commented 10 years ago

Doh! This class has been incredibly easy to integrate into my programs. Thanks!

abramsm commented 10 years ago

Glad you find it useful. AdaptiveCounting is one of our older classes. I recommend you switch to HyperLogLogPlus. It will provide accuracy for low cardinality cases and space efficiency for high cardinality use cases.

matt

On Tue, May 20, 2014 at 11:00 AM, David Medinets notifications@github.comwrote:

Doh! This class has been incredibly easy to integrate into my programs. Thanks!

— Reply to this email directly or view it on GitHubhttps://github.com/addthis/stream-lib/issues/73#issuecomment-43637592 .

medined commented 10 years ago
    ICardinality estimatorA = new HyperLogLogPlus(10, 10);
    for (long i = 0; i < 100; i++) {
        estimatorA.offer(i);
    }
    byte[] backup = estimatorA.getBytes();
    System.out.println("backup size: " + backup.length);

How do I get a new HyperLogLogPlus object from the backup byte array? I need to turn the backup into a List of byte arrays?

abramsm commented 10 years ago

Please look at the test cases for serialization.

https://github.com/addthis/stream-lib/blob/master/src/test/java/com/clearspring/analytics/stream/cardinality/TestHyperLogLogPlus.java