Open Akninirith opened 9 years ago
@Akninirith Thanks for pointing this out! This should throw an IllegalArgumentException
for now, since I haven't reasoned through all the rules for union
-ing mixed parameter HLLs.
If you have an immediate use case involving the union of different log2m
HLLs, I'd be happy to commit a "folding" static method that would produce a copy of the provided HLL at smaller log2m
.
There isn't a use case for this, at the moment at least, but if that changes this thread will be notified ASAP. Regardless, thank you for the timely response!
Is this discussed to be case unioning any kind of HLL with EMPTY/EXPLICIT ? Am I correct that for different log2m & regwidth there will be different subsets of bits and different values for counters (both counter # and value to be set), so, basically this would be no-go to correctly merge one counters to another? Or there are cases when this possible?
@ghost one scenario is hundreds of thousands of sparse HLL_s (or even millions) with a small regwidth_s is stored in disk. Later, merging happens into a single full HLL_m that can represent bigger cardinality values (regwidth_m>=regwidth_s) just for the purpose of computing the cardinality of the union. For this scenario, HLL_m would not be persisted. That might save disk storage and even reduce read cost.
This may fail suprisingly, though. I ran into this problem while implementing the algorithms proposed in this paper.
Even though it's a known issue that won't get solved as it is deemed unimportant / too cornery, I think it should at least get accurately documented.
A union operation done on two HLL objects immediately after they are initialized, with a smaller log2m in the parameter HLL than the one doing the union, throws an ArrayIndexOutOfBoundsException. The below code should be capable of reproducing this error.
HLL shortHLL = new HLL(14, 6, -1, false, HLLType.FULL); HLL longHLL = new HLL(18, 6, -1, false, HLLType.FULL); longHLL.union(shortHLL);