Argument checks in HyperLogLogPlus constructor need to be more restrictive

Hello, I was able to reproduce this issue as well, using the following Scala snippet:

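import java.util.UUID

import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus

object Main {
  def main(args: Array[String]): Unit = {
    // Five runs with sp = 25, then five runs with sp = 32, all with p = 20
    (1 to 5).foreach(_ => run(10 * 1000 * 1000L, 20, 25))
    println()
    (1 to 5).foreach(_ => run(10 * 1000 * 1000L, 20, 32))
  }

  def run(size: Long, p: Int, sp: Int): Unit = {
    val streamlib = new HyperLogLogPlus(p, sp)

    // Offer `size` random UUID strings, which are distinct for practical purposes
    var i: Long = 0L
    while (i < size) {
      val uuid = UUID.randomUUID().toString
      streamlib.offer(uuid)
      i += 1
    }

    // Read the estimate once and reuse it for both the output and the error
    val estimated = streamlib.cardinality()
    val errorPct = 100 * Math.abs((estimated - size) / size.toDouble)

    printf("p: %s, sp: %s -- exact: %s, estimated: %s, error: %f%%%n",
      p, sp, size, estimated, errorPct)
  }
}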

and got the following results:

p: 20, sp: 25 -- exact: 10000000, estimated: 9988177, error: 0.118230%
p: 20, sp: 25 -- exact: 10000000, estimated: 10001031, error: 0.010310%
p: 20, sp: 25 -- exact: 10000000, estimated: 9998517, error: 0.014830%
p: 20, sp: 25 -- exact: 10000000, estimated: 9993139, error: 0.068610%
p: 20, sp: 25 -- exact: 10000000, estimated: 10009278, error: 0.092780%

p: 20, sp: 32 -- exact: 10000000, estimated: 9924770, error: 0.752300%
p: 20, sp: 32 -- exact: 10000000, estimated: 9920361, error: 0.796390%
p: 20, sp: 32 -- exact: 10000000, estimated: 9932979, error: 0.670210%
p: 20, sp: 32 -- exact: 10000000, estimated: 9917889, error: 0.821110%
p: 20, sp: 32 -- exact: 10000000, estimated: 9916722, error: 0.832780%

For p=20, the expected error bound at three standard errors is 3 * 1.04 / sqrt(2^20) = 0.003046875, or about 0.30%. Every run with p=20, sp=25 stays within that bound, but every run with p=20, sp=32 lands well outside it (0.67% to 0.83%).
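
For anyone who wants to recheck the arithmetic, here is a small sketch that computes the bound and compares it with the error percentages listed above. The object and method names (ErrorBound, standardError, threeSigmaBound) are made up for this illustration and are not part of stream-lib; the observed values are copied from the output above.

```scala
object ErrorBound {
  // Relative standard error of HyperLogLog with 2^p registers: 1.04 / sqrt(2^p)
  def standardError(p: Int): Double = 1.04 / math.sqrt(math.pow(2.0, p))

  // Bound at three standard errors, as used in the comment above
  def threeSigmaBound(p: Int): Double = 3 * standardError(p)

  // Observed error percentages, copied from the runs above
  val observedSp25: Seq[Double] = Seq(0.118230, 0.010310, 0.014830, 0.068610, 0.092780)
  val observedSp32: Seq[Double] = Seq(0.752300, 0.796390, 0.670210, 0.821110, 0.832780)

  def main(args: Array[String]): Unit = {
    val boundPct = 100 * threeSigmaBound(20)
    println(f"three-sigma bound for p=20: $boundPct%.7f%%")
    println(s"sp=25 runs within bound: ${observedSp25.count(_ <= boundPct)} of ${observedSp25.size}")
    println(s"sp=32 runs within bound: ${observedSp32.count(_ <= boundPct)} of ${observedSp32.size}")
  }
}
```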

Given this behaviour, should the constructor be patched to accept only values satisfying 4 <= p <= sp <= 25, as @oertl suggested?

If so, I can submit a patch to fix it.
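
As a starting point for discussion, the check I have in mind would look roughly like the sketch below. This is only an illustration of the proposed constraint 4 <= p <= sp <= 25, not the library's current code: HllArgs and validate are hypothetical names, and the sketch assumes there is no special sp value (for example one that turns sparse mode off) that would need to stay allowed.

```scala
object HllArgs {
  // Hypothetical helper: returns the arguments unchanged if they satisfy
  // 4 <= p <= sp <= 25, otherwise a message describing the violated rule.
  def validate(p: Int, sp: Int): Either[String, (Int, Int)] =
    if (p < 4) Left(s"p must be at least 4, got $p")
    else if (sp < p) Left(s"sp must not be smaller than p, got p=$p, sp=$sp")
    else if (sp > 25) Left(s"sp must be at most 25, got $sp")
    else Right((p, sp))

  def main(args: Array[String]): Unit = {
    // The two configurations from the benchmark above
    println(validate(20, 25))
    println(validate(20, 32))
  }
}
```

In the real constructor this would more likely throw an IllegalArgumentException; Either is used here only to keep the sketch easy to test.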
