Open abraithwaite opened 9 years ago
it looks like it was written that way from the first version submitted by @jkff. To clarify for the lazy, the difference is between ceil(e/epsilon)
and the current ceil(2/epsilon)
, where e
is the usual 2.7...
. I can imagine them being similar enough in practice, but I couldn't say why it was initially chosen.
Changing the 2
to e
everywhere results in the tests still passing. I imagine that the current implementation would just less accurate than it would be if using Math.E
since it would cause the sketches to be smaller. However it's unlikely that you'd be able to change this without breaking whomever is using it unfortunately.
An intern reported this issue to me last month. Being confused, I quickly re-read the papers and noticed that e
has been used in the initial paper but 2
is used in the latest paper from the same author, see Approximate Date with the Count-Min Data Structure page 4.
According to git log, @jkff wrote the implementation after this second publication. It could explain why. (I don't have time to do the maths but it would be interesting to investigate this change)
The paper referenced in the javadoc uses e
(as per the wayback machine anyway), but in any event, since there is no serialization issue, we could probably change it if there is cause. The latest paper using 2
gives me pause though.
since there is no serialization issue
I think using the same arguments to the constructors would result in different width and depth parameters which wouldn't be ideal, but the epsilon would adjust itself on deserialization at least. In any case I think the safer bet is just leaving it be.
It is curious that there are two versions of this paper without an errata in the later one discussing the changes but at least this mystery is solved.
Thanks for the quick responses!
The width of the sketch according to the paper should be set to
ceil(e/epsilon)
where e is Euler's number. However, I noticed in the current code that this is just set to2.0
.I'm curious as to why this is. Is it good enough in practice for it to not make a difference?