basho / riak_kv

Riak Key/Value Store
Apache License 2.0

Interest in replacing Hyper #1888

Open DianaOlympos opened 8 months ago

DianaOlympos commented 8 months ago

Hello,

A few years ago, wanting to move Hyper to a more modern OTP (in particular, its test suite that depended on the old random module), I forked the original GameAnalytics repo.

I moved it to rebar3 (dropping a lot of the C backend in the process), then to rand, and discovered along the way that the implementation, particularly the precision reduction used to merge sketches of different precisions, was... wrong.

It could not pass its own tests.

I am pretty sure that the fork Riak is using has the same bug.
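For readers unfamiliar with the operation, here is a minimal Python sketch of what reducing the precision of an HLL sketch before a merge can look like. It is not the code from hyper or from the Riak fork. It assumes a dense register array, a 64-bit hash whose top `p` bits select the register, and a hypothetical `hash64` helper built on SHA-256; all names are illustrative.

```python
import hashlib

HASH_BITS = 64  # assumption: 64-bit hashes, top p bits select the register


def hash64(item: str) -> int:
    """Hypothetical helper: deterministic 64-bit hash taken from SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build(items, p):
    """Build a dense register array of size 2**p from the given items."""
    registers = [0] * (1 << p)
    rest_bits = HASH_BITS - p
    for item in items:
        h = hash64(item)
        index = h >> rest_bits
        rest = h & ((1 << rest_bits) - 1)
        # rho: 1-based position of the leftmost 1-bit in the remaining bits
        # (rest_bits + 1 when the remaining bits are all zero).
        rho = rest_bits - rest.bit_length() + 1
        registers[index] = max(registers[index], rho)
    return registers


def reduce_precision(registers, p, new_p):
    """Fold a precision-p register array down to precision new_p <= p."""
    if new_p > p:
        raise ValueError("precision can only be reduced, not increased")
    d = p - new_p
    reduced = [0] * (1 << new_p)
    for index, rho in enumerate(registers):
        if rho == 0:
            continue  # empty register: nothing was observed here
        # The low d bits of the old index become the leading bits of the
        # remainder at the lower precision.
        low = index & ((1 << d) - 1)
        if low:
            new_rho = d - low.bit_length() + 1
        else:
            new_rho = d + rho
        new_index = index >> d
        reduced[new_index] = max(reduced[new_index], new_rho)
    return reduced


def merge(a, p_a, b, p_b):
    """Merge two sketches by reducing both to the lower precision first."""
    p = min(p_a, p_b)
    ra = reduce_precision(a, p_a, p)
    rb = reduce_precision(b, p_b, p)
    return [max(x, y) for x, y in zip(ra, rb)], p
```

Under these assumptions, reducing a sketch built at precision 12 down to precision 8 should give exactly the registers obtained by building at precision 8 directly, which is the kind of property a test suite can check.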

I could probably fix the bug, but to find out how to fix it, I ended up implementing a more recent way to evaluate the sketches. Since then I have been working, intermittently, on bringing the library up to date with the past decade of research on HLL, which has made sketches faster, more precise, and smaller both in memory and on the wire. See https://hex.pm/packages/hyper

For example, I am going to work on implementing a backend for the recently published UltraLogLog: https://arxiv.org/abs/2308.16862

Would there be interest in Riak moving to this instead of your fork? I think I can find a way to make it backward compatible for current users, so that they have no problems and no migration to do (we could probably even migrate silently), if need be.

I am asking because having an actual user would probably help me get motivated to work on this, and afaict the Riak userbase is the largest one using HLL in Erlang these days.

If the answer is no, I get it too :)

martinsumner commented 8 months ago

The branch of hyper we use in Riak is this - https://github.com/basho/hyper/tree/develop. The only work we've done on it recently has been dirty fixes just to get it to compile.

One issue is that I don't know of any Riak customers that actually use it. Perhaps someone will shout out on here - I think it was written for a specific customer just before Basho went under. So there is perhaps a user out there, but none of the big Riak users I'm aware of have a need for it.

Presently we're talking of slimming down Riak in the future, and removing under-used features. So we're as likely to take hyper out as we are to be concerned about a major update. That would be a shame, as it is an interesting feature, but without someone needing it, it is just an ongoing overhead.

@Bob-The-Marauder - do you know of any active users of this?

DianaOlympos commented 8 months ago

Taking it out works for me too, as it is probably still bugged. I will check tomorrow if it is the buggy version, but I doubt anyone found it.

I am all for killing unused stuff :D

Bob-The-Marauder commented 7 months ago

Sorry for the delayed reply.

I am aware of at least two HyperLogLog users in the wild. One was a former support customer who left us when we dropped native Japanese support. The other is a customer we did a standalone healthcheck for and not a support contract.