DiceDB / dice

DiceDB is a Redis-compliant, in-memory, real-time, and reactive database optimized for modern hardware and for building and scaling truly real-time applications.
https://dicedb.io/
Other
6.69k stars, 1.06k forks

Analyse and implement Dense representation for `PFADD` #446

Open: lucifercr07 opened this issue 2 months ago

lucifercr07 commented 2 months ago
chettriyuvraj commented 2 months ago

I have no clue about this yet, so it might take a bit of back-and-forth in terms of questions, but I'd love to take this up!

lucifercr07 commented 2 months ago

Assigned. @chettriyuvraj, thanks for picking this up; please let me know if you have any questions about it.

evoxtorm commented 1 month ago

Hey @lucifercr07

I was looking into this issue so that I could also help @chettriyuvraj a bit, and while reading the code I stumbled upon a few things.

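/* From Redis's hllAdd(): dispatch on the encoding stored in the HLL header. */
switch(hdr->encoding) {
    case HLL_DENSE: return hllDenseAdd(hdr->registers,ele,elesize);
    case HLL_SPARSE: return hllSparseAdd(o,ele,elesize);
    default: return -1; /* Invalid representation. */
}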

In the Redis source, the code above chooses between the sparse and dense implementations in hllAdd() based on the encoding type. The other place the choice is made is inside hllSparseAdd(o,ele,elesize), where the switch to dense is driven by a threshold value. So are we implementing both of these paths, or only the threshold-based promotion driven by size?
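For reference, here is a minimal sketch of what a dense add path involves, loosely following the shape of Redis's hllDenseAdd(): 2^14 registers of 6 bits each are packed into a byte array, the low 14 bits of a 64-bit hash select a register, and the position of the first set bit in the remaining bits is the candidate register value, which is kept only if it is larger than the current one. Assumptions: FNV-1a stands in for the MurmurHash64A that Redis uses, the helper names (dense_get, dense_set, pat_len, dense_add) are hypothetical, and the extra padding byte is an illustrative choice. This is not DiceDB's or Redis's actual code.

```c
#include <stdint.h>
#include <stddef.h>

#define HLL_P 14                              /* register index bits */
#define HLL_Q (64 - HLL_P)                    /* hash bits left for the rank */
#define HLL_REGISTERS (1 << HLL_P)            /* 16384 registers */
#define HLL_BITS 6                            /* bits per register */
#define HLL_REGISTER_MAX ((1 << HLL_BITS) - 1)
/* 12288 bytes of registers + 1 padding byte (illustrative) so that reading
   byte i+1 for the last register stays in bounds. */
#define HLL_DENSE_BYTES ((HLL_REGISTERS * HLL_BITS + 7) / 8 + 1)

/* FNV-1a 64-bit: a stand-in for Redis's MurmurHash64A, chosen for brevity. */
static uint64_t hash64(const unsigned char *ele, size_t len) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= ele[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Read register idx: a 6-bit value never spans more than two bytes,
   so load a 16-bit little-endian window and shift out the register. */
static uint8_t dense_get(const uint8_t *regs, size_t idx) {
    size_t bit = idx * HLL_BITS;
    size_t byte = bit / 8;
    unsigned fb = bit & 7;
    unsigned v = regs[byte] | ((unsigned)regs[byte + 1] << 8);
    return (uint8_t)((v >> fb) & HLL_REGISTER_MAX);
}

/* Write register idx, leaving the neighbouring registers' bits untouched. */
static void dense_set(uint8_t *regs, size_t idx, uint8_t val) {
    size_t bit = idx * HLL_BITS;
    size_t byte = bit / 8;
    unsigned fb = bit & 7;
    unsigned v = regs[byte] | ((unsigned)regs[byte + 1] << 8);
    v &= ~((unsigned)HLL_REGISTER_MAX << fb);
    v |= ((unsigned)val & HLL_REGISTER_MAX) << fb;
    regs[byte] = (uint8_t)(v & 0xFF);
    regs[byte + 1] = (uint8_t)(v >> 8);
}

/* Low P bits of the hash pick the register; the rank is the 1-based position
   of the first set bit in the remaining Q bits (at most Q+1). */
static uint8_t pat_len(const unsigned char *ele, size_t len, size_t *idx) {
    uint64_t h = hash64(ele, len);
    *idx = (size_t)(h & (HLL_REGISTERS - 1));
    h >>= HLL_P;
    h |= (uint64_t)1 << HLL_Q;  /* sentinel bit guarantees the loop ends */
    uint8_t count = 1;
    while ((h & 1) == 0) {
        count++;
        h >>= 1;
    }
    return count;
}

/* Returns 1 if a register grew (the estimate may have changed), else 0. */
static int dense_add(uint8_t *regs, const unsigned char *ele, size_t len) {
    size_t idx;
    uint8_t count = pat_len(ele, len, &idx);
    if (count > dense_get(regs, idx)) {
        dense_set(regs, idx, count);
        return 1;
    }
    return 0;
}
```

Returning 1 only when a register grows mirrors how PFADD reports whether any internal register was altered. Reading a two-byte window keeps get/set free of branches, which is the same idea Redis's dense register macros use.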

Another thing is how the string size is calculated: Redis uses sdslen(). Do we also need something like that to get the size of the strings in args?
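On the sdslen() question: Redis's sds strings keep their byte length in a header, so sdslen() is O(1) and binary-safe, whereas strlen() scans for the first NUL byte. The sketch below only illustrates that idea; the lpstr type and lpstr_len() are hypothetical and do not reproduce the real sds layout.

```c
#include <stddef.h>

/* Hypothetical length-prefixed string illustrating the idea behind sds:
   the byte length is stored next to the data, so reading it is O(1) and
   the payload may contain NUL bytes. Not the real sds header layout. */
typedef struct {
    size_t len;                 /* number of payload bytes */
    const unsigned char *buf;   /* payload, not necessarily NUL-terminated */
} lpstr;

/* Analogue of sdslen(): return the stored length instead of scanning. */
static size_t lpstr_len(const lpstr *s) {
    return s->len;
}
```

An element would then be handed to the add path as a (buf, len) pair. Since DiceDB is written in Go, len() on a string or []byte already returns the stored byte count in O(1), so a separate helper is probably unnecessary there.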

Thanks

chettriyuvraj commented 1 month ago

Sorry for the lack of updates on this one; I was down with H1N1 for the past week. I'll pick it up and post an update, @lucifercr07.

arpitbbhayani commented 1 month ago

Hello @chettriyuvraj,

There has been no activity on this issue for the past 5 days. It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

chettriyuvraj commented 1 month ago

Hi @arpitbbhayani!

Today was the first day I actively picked up the issue. I'll be posting an update daily from now on.

Status

As I mentioned, I had no idea what HyperLogLog was, so my first step today was to read the paper.

I'll hopefully have more concrete updates and questions tomorrow.

Apologies for the unreasonable delay and for slowing things down on this issue; I know how important it is to keep momentum in this project. Please bear with me, I'll get this over the line!

chettriyuvraj commented 1 month ago

Hey @lucifercr07!

I'm a little unsure whether my understanding is correct here.

My understanding

In the Redis source, this seems to handle the promotion from the sparse to the dense representation.

Promotion occurs in two cases:
Case 1. The value exceeds what a sparse register can represent, as set by #define HLL_SPARSE_VAL_MAX_VALUE 32.
Case 2. The storage consumed by the sparse representation exceeds a configurable byte threshold, server.hll_sparse_maxbytes.
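The two promotion cases above can be sketched as a single predicate. The function name and parameters are hypothetical; in Redis these checks appear to sit inline in the sparse update code rather than in a separate helper. The value 3000 used in the checks is Redis's default for the hll-sparse-max-bytes setting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value quoted from the Redis source: the largest register value a sparse
   VAL opcode can hold. */
#define HLL_SPARSE_VAL_MAX_VALUE 32

/* Hypothetical helper mirroring the two promotion cases.
   new_value:       register value about to be written
   sparse_bytes:    size the sparse encoding would have after the write
   sparse_maxbytes: configurable limit (hll-sparse-max-bytes in Redis) */
static bool hll_needs_promotion(int new_value, size_t sparse_bytes,
                                size_t sparse_maxbytes) {
    if (new_value > HLL_SPARSE_VAL_MAX_VALUE)
        return true;  /* Case 1: value not representable in sparse form */
    if (sparse_bytes > sparse_maxbytes)
        return true;  /* Case 2: sparse form grew past the byte threshold */
    return false;
}
```

Keeping both checks in one place makes it easy to test the promotion trigger independently of the sparse-to-dense conversion itself.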

Queries

arpitbbhayani commented 1 month ago

Hello @chettriyuvraj,

There has been no activity on this issue for the past 5 days. It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

arpitbbhayani commented 4 weeks ago

Hello @chettriyuvraj,

There has been no activity on this issue for the past 5 days. It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

arpitbbhayani commented 1 week ago

Hello @chettriyuvraj,

There has been no activity on this issue for the past 5 days. It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.

We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.

Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.

Thanks again.

chettriyuvraj commented 4 days ago

Hi @arpitbbhayani, I have unassigned myself from this issue. Feel free to assign it to someone else.