Open lucifercr07 opened 2 months ago
I have no clue about this yet, so it might require a bit of back-and-forth in terms of questions - but I'd love to take this up!
Assigned, @chettriyuvraj. Thanks for picking this up, please let me know if you have any queries around this.
Hey @lucifercr07
I was looking into this issue so that I can also help @chettriyuvraj a bit, and while reading the code I stumbled upon a few things.
    switch(hdr->encoding) {
    case HLL_DENSE: return hllDenseAdd(hdr->registers,ele,elesize);
    case HLL_SPARSE: return hllSparseAdd(o,ele,elesize);
    }
In the Redis code above, the representation is decided in two places. The first is the hllAdd function, which picks the sparse or dense implementation based on the encoding type. The second is inside hllSparseAdd(o,ele,elesize), which decides based on a threshold value. Are we implementing both of these, or are we going only with the threshold based on size?
One other thing is the way it calculates the string size: Redis uses sdslen() for this. Do we also have to do something like that to get the size of the strings in args?
Thanks
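To make the first question concrete, here is a minimal sketch of the dispatch-on-encoding pattern from the snippet above. All names here (toy_hll, toy_hll_add, and so on) are hypothetical stand-ins, not Redis or project code, and the two add functions only count calls rather than updating real registers. It is meant to show the shape of the switch, not an actual HyperLogLog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags, loosely mirroring HLL_DENSE / HLL_SPARSE. */
enum toy_encoding { TOY_HLL_DENSE = 0, TOY_HLL_SPARSE = 1 };

/* Toy object: only tracks which path each add was routed to. */
struct toy_hll {
    enum toy_encoding encoding;
    int dense_adds;
    int sparse_adds;
};

/* Stand-in for a dense add; a real one would update a register. */
static int toy_dense_add(struct toy_hll *h, const unsigned char *ele, size_t elesize) {
    (void)ele; (void)elesize;
    h->dense_adds++;
    return 1;
}

/* Stand-in for a sparse add; a real one could also trigger promotion. */
static int toy_sparse_add(struct toy_hll *h, const unsigned char *ele, size_t elesize) {
    (void)ele; (void)elesize;
    h->sparse_adds++;
    return 1;
}

/* Route the add to the implementation matching the current encoding.
 * Returns -1 if the encoding tag is not recognised. */
int toy_hll_add(struct toy_hll *h, const unsigned char *ele, size_t elesize) {
    assert(h != NULL);
    switch (h->encoding) {
    case TOY_HLL_DENSE:  return toy_dense_add(h, ele, elesize);
    case TOY_HLL_SPARSE: return toy_sparse_add(h, ele, elesize);
    default:             return -1;
    }
}
```

Note that the element size is passed in explicitly as elesize here, so the caller decides how the length is obtained.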
Sorry for no updates on this one - was down with H1N1 the past week. Will pick up and put out an update @lucifercr07 .
Hello @chettriyuvraj,
There has been no activity on this issue for the past 5 days. It would be awesome if you keep posting updates to this issue so that we know you are actively working on it.
We are really eager to close this issue at the earliest, hence if we continue to see the inactivity, we will have to reassign the issue to someone else. We are doing this to ensure that the project maintains its momentum and others are not blocked on this work.
Just drop a comment with the current status of the work or share any issues you are facing. We can always chip in to help you out.
Thanks again.
Hi @arpitbbhayani!
Today was the first day I actively picked up the issue. I'll be posting an update daily from now on.
I mentioned that I had no clue about what HyperLogLog was and my first step today was to pick up the paper.
I'll hopefully have a bit more concrete updates + queries to ask tomorrow.
Apologies about the unreasonable delays and for slowing things down on this issue - I know how important keeping the momentum in this project is. Please bear with me - I'll get this over the line!
Hey @lucifercr07!
I'm a little unsure if my understanding is correct here.
In the Redis source, this seems to be handling the promotion from sparse to dense representation.
Promotion occurs in 2 cases:
Case 1. The value exceeds what a sparse register can represent, i.e. #define HLL_SPARSE_VAL_MAX_VALUE 32.
Case 2. The storage consumed by the sparse representation exceeds a configurable threshold in bytes, i.e. server.hll_sparse_max_bytes.
Promotion in case 1 -> shouldn't this case already be handled by the library we are using? I had a quick peek at the library's source and it promotes sparse to dense inside its Insert(..) function, but I need to take a closer look.
So is the main objective of the issue to determine the threshold for case 2?
"Analyse feasibility of switch between Sparse and Dense representations based on benchmark testing" - can you elaborate on how benchmarking will help us decide this?
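As a rough illustration of the two promotion cases described above, the check could be sketched like this. The function name should_promote and its parameters are hypothetical, and the 3000-byte figure used in the usage note is only an assumed example threshold, not a value this thread settles on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Largest value a sparse register can hold, as quoted in the discussion. */
#define SPARSE_VAL_MAX_VALUE 32

/* Decide whether a sparse HLL should be promoted to dense.
 *   value        - register value about to be stored
 *   sparse_bytes - current size of the sparse representation in bytes
 *   max_bytes    - configurable size threshold (hypothetical parameter) */
bool should_promote(unsigned value, size_t sparse_bytes, size_t max_bytes) {
    assert(max_bytes > 0); /* sketch-only sanity check */

    /* Case 1: the value does not fit in a sparse register. */
    if (value > SPARSE_VAL_MAX_VALUE) return true;

    /* Case 2: the sparse representation has outgrown the threshold. */
    if (sparse_bytes > max_bytes) return true;

    return false;
}
```

For example, with an assumed threshold of 3000 bytes, should_promote(33, 100, 3000) and should_promote(1, 3001, 3000) both return true, while should_promote(32, 3000, 3000) returns false. Only the max_bytes argument is tunable, which is presumably the number benchmarks would be used to pick.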
Hi @arpitbbhayani - I have unassigned myself from this issue. Feel free to assign it to someone else.
PFADD
Dense representation for HLL once a threshold is reached, i.e. #define HLL_SPARSE_VAL_MAX_VALUE 32 for Redis; we can evaluate based on benchmark numbers when to promote.