Closed RafalBielickiIM closed 7 months ago
Hi @RafalBielickiIM Indeed, Dragonfly never uses the sparse representation, it always uses the dense one. It's not a bug, but more of a missing feature we currently did not prioritize.
Looking into code there is a server flag in redis hll_sparse_max_bytes that can modify when to use sparse mode. I cannot seem to find usage of it in dragonfly.
Hi @RafalBielickiIM Indeed, Dragonfly never uses the sparse representation, it always uses the dense one. It's not a bug, but more of a missing feature we currently did not prioritize.
Is there any chance we get it in the near future?
Looking at the code seems like there is some implementation for it. https://github.com/dragonflydb/dragonfly/blob/7c443f3a15d0f267753e0d80dc1e45c226bc8bea/src/redis/hyperloglog.c#L371
There is even a test for that https://github.com/dragonflydb/dragonfly/blob/7c443f3a15d0f267753e0d80dc1e45c226bc8bea/src/server/rdb_test.cc#L441
@chakaz any ideas/notes to the above? For now we handle low cardinality with hash + SET.
It may take some time I'm afraid. I added the 'help wanted' label, maybe an external contributor can assist here as well.
@azuredream hey, in case you are still looking for an interesting task to work on :)
@azuredream hey, in case you are still looking for an interesting task to work on :)
Glad to help. Looking into this.
Research completed. Coding...
Encountered a problem. SDS in Redis allows pre-allocating space for string. If that's not the case in Dragonfly DB, we probably end up with much more (almost for every insertion) reallocations.
/* sdsalloc() = sdsavail() + sdslen() */
When creating a new sparse hll obj, Redis allocates sparselen
bytes, and set value for the first HLL_HDR_SIZE+2 bytes.
int sparselen = HLL_HDR_SIZE + (((HLL_REGISTERS+(HLL_SPARSE_XZERO_MAX_LEN-1)) /
HLL_SPARSE_XZERO_MAX_LEN)*2);
When inserting more keys to a sparse hll, Redis enlarge it if required:
if (sdsalloc(o->ptr) < server.hll_sparse_max_bytes && sdsavail(o->ptr) < 3) {
size_t newlen = sdslen(o->ptr) + 3;
newlen += min(newlen, 300); /* Greediness: double 'newlen' if it is smaller than 300, or add 300 to it when it exceeds 300 */
if (newlen > server.hll_sparse_max_bytes)
newlen = server.hll_sparse_max_bytes;
o->ptr = sdsResize(o->ptr, newlen, 1);
}
In Dragonfly DB, is there a db object with .allocated_size and .used_size ? If not, we probably end up with much more (almost for every insertion) reallocation.
An alternative is always allocating the max size of sparse hll (Or progressively. 10%,50%,100%). It's a configurable value(min: 0, max: LONG_MAX, default3000) in Redis:
server.hll_sparse_max_bytes
yes, CompactObj::MallocUsed()
gives you the allocated_size
and CompactObject::Size()
provides you with the actual string length (less or equal than MallocUsed()),
so SdsAvail equivalent would be MallocUsed() - Size()
so SdsAvail equivalent would be
MallocUsed() - Size()
There are cases where MallocUsed()
returns 0 (if the string is inline), so we should also handle that case as well.
An alternative is always allocating the max size of sparse hll
I'd try to avoid that if possible. Now we allocate 15k per entry, and it looks like Redis allocates a minimum of 200 bytes.
yes,
CompactObj::MallocUsed()
gives you theallocated_size
andCompactObject::Size()
provides you with the actual string length (less or equal than MallocUsed()),so SdsAvail equivalent would be
MallocUsed() - Size()
Thanks. It seems that CompactObj is a robj wrapper and can be stored to db. I'll try that. I probably need to refactor the logic for dense hll to keep consistency.
so SdsAvail equivalent would be
MallocUsed() - Size()
There are cases where
MallocUsed()
returns 0 (if the string is inline), so we should also handle that case as well.An alternative is always allocating the max size of sparse hll
I'd try to avoid that if possible. Now we allocate 15k per entry, and it looks like Redis allocates a minimum of 200 bytes.
Please correct me if I was wrong. A string shorter than or equals to 16 bytes will be an inline string.
static constexpr unsigned kInlineLen = 16;
In this case, we don't need to worry about that because hll header takes 16 bytes. registers[] takes at least 2 bytes.
struct hllhdr {
char magic[4]; /* "HYLL" */
uint8_t encoding; /* HLL_DENSE or HLL_SPARSE. */
uint8_t notused[3]; /* Reserved for future use, must be zero. */
uint8_t card[8]; /* Cached cardinality, little endian. */
uint8_t registers[]; /* Data bytes. */
};
18 is theoretical minimum. In Redis, a new sparse hll string is 20 bytes by default.
sparselen
=16+4 = 20
int sparselen = HLL_HDR_SIZE + (((HLL_REGISTERS+(HLL_SPARSE_XZERO_MAX_LEN-1)) /
HLL_SPARSE_XZERO_MAX_LEN)*2);
If, as mentioned above, Redis allocates at least 200 bytes, that's even better.
Thanks. It seems that CompactObj is a robj wrapper and can be stored to db. I'll try that. I probably need to refactor the logic for dense hll to keep consistency.
I am not sure exactly what you're trying to do, but note that we can't use our robj
directly as sds string with the Redis API (see here).
If, as mentioned above, Redis allocates at least 200 bytes, that's even better.
Sorry, I don't know what's the real minimum Redis allocates. I only deducted that from the first comment of this bug. If you see that the minimum is 20, then that must be it. Sorry about confusing you.
If you think that indeed is the case, please add a DCHECK()
to make sure our assumption holds.
Thanks!
Sorry, I don't know what's the real minimum Redis allocates. I only deducted that from the first comment of this bug. If you see that the minimum is 20, then that must be it. Sorry about confusing you.
If you think that indeed is the case, please add a
DCHECK()
to make sure our assumption holds.
20 is just a theoretical value. Will do some experiments to see the behavior. I just wanted to confirm that <=16 bytes will be a inline string.
DCHECK
is good idea. Thanks for your advice.
Could anyone share more about how to use the memory usage test script above? I ran it on Redis server, returned 0B. On Dragonfly DB server, maxmemory_human was 10GB before I insert anything.
maxmemory_human
is the memory limit, not memory usage. Dragonfly determines the limit automatically based on the available memory on its host
Describe the bug According to Redis docs HLL is supposed to use UP TO 12kb while dragonfly always uses 12kb. When running simple test Redis uses 20mb to store 100 000 HLL keys. Dragonfly on the other hand uses 1.5gb to store 100 000 HLL keys. Seems like the sparse mode is never used in DF.
To Reproduce Steps to reproduce the behaviuor:
Example python script to insert keys
Expected behavior HLL should not take 12kb for small sets.
Screenshots If applicable, add screenshots to help explain your problem.
Environment (please complete the following information):
Reproducible Code Snippet
result for Redis 6.2.6
results for Dragonfly 1.14.6