Open mayankasthana opened 4 years ago
Hi @gkorland @itamarhaber Could you please share your opinion on this.
@mayankasthana notice MGET
is only relevant to String values and not for Hash.
HMGET
on the other hand is a way to retrieve multi values of a single Key similar to HGETALL with a way to limit the fields retrieved from the Hash
@mayankasthana sorry you right on RDD MGET
will probably get better latency in some scenarios, a mix of both MGET and pipeline should probably work the best (to avoid to large MGETs)
Maybe we can reuse the spark.redis.max.pipeline.size property for the mget
size, since in this context they are equivalent.
So I just found out that all keys asked in one MGET
should be from the same slot and it is not enough that the keys be from the same node in a cluster setup.
I prototyped some code here to first sort all keys by slot ids, and then take spark.redis.max.pipeline.size
items and create multiple mget
calls in multiple pipelines. It works, but looks unwieldy and doesn't look very efficient at least from spark side. Can't say from redis side. If there is not much benefit from redis side, this might not be a good idea at all.
It is recommended to use
mget
to do multipleget
s instead of pipelining lots ofget
s together. A few benchmarks also show performance improvements.I don't see any drawback to doing it, and it might be a simple enough change. Is there anything I am missing that makes using
mget
nonviable?Maybe we can do it for
hmget
to read hashes as well.