Grokzen / redis-py-cluster

Python cluster client for the official redis cluster. Redis 3.0+.
https://redis-py-cluster.readthedocs.io/
MIT License

Cannot create a read from replicas pipeline. #400

Open duke-cliff opened 3 years ago

duke-cliff commented 3 years ago

https://github.com/Grokzen/redis-py-cluster/blob/master/rediscluster/pipeline.py#L197

This line should pick up self.read_from_replicas.

Also, when we create the pipeline, we should be able to set read_from_replicas from: https://github.com/Grokzen/redis-py-cluster/blob/master/rediscluster/client.py#L445-L464

Why do we need it? We often need to read multiple keys (spread randomly across different nodes), and the Redis CPU load differs significantly (around 60%) between using a pipeline and mget (which in cluster mode is a GET loop). But due to the above bug, we cannot use a pipeline to read from replicas.
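To illustrate why multi-key reads fan out across nodes in the first place, here is a standalone sketch of the key-to-slot mapping described in the Redis Cluster specification (CRC16-XModem modulo 16384, with hash-tag handling). This is not redis-py-cluster's own code, just the algorithm that any cluster client implements:

```python
def crc16(data: bytes) -> int:
    # CRC-16/XMODEM (poly 0x1021, init 0), as used by the Redis Cluster spec
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16384 hash slots, honoring {hash tags}."""
    k = key.encode()
    start = k.find(b"{")
    if start != -1:
        end = k.find(b"}", start + 1)
        # only a non-empty substring between the first {} pair is hashed
        if end != -1 and end != start + 1:
            k = k[start + 1:end]
    return crc16(k) % 16384
```

Because unrelated keys hash to different slots (and therefore different nodes), a batch of GETs in a pipeline is spread over the whole cluster, which is exactly where replica reads would help.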

Grokzen commented 3 years ago

Good catch @duke-cliff. Yes, I am aware that pipelines do not support reading from replicas today, and that is something I want to sort out and fix for 3.0.0 later on. There are even two different readonly/read_from_replicas connection pools that are themselves buggy and need to be sorted out. Pipelines in general are often a lower priority than the main code path through the client. Also note that the routing logic would grow in complexity a fair bit to properly route commands in several new situations.

Just as an example, suppose we have one master and two replicas per shard. If readonly mode is enabled and one replica is down, should we route reads to the master or to the other replica? Also, depending on which command you are running, you get different logic: SET-style commands must be routed to a master, while GET-like commands that can be offloaded to a pipeline can be routed differently, and you need to track all of this through MOVED and other cluster-specific errors where commands need to be rerouted for some reason.
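One possible answer to the "replica down" question above can be sketched in a few lines. This is a hypothetical helper, not part of redis-py-cluster: prefer a healthy replica for reads and fall back to the master only when no replica is available.

```python
import random

def pick_read_node(master, replicas, is_healthy):
    """Pick a node for a read: a random healthy replica if any, else the master.

    `is_healthy` is an assumed callback (node -> bool); real clients would
    derive this from connection errors or cluster state.
    """
    healthy = [r for r in replicas if is_healthy(r)]
    if healthy:
        return random.choice(healthy)
    return master
```

Other designs are possible (e.g. always including the master in the read rotation, as duke-cliff suggests below); the point is only that the fallback policy has to be chosen explicitly.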

duke-cliff commented 3 years ago

Yes, I understand the complexity.

To me, readonly does not mean replicas only; it should route reads across both the master and the replicas. You can find similar logic in go-redis, which supports cluster mode as well: https://github.com/go-redis/redis/blob/b965d69fc9defa439a46d8178b60fc1d44f8fe29/cluster.go#L584

As for how the pipeline should decide which node to use: if you want something simple, I think we can infer that if the pipeline contains any write commands, it should go to masters only. This could even be set by the caller, as I suggested above, via a parameter like cluster.pipeline(readonly=True).

If you want something smarter, you can also look at the go-redis logic, which routes each command based on whether it is read-only or not: https://github.com/go-redis/redis/blob/eb28cb64264d03218e5aa8f35eb66a4e4c4ba60b/cluster.go#L1124
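The "masters only if any write is queued" rule above can be sketched as a small classifier. Both the command set (a tiny illustrative subset) and the helper name are hypothetical, not part of redis-py-cluster's API:

```python
# Small illustrative subset of Redis's read-only commands; a real client
# would derive this from the full command table, as go-redis does.
READ_ONLY_COMMANDS = {
    "GET", "MGET", "EXISTS", "STRLEN", "TTL", "TYPE",
    "HGET", "HMGET", "SMEMBERS", "LRANGE", "ZRANGE",
}

def pipeline_can_use_replicas(command_stack):
    """Return True only if every queued command is read-only.

    `command_stack` is assumed to be an iterable of command argument
    tuples, e.g. ("GET", "key") or ("SET", "key", "value").
    """
    return all(args[0].upper() in READ_ONLY_COMMANDS for args in command_stack)
```

A single write command anywhere in the stack forces the whole pipeline to masters, which keeps the routing decision simple at the cost of some replica offload.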

kmorrison commented 3 years ago

Hi @Grokzen, I work with @duke-cliff . We're wondering if you would accept a patch in the form of

diff --git a/rediscluster/client.py b/rediscluster/client.py
index cb8f59d..413a797 100644
--- a/rediscluster/client.py
+++ b/rediscluster/client.py
@@ -442,7 +442,7 @@ class RedisCluster(Redis):
         """
         return ClusterPubSub(self.connection_pool, **kwargs)

-    def pipeline(self, transaction=None, shard_hint=None):
+    def pipeline(self, transaction=None, shard_hint=None, read_from_replicas=False):
         """
         Cluster impl:
             Pipelines do not work in cluster mode the same way they do in normal mode.
@@ -461,6 +461,7 @@ class RedisCluster(Redis):
             result_callbacks=self.result_callbacks,
             response_callbacks=self.response_callbacks,
             cluster_down_retry_attempts=self.cluster_down_retry_attempts,
+            read_from_replicas=read_from_replicas,
         )

     def transaction(self, *args, **kwargs):
diff --git a/rediscluster/pipeline.py b/rediscluster/pipeline.py
index 1b7065f..4d2272e 100644
--- a/rediscluster/pipeline.py
+++ b/rediscluster/pipeline.py
@@ -194,7 +194,7 @@ class ClusterPipeline(RedisCluster):
             # refer to our internal node -> slot table that tells us where a given
             # command should route to.
             slot = self._determine_slot(*c.args)
-            node = self.connection_pool.get_node_by_slot(slot)
+            node = self.connection_pool.get_node_by_slot(slot, self.read_from_replicas)

             # little hack to make sure the node name is populated. probably could clean this up.
             self.connection_pool.nodes.set_node_name(node)

or would you be willing to collaborate on a PR that starts with that as the basis? I'm a little unclear on the etiquette of just opening a PR against your repo, but perhaps that would be the more straightforward approach; thought I'd ask first.

FranGM commented 3 years ago

Noticed that https://github.com/Grokzen/redis-py-cluster/pull/402 was created for this. Is there any way we could help make this happen? We'd love to have the ability to send read commands to replicas within pipelines.

Grokzen commented 3 years ago

@FranGM Check out #406; it was duke's second attempt to implement this, but I am not super happy with the approach of going back to gevent for the parallel parts. We used gevent back in the day, before 72squared figured out a much cleaner and simpler approach that does not really require gevent.

That PR currently has that issue. It also does some major rework of the pipeline code, which is almost too much to change in one PR, and I am not really sure about all the changes, as I did not develop the current version of the pipeline code and am not fully up to speed on its current status.

So if you want to help with something, look at that PR and, if possible, scale the changes back, or base a new set of work on a more minimal intervention that adds this feature; that would greatly increase the chances of it being added quickly.

One more reason I want to keep the changes minimal is what I wrote in my last comment above: I am going to take a completely different approach to handling the cluster in 3.0.0, so I do not really want to tear out or rewrite everything now just to rip it out again soon for a totally new implementation anyway.

FranGM commented 3 years ago

Thanks, that's a very clear explanation. I'll try to provide a simpler patch and then we can go from there!

FranGM commented 3 years ago

@Grokzen I went ahead and created https://github.com/Grokzen/redis-py-cluster/pull/450 with the simplest implementation I could come up with that still seems to work as advertised. Let me know if that's something along the lines of what you would be happy with!

Side-question: do you have any rough timeline for 3.0.0? We'd definitely like to see if we can help with more optimizations but don't really want to pester you about code you're heavily rewriting.

Grokzen commented 3 years ago

@duke-cliff Note that even with PR #450 merged now, there is still work to be done, as the other suggested PR has many more changes and fixes in it that can still be useful. The merged PR just fixes the issue in the short term; users should be aware that it can break and cause inconsistencies during hash-slot migrations and failover scenarios, and for now it is up to the user to account for that and ensure their code handles it.

I will keep this issue open until we sort out your PR @duke-cliff, but for anyone else who finds this issue and sees it still open: part of the problem has been fixed, merged, and shipped in the 2.1.3 release. Just be aware that the stopgap solution should be used with care, as it can still be buggy.

FranGM commented 3 years ago

@Grokzen wondering when you planned to release 2.1.3? We have a release of our service coming up, and I'd love to include 2.1.3 instead of 2.1.2 if at all possible.

Grokzen commented 3 years ago

@FranGM Oh lol, I prepared the release with the changelog and all, and thought I had pushed the build files out to PyPI -_- but apparently I forgot that last step... I will do it right now, sorry about that :)