Grokzen / redis-py-cluster

Python cluster client for the official redis cluster. Redis 3.0+.
https://redis-py-cluster.readthedocs.io/
MIT License

Reading from replicas in pipeline is currently broken #470

Open FranGM opened 3 years ago

FranGM commented 3 years ago

Hi there!

After testing https://github.com/Grokzen/redis-py-cluster/commit/2a4c77dfccd5cbcf834ed1c514e5c3c9b2cd3f25 a bit more extensively, we noticed that reading from replicas within a pipeline is not exactly working as intended. Latency for our pipelined commands was actually worse than what we'd get if we skipped the pipeline and just sent the commands serially.

Upon closer inspection, we noticed that the Redis server always returned a MOVED response, even when we could guarantee we were reaching the right node for that key:

03:08:48.095422 IP redis.6379 > app.28738: Flags [S.], seq 1871059213, ack 1758231483, win 26883, options [mss 8961,nop,nop,sackOK,nop,wscale 7], length 0
03:08:48.095441 IP app.28738 > redis.6379: Flags [.], ack 1, win 491, length 0
03:08:48.095463 IP app.28738 > redis.6379: Flags [P.], seq 1:64, ack 1, win 491, length 63: RESP "HGETALL" "my_query"
03:08:48.096251 IP redis.6379 > app.28738: Flags [.], ack 64, win 211, length 0
03:08:48.096413 IP redis.6379 > app.28738: Flags [P.], seq 1:32, ack 64, win 211, length 31: RESP "MOVED 10819 10.8.74.196:6379"
03:08:48.096416 IP app.28738 > redis.6379: Flags [.], ack 32, win 491, length 0
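The MOVED reply in this capture names a hash slot (10819) and the address of the node that owns it. A client can compute a key's slot itself (CRC16-XMODEM of the key modulo 16384, honouring {hash tags}, as the Redis Cluster spec describes) and compare it with the slot in the redirect. If they match its slot map, the redirect comes from the replica's connection state and not from stale routing. The following is a minimal, standalone sketch. `parse_moved`, `key_slot` and `Redirect` are hypothetical helpers for illustration, not redis-py-cluster's API.

```python
from typing import NamedTuple, Optional

# Redis Cluster splits the keyspace into 16384 hash slots.
NUM_SLOTS = 16384


class Redirect(NamedTuple):
    slot: int
    host: str
    port: int


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), used for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Hash slot for a key, honouring {hash tags} as the cluster spec describes."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag replaces the key; "{}" hashes the whole key.
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS


def parse_moved(message: str) -> Optional[Redirect]:
    """Parse an error like 'MOVED 10819 10.8.74.196:6379'; None if not MOVED."""
    parts = message.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    # rpartition keeps hosts that themselves contain ':' intact.
    host, _, port = parts[2].rpartition(":")
    return Redirect(slot=int(parts[1]), host=host, port=int(port))


redirect = parse_moved("MOVED 10819 10.8.74.196:6379")
print(redirect)  # Redirect(slot=10819, host='10.8.74.196', port=6379)
```

This sketch hashes raw bytes. That matches the spec, which defines slots over the key's bytes and not over decoded text.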

This happened because we were not sending the READONLY command upon connecting to replicas. As a result, pretty much every read command in the pipeline failed with MOVED and had to be retried serially afterwards (which obviously incurs a massive latency hit).
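The fix amounts to issuing READONLY once per replica connection before any reads are sent on it. READONLY is per-connection state in Redis Cluster, so it has to be repeated after every reconnect. Below is a minimal, self-contained sketch of that idea at the RESP wire level. It is not the code from PR #471 and does not use redis-py-cluster's connection classes. `ReplicaConnection`, `encode_command` and the injected `send` callable are hypothetical names. In a real client, `send` would be something like a socket's `sendall`.

```python
from typing import Callable, List, Sequence


def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


class ReplicaConnection:
    """Hypothetical wrapper around one TCP connection to a replica node."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send        # e.g. sock.sendall in a real client
        self._readonly = False   # READONLY is per-connection state

    def reset(self) -> None:
        # A fresh connection starts without READONLY, so send it again.
        self._readonly = False

    def send_pipeline(self, commands: Sequence[Sequence[str]]) -> None:
        payload = b"".join(encode_command(*cmd) for cmd in commands)
        if not self._readonly:
            # Without READONLY the replica answers keyed reads with -MOVED.
            # A real client must also consume the +OK reply to READONLY
            # before parsing the pipeline's own replies.
            payload = encode_command("READONLY") + payload
            self._readonly = True
        self._send(payload)


sent: List[bytes] = []
conn = ReplicaConnection(sent.append)
conn.send_pipeline([["HGETALL", "my_query"]])
conn.send_pipeline([["HGETALL", "my_query"]])
# Only the first payload is prefixed with READONLY.
```

Prepending READONLY to the first payload keeps it in the same write as the pipeline. An alternative is to send it from an on-connect hook, as the issue describes.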

We've fixed this internally and have had the fix (https://github.com/Grokzen/redis-py-cluster/pull/471) running in production for a few weeks now. Latency before and after the fix:
[image: latency before and after the fix]