StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/
Other
5.88k stars 1.51k forks source link

RESP3 causes RedisServerException only (P|S)SUBSCRIBE / (P|S)UNSUBSCRIBE / PING / QUIT are allowed in this context #2783

Open shwallmic opened 1 month ago

shwallmic commented 1 month ago

I have an application that is communicating with Azure Redis Enterprise (running version 6.0). I wanted to reduce the number of connections opened to the Redis Server, so I switched to using RESP3 protocol. However, now the application gets into a state that all commands result in error:

"Error while incrementing key TROR5646ed02-5ddf-45f8-92d0-6a2709b83b98|Exception:StackExchange.Redis.RedisServerException: ERR Can't execute 'evalsha': only (P|S)SUBSCRIBE / (P|S)UNSUBSCRIBE / PING / QUIT are allowed in this context\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at StackExchange.Redis.RedisDatabase.d__223.MoveNext()"

My application never calls Subscribe, so it's unclear to me how the client / server gets into this bad state.

1) Is this some sort of bug on the client/server side? 2) Is there a way to get out of this state? 3) With RESP2 protocol, is there a way to prevent the redis client from opening a separate connection for Pub/Sub, given that I never intend to use that functionality?

mgravell commented 3 weeks ago

I'll investigate the failure - no idea what broke there! To disable subscription in RESP2, put ,$SUBSCRIBE= on the end of your config string (which disables that command); however, this will disable auto discovery of server uptime related events.

mgravell commented 3 weeks ago

(update: working with @philon-msft to see if there's an Azure Redis Enterprise thing going on here)

slorello89 commented 3 weeks ago

RESP3 support was introduced to Redis Enterprise in 7.2, so that would be the issue here.

You can upgrade your instance to 7.2 (which is currently in Preview), or revert to using the RESP2 connection, I'd recommend disabling subscribe if you aren't using any subscribe channels and are ok missing those uptime events from the multiplexer. It will cut the number of connections to the proxy in half (and could reduce the number of connections to your shards by more than that depending on your deployment size).

shwallmic commented 3 weeks ago

Thanks for investigating. I can look into implementing these solutions.

However, even with the version mismatch, does it make sense that the server could get into a state where it only accepts PSUBSCRIBE etc if I never call subscribe?

Also, if the server is not on a version that accepts RESP3, shouldn't the client be falling back to RESP2?

mgravell commented 3 weeks ago

@slorello89 that still sounds like a library bug, though; we're meant to be detecting the outcome of the HELLO attempt. I need to investigate this and make it not break so spectacularly.

slorello89 commented 3 weeks ago

@mgravell aye, this is odd, the HELLO should fail and (in my testing) it looks like it opens up a RESP 2 connection, (at least it sends the AUTH command after the HELLO fails), interestingly though it does not seem like the client attempts to subscribe to the normal channels, nor does it appear to open a subscriber connection. Also I'm not able to reproduce @shwallmic's issue, any time I try to subscribe after trying to connect with RESP 3 the subscription fails because the Subscriber connection was never opened.

@shwallmic - are you doing something odd like trying to subscribe over the ad-hoc interface? e.g.

db.Execute("SUBSCRIBE", "test"); // subscribing over the interactive connection is connection altering
db.StringSet("foo", "bar"); // now the connection is unusable so we get your error.

that's the only way I can reproduce this error.

mgravell commented 3 weeks ago

(at least it sends the AUTH command after the HELLO fails)

(we actually send AUTH before the HELLO has failed: https://github.com/StackExchange/StackExchange.Redis/blob/main/src/StackExchange.Redis/ServerEndPoint.cs#L943-L964)

slorello89 commented 3 weeks ago

Got it, regardless, I can see that the connection is RESP2:

id=11615003000 addr=x.x.x.x:58624 laddr=10.0.0.6:10000 fd=138 name=machine-name(SE.Redis-v2.8.12.45748) age=4 idle=3 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 obl=0 events=r cmd=get user=default resp=2 lib-name=SE.Redis lib-ver=2.8.12.45748
shwallmic commented 3 weeks ago

We don't call SUBSCRIBE in our code. I've connected to the server and run INFO commandstats. it shows that SUBSCRIBE has never been called.

INFO commandstats

Commandstats

cmdstat_psync:calls=9,usec=5088,usec_per_call=565.33 cmdstat_expire:calls=1894160611,usec=33737120714,usec_per_call=17.81 cmdstat_replconf:calls=1497934,usec=2028041,usec_per_call=1.35 cmdstat_evalsha:calls=1194244389,usec=59167244130,usec_per_call=49.54 cmdstat_ping:calls=953430,usec=232276,usec_per_call=0.24 cmdstat_hello:calls=1618093,usec=3527337,usec_per_call=2.18 cmdstat_client:calls=24499224,usec=98611057,usec_per_call=4.03 cmdstat_hdel:calls=87410294,usec=910576785,usec_per_call=10.42 cmdstat_time:calls=1196364535,usec=8783809366,usec_per_call=7.34 cmdstat_del:calls=1578607,usec=5243121,usec_per_call=3.32 cmdstat_echo:calls=4005252,usec=2313563,usec_per_call=0.58 cmdstat_eval:calls=2120146,usec=150306675,usec_per_call=70.89 cmdstat_exec:calls=1325069336,usec=8681112978,usec_per_call=6.55 cmdstat_hkeys:calls=1196364535,usec=21961135355,usec_per_call=18.36 cmdstat_hincrby:calls=1894160611,usec=30776527192,usec_per_call=16.25 cmdstat_hvals:calls=581450130,usec=20381002643,usec_per_call=35.05 cmdstat_select:calls=9,usec=6,usec_per_call=0.67 cmdstat_multi:calls=1325069336,usec=253850681,usec_per_call=0.19 cmdstat_get:calls=3170980,usec=5975640,usec_per_call=1.88

Maybe something else is going on. Could something happen to the connection (too many timeouts for example?) that could cause it to get into a state that requires the connection to QUIT / reconnect?

mgravell commented 3 weeks ago

I will need to investigate. However, for now the workaround is simple: don't try to enable RESP3 on this setup, since the server doesn't support it.

shwallmic commented 3 days ago

@mgravell One more question, can you clarify what you mean when you say "server uptime related events" I dont see documentation that matches that, so Im not clear exactly what functionality will be lost when auto discovery of server uptime related events is disable/ what risk I am running.

mgravell commented 3 days ago

Some hosted services such as azure redis broadcast service data on known channels, that the library can listen too. There are also library-specific maintenance channels, which will be triggered if using the library for things like switching primary nodes.