Open LastTribunal opened 1 year ago
Can you please try the latest version and see if you can reproduce? There have been a few related fixes here in later versions that hopefully cover your cause as well - we handle some issues a bit better for these scenarios.
Thank you sir. We are doing exactly that, will post updates
@LastTribunal any luck with version changes?
So far we havent seen any issues, but also there has not been any Azure Redis outages, which has been the trigger for previous faults. Will keep you posted
Can you please try the latest version and see if you can reproduce? There have been a few related fixes here in later versions that hopefully cover your cause as well - we handle some issues a bit better for these scenarios.
After a couple of months we finally saw an issue. however, unlike in the past, the new redis client version we installed, self-healed without us having to restart the VM. So it seems the issue still exists, but is being managed internally.
we saw two nodes out of five,, experience this for 10 minutes each within a few hours..
@LastTribunal Did you have any exception from that? Could this be the ~6 weekly patching that happens across Azure nodes?
Hello @NickCraver , I am a colleague of @LastTribunal and this issue still persists. We are experiencing this on a bi daily bases with 10-20 min outage in our production environment. This issue causes massive disconnects and load on our servers and all clients are trying to get back. We engaged with Azure Network and Redis team and they are not able to find any issue on their end. They requested us to tweak the IOCP min thread, but we feel like thats a rabbit hole we dont want to go in. We also tried this on a console and tweaking does not help.
Any assistance will be helpful and we are running out of ideas.
Thanks,
@IfOnTuesdayAndNotRainingThenDoThis Do you have some more recent error samples to look at?
@NickCraver
Here is the exception:
StackExchange.Redis.RedisTimeoutException at StackExchange.Redis.RedisDatabase+
Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5437ms elapsed, timeout is 5000ms), command=EVAL, next: EVAL, inst: 0, qu: 0, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 5256, last-in: 0, cur-in: 0, sync-ops: 4, async-ops: 481539, serverEndpoint: er-prod-redis.redis.cache.windows.net:6380, conn-sec: 384449.3, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: NodesPrd2000003(SE.Redis-v2.6.104.40210), IOCP: (Busy=72,Free=928,Min=8,Max=1000), WORKER: (Busy=5,Free=32762,Min=8,Max=32767), v: 2.6.104.40210 (Please take a
look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Its a surge for 100+ exceptions and happens to all our vm nodes at the same time.
Here are the version we are using
@NickCraver
Update:
we were able to capture the exception we got on the clients that were subscribed:
SignalR connection error: "System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.
at System.Net.WebSockets.WebSocketBase.WebSocketOperation.
We also saw signalR connect spike that caused the redis timeout, right after wards. We cant seem to find the cause of these disconnects.
My org is using StackExchange.Redis .net library (v: 2.2.62.27853) for Signalr Backplane with Azure Redis Service in a multi-node VM cluster.
We have observed that whenever there is an outage in Azure Redis Service, any (not any particular) one or more of our cluster nodes redis client becomes corrupt and logs the error below.
The error persists after Azure Redis Service is recovered. The error persists continuously, and even under a minimal server load The error is resolved only after rebooting the cluster node.
Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5563ms elapsed, timeout is 5000ms), command=EVAL, next: EVAL, inst: 0, qu: 0, qs: 4, aw: False, rs: ReadAsync, ws: Idle, in: 1552, serverEndpoint: redisprod.redis.cache.windows.net:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: Nodes0004, IOCP: (Busy=29,Free=971,Min=8,Max=1000), WORKER: (Busy=2,Free=32765,Min=8,Max=32767), v: 2.2.62.27853 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)