StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

Azure Redis - Frequent 'Timeout awaiting response' without obvious cause #2641

Closed. Tvde1 closed this issue 5 months ago.

Tvde1 commented 5 months ago
```
Timeout awaiting response (outbound=5408KiB, inbound=8KiB, 9485ms elapsed, timeout is 5000ms),
command=HMGET, next: HMGET QueryCacheKey_ApiResultCacheModel`1_3x2DxFOa8lcDuXbo9x6RDAwcxRkBU3t86rX6NuZJWaM=,
inst: 0, qu: 119, qs: 0, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 65536, last-in: 0, cur-in: 204108, sync-ops: 0, async-ops: 26507, serverEndpoint: [...].redis.cache.windows.net:6380,
conn-sec: 1820.56, aoc: 0, mc: 1/1/0, mgr: 9 of 10 available, clientName: pd0ldwk000FIL(SE.Redis-v2.6.116.40240),
IOCP: (Busy=2,Free=998,Min=60,Max=1000), WORKER: (Busy=22,Free=32745,Min=100,Max=32767), POOL: (Threads=146,QueuedItems=7,CompletedItems=203738), v: 2.6.116.40240
(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
```

As you can see, the IOCP and WORKER pools have sufficient minimum threads. I do see a lot of requests in the outgoing queue. In Azure, for my app service, I don't see any SNAT port exhaustion or other TCP issues. How can I continue debugging to find the root cause? I'm updating to the newest package as we speak.

Tvde1 commented 5 months ago

I do see an error in Azure, but I can't find how to view more information about it.

[image: screenshot of the error shown in the Azure portal]

Tvde1 commented 5 months ago

We've detected at 2024-02-06 20:40:00Z that your client is unable to read from the Redis server output buffer quickly enough. More specifically, the number of bytes in the Redis server output buffer for the client reached over 1,000,000 bytes. This can lead to connectivity issues, such as increased latency and time-outs.

This looks to line up with the issue's timestamp (from the 'Diagnose & Solve Problems' page).
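
For context on that alert: the per-connection output buffer usage is visible on the server via the CLIENT LIST command (its obl, oll, and omem fields). Below is a minimal sketch of checking it, using redis-py purely for illustration since this thread is about the .NET client; the endpoint, credentials, and the 1 MB threshold are placeholders, not values from this issue.

```python
# Sketch, not from the thread: list clients whose server-side output buffer is large.
# Uses redis-py for illustration; host, password, and the 1 MB threshold are placeholders.
import redis

r = redis.Redis(
    host="your-cache.redis.cache.windows.net",  # placeholder endpoint
    port=6380,
    password="<access-key>",                    # placeholder credential
    ssl=True,
)

# CLIENT LIST reports per-connection output-buffer stats:
#   obl = fixed buffer length, oll = output list length, omem = output buffer memory (bytes).
for client in r.client_list():
    omem = int(client.get("omem", 0))
    if omem > 1_000_000:  # the threshold mentioned in the Azure alert
        print(client["addr"], client.get("name", ""), f"omem={omem}")
```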

slorello89 commented 5 months ago

There are 65536 bytes on the inbound socket (that's 2^16, so almost certainly a socket buffer maximum), which suggests the key under your HMGET was pretty huge. It's interesting that there are allegedly 0 messages in qs (@NickCraver, maybe what we are seeing here and in #2627 is that the sent message that's failing isn't being included in the error?). This jibes with it telling you that you have over a MB of data sitting on the output buffer.

You have the key name, so if it isn't expired and hasn't been evicted you can always check its size:

```
MEMORY USAGE KEY_NAME
```

to see just how big it is.

If not, you can use redis-cli's --memkeys feature to see if you can find other such keys that are really big:

```
redis-cli <your_connection_details> --memkeys
```

Keep in mind that this will do a scan over Redis and run MEMORY USAGE on everything, so it's not something you want to run all the time.
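
If you'd rather do that check from a script than from redis-cli, a rough equivalent is to SCAN the keyspace and call MEMORY USAGE per key. This is a sketch using redis-py rather than the .NET client, with placeholder connection details and an assumed 1 MB "really big" threshold; the same caution about running it routinely applies.

```python
# Sketch, not from the thread: a scripted rough equivalent of redis-cli --memkeys.
# Uses redis-py for illustration; connection details and the size threshold are assumptions.
import redis

r = redis.Redis(
    host="your-cache.redis.cache.windows.net",  # placeholder endpoint
    port=6380,
    password="<access-key>",                    # placeholder credential
    ssl=True,
)

THRESHOLD_BYTES = 1_000_000  # assumed cut-off for "really big"

# Like --memkeys, this walks the whole keyspace and runs MEMORY USAGE on every key,
# so it is not something to run routinely against a busy production cache.
for key in r.scan_iter(count=1000):
    size = r.memory_usage(key)  # None if the key expired or was evicted mid-scan
    if size and size > THRESHOLD_BYTES:
        print(key.decode(errors="replace"), size)
```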

Are you experiencing any sort of CPU spikes during these timeouts?
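
One way to correlate such spikes with specific commands on the server side is the slow log (SLOWLOG GET) and the CPU counters from INFO. Here is a small sketch, again using redis-py for illustration with placeholder connection details; it isn't part of the original exchange.

```python
# Sketch, not from the thread: check for expensive commands and CPU counters server-side.
# SLOWLOG GET and INFO are standard Redis commands; redis-py and the connection details
# are used purely for illustration.
import redis

r = redis.Redis(
    host="your-cache.redis.cache.windows.net",  # placeholder endpoint
    port=6380,
    password="<access-key>",                    # placeholder credential
    ssl=True,
)

# Recent slow-log entries: start time, duration in microseconds, and the command that ran.
for entry in r.slowlog_get(25):
    print(entry["start_time"], entry["duration"], entry["command"])

# Cumulative CPU seconds used by the server process, from INFO's cpu section.
cpu = r.info(section="cpu")
print(cpu.get("used_cpu_sys"), cpu.get("used_cpu_user"))
```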

Tvde1 commented 5 months ago

It does seem that huge CPU spikes were the problem. Switching the Azure metric from the per-minute average to the per-minute max CPU showed it frequently hitting 100%. Closing this for now until that's resolved.