StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/
Other
5.87k stars 1.51k forks source link

Timeout Guidance #2550

Open athielen2 opened 12 months ago

athielen2 commented 12 months ago

Our team is running into issues with occasional timeouts using v2.6.96 of the library and would appreciate some guidance. The timeout episodes typically occur once or twice a day, often during periods of low load, and are isolated to a single container (not the same every time) within our service. That container will have hundreds of Redis calls fail over the course of 30 seconds to a few minutes. Occasionally, these timeouts have cascaded and seem to cause widespread timeouts for all network calls from the container.

The containers (1 vCPU each) run within an ECS service that places the containers on 4 vCPU EC2 instances. Our payloads are fairly small (< 1KB), and the server seems to be adequately provisioned. Are we looking at thread starvation here? Do we need to look into pooling ConnectionMultiplexers? Any guidance would be appreciated.

023-09-12 13:14:48,932 61b1a5259c2c [ERROR] [A.Redis.RedisClient] [3] [requested_branch=master] [request_id=0e8f15e2489485b7c24a71447758bbd8] [trace_id=00-0e8f15e2489485b7c24a71447758bbd8-252e7f50bd374521-01] [app_branch=master] Cache exception
Type=StackExchange.Redis.RedisTimeoutException
Message=Timeout awaiting response (outbound=0KiB, inbound=0KiB, 1660ms elapsed, timeout is 1000ms), command=SETEX, next: EXISTS device-auth-token-ZZZZZZZZ inst: 0, qu: 0, qs: 4, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 1025, in-pipe: 0, out-pipe: 0, last-in: 174, cur-in: 0, sync-ops: 0, async-ops: 15120127, serverEndpoint: <redis-server>, conn-sec: 41633.1, mc: 1/1/0, mgr: 10 of 10 available, clientName: 61b1a5259c2c(SE.Redis-v2.6.96.30123), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=8,Free=32759,Min=4,Max=32767), POOL: (Threads=8,QueuedItems=35,CompletedItems=89363292), v: 2.6.96.30123 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Data:
    Signature=f24be5e2
    Redis-Message=SETEX installation-summary-5d8e54d6fb76960001a27939
    Redis-Message-Next=EXISTS device-auth-token-ZZZZZZZZZ
    Redis-OpsSinceLastHeartbeat=0
    Redis-Queue-Awaiting-Write=0
    Redis-Queue-Awaiting-Response=4
    Redis-Active-Writer=False
    Redis-Backlog-Writer=SpinningDown
    Redis-Read-State=ReadAsync
    Redis-Write-State=Idle
    Redis-Inbound-Bytes=1025
    Redis-Inbound-Pipe-Bytes=0
    Redis-Outbound-Pipe-Bytes=0
    Redis-Last-Result-Bytes=174
    Redis-Inbound-Buffer-Bytes=0
    Redis-Sync-Ops=0
    Redis-Async-Ops=15120127
    Redis-Server-Endpoint=<redis-server>
    Redis-Server-Connected-Seconds=41633.1
    Redis-Multiplexer-Connects=1/1/0
    Redis-Manager=10 of 10 available
    Redis-Client-Name=61b1a5259c2c(SE.Redis-v2.6.96.30123)
    Redis-ThreadPool-IO-Completion=(Busy=0,Free=1000,Min=1,Max=1000)
    Redis-ThreadPool-Workers=(Busy=8,Free=32759,Min=4,Max=32767)
    Redis-ThreadPool-Items=(Threads=8,QueuedItems=35,CompletedItems=89363292)
    Redis-Busy-Workers=8
    Redis-Version=2.6.96.30123
    redis-command=SETEX installation-summary-5d8e54d6fb76960001a27939
    request-sent-status=Sent
    redis-server=<server>
    A.Redis.ConnectionErrorCount=0
    A.Redis.IsConnected=True
Stack Trace:
   at A.AutomaticCapture.Domain.Cache.RedisCache.<>c__DisplayClass18_0.<<SetRedisValue>b__0>d.MoveNext() in /reporoot/src/A.AutomaticCapture/Domain/Cache/RedisCache.cs:line 226
--- End of stack trace from previous location ---
   at A.AutomaticCapture.Domain.Cache.RedisCache.<>c__DisplayClass19_0`1.<<ExecuteRedis>b__0>d.MoveNext() in /reporoot/src/A.AutomaticCapture/Domain/Cache/RedisCache.cs:line 234
--- End of stack trace from previous location ---
   at A.Redis.RedisClient.ExecuteAsync[TResult](Func`3 func)
2023-09-12 13:15:16,069 61b1a5259c2c [ERROR] [A.Redis.RedisClient] [108] [requested_branch=master] [request_id=44e485237ebc67f1f4e0700f6f68393f] [trace_id=00-44e485237ebc67f1f4e0700f6f68393f-5a9064db7f1d1dd9-01] [app_branch=master] Cache exception
Type=StackExchange.Redis.RedisTimeoutException
Message=Timeout awaiting response (outbound=6KiB, inbound=0KiB, 1820ms elapsed, timeout is 1000ms), command=EXISTS, next: SET device-auth-token-ZZZZZZZ, inst: 0, qu: 0, qs: 117, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 477, in-pipe: 0, out-pipe: 0, last-in: 2, cur-in: 0, sync-ops: 0, async-ops: 15127235, serverEndpoint: <server> , conn-sec: 41660.1, mc: 1/1/0, mgr: 10 of 10 available, clientName: 61b1a5259c2c(SE.Redis-v2.6.96.30123), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=68,Free=32699,Min=4,Max=32767), POOL: (Threads=80,QueuedItems=260,CompletedItems=89376065), v: 2.6.96.30123 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Data:
    Signature=46e5e866
    Redis-Message=EXISTS device-auth-token-ZZZZZZZ
    Redis-Message-Next=SET device-auth-token-ZZZZZZZ
    Redis-OpsSinceLastHeartbeat=0
    Redis-Queue-Awaiting-Write=0
    Redis-Queue-Awaiting-Response=117
    Redis-Active-Writer=False
    Redis-Backlog-Writer=SpinningDown
    Redis-Read-State=ReadAsync
    Redis-Write-State=Idle
    Redis-Inbound-Bytes=477
    Redis-Inbound-Pipe-Bytes=0
    Redis-Outbound-Pipe-Bytes=0
    Redis-Last-Result-Bytes=2
    Redis-Inbound-Buffer-Bytes=0
    Redis-Sync-Ops=0
    Redis-Async-Ops=15127235
    Redis-Server-Endpoint=<redis-server>
    Redis-Server-Connected-Seconds=41660.1
    Redis-Multiplexer-Connects=1/1/0
    Redis-Manager=10 of 10 available
    Redis-Client-Name=61b1a5259c2c(SE.Redis-v2.6.96.30123)
    Redis-ThreadPool-IO-Completion=(Busy=0,Free=1000,Min=1,Max=1000)
    Redis-ThreadPool-Workers=(Busy=68,Free=32699,Min=4,Max=32767)
    Redis-ThreadPool-Items=(Threads=80,QueuedItems=260,CompletedItems=89376065)
    Redis-Busy-Workers=68
    Redis-Version=2.6.96.30123
    redis-command=EXISTS device-auth-token-ZZZZZZZ
    request-sent-status=Sent
    redis-server=<redis-server>
    A.Redis.ConnectionErrorCount=177
    A.Redis.IsConnected=True
Stack Trace:
   at A.AutomaticCapture.Domain.Cache.RedisCache.<>c__DisplayClass13_0.<<TryKeyExists>b__0>d.MoveNext() in /reporoot/src/A.AutomaticCapture/Domain/Cache/RedisCache.cs:line 145
--- End of stack trace from previous location ---
   at A.AutomaticCapture.Domain.Cache.RedisCache.<>c__DisplayClass19_0`1.<<ExecuteRedis>b__0>d.MoveNext() in /reporoot/src/A.AutomaticCapture/Domain/Cache/RedisCache.cs:line 234
--- End of stack trace from previous location ---
   at A.Redis.RedisClient.ExecuteAsync[TResult](Func`3 func)
athielen2 commented 11 months ago

Any help here? This has become a pretty significant issue for our service.

NickCraver commented 11 months ago

@athielen2 your first looks like a probable server issue (check with SLOWLOG), the the second is almost certainly thread exhaustion. I'd guess something else in your app is locking, and that's best diagnosed with a memory dump to see what all the threads are stuck on.