StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

Timeouts under load, difficulties in finding the bottleneck. #2687

Open TannerBrunscheon opened 3 months ago

TannerBrunscheon commented 3 months ago

Hi Folks,

We have been running into timeouts with StackExchange.Redis under high load. We have updated to the latest version included in Microsoft.Extensions.Caching.StackExchangeRedis. We host this code in EKS, with Redis in the same Kubernetes cluster. The timeouts occur when the system is under saturated load.

I have raised the number of available threads to 200 worker threads and 25 IOCP threads, though I never see the IOCP threads being used. I often see a large qs value and a large in value, and rs: DequeueResult shows up quite often.

We have tried using the .NET socket manager, the StackExchange manager, upping the thread counts, throwing more memory at the client, throwing more CPU at the client, and spinning up more clients, and nothing seems to be helping. Any ideas?

Timeout awaiting response (outbound=11KiB, inbound=19344KiB, 5284ms elapsed, timeout is 5000ms), command=HMGET, next: HMGET entity_tree, inst: 1, qu: 36, qs: 1147, aw: True, bw: WritingMessage, rs: DequeueResult, ws: Writing, in: 5046111, in-pipe: 1998848, out-pipe: 0, last-in: 0, cur-in: 1828209, sync-ops: 0, async-ops: 1432, serverEndpoint: redis-0.redis.redis.svc.cluster.local:6379, conn-sec: 66.12, aoc: 0, mc: 1/1/0, clientName: routine-service-59dfdd4578-nqtzc(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=25,Max=1000), WORKER: (Busy=135,Free=32632,Min=200,Max=32767), POOL: (Threads=135,QueuedItems=580,CompletedItems=4804,Timers=16), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
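For anyone comparing several of these messages, the numeric fields can be pulled out with a small script. This is only a sketch in Python, not part of StackExchange.Redis: the function name `parse_timeout_fields` and the dotted key names such as `WORKER.Busy` are made up for illustration, and the sample is a shortened copy of the message above. As far as I understand the Timeouts article linked in the message, qs counts commands that were sent and are still awaiting a reply, and in is the number of bytes waiting unread on the socket.

```python
import re

# Shortened copy of the timeout message quoted above (some fields omitted).
SAMPLE = (
    "Timeout awaiting response (outbound=11KiB, inbound=19344KiB, 5284ms elapsed, "
    "timeout is 5000ms), command=HMGET, next: HMGET entity_tree, inst: 1, qu: 36, "
    "qs: 1147, aw: True, bw: WritingMessage, rs: DequeueResult, ws: Writing, "
    "in: 5046111, in-pipe: 1998848, out-pipe: 0, last-in: 0, cur-in: 1828209, "
    "sync-ops: 0, async-ops: 1432, conn-sec: 66.12, aoc: 0, mc: 1/1/0, "
    "IOCP: (Busy=0,Free=1000,Min=25,Max=1000), "
    "WORKER: (Busy=135,Free=32632,Min=200,Max=32767), "
    "POOL: (Threads=135,QueuedItems=580,CompletedItems=4804,Timers=16)"
)

# "key: 123" pairs with a lowercase key and a plain integer value.
COUNTER = re.compile(r"(?<![\w-])([a-z][a-z-]*): (\d+)(?=,|\s|$)")
# "NAME: (A=1,B=2)" groups such as IOCP, WORKER and POOL.
GROUP = re.compile(r"\b([A-Z]+): \(([^)]*)\)")


def parse_timeout_fields(message):
    """Return the integer-valued fields of a timeout message as a dict.

    Hypothetical helper: non-integer fields (rs, ws, conn-sec, mc, ...) are skipped.
    """
    fields = {key: int(value) for key, value in COUNTER.findall(message)}
    for name, body in GROUP.findall(message):
        for pair in body.split(","):
            key, _, value = pair.partition("=")
            fields[f"{name}.{key}"] = int(value)
    return fields


if __name__ == "__main__":
    parsed = parse_timeout_fields(SAMPLE)
    for key in ("qu", "qs", "in", "in-pipe", "WORKER.Busy", "WORKER.Min", "POOL.QueuedItems"):
        print(key, parsed[key])
```

Running this over a batch of logged timeouts makes it easier to see whether qs, in and POOL.QueuedItems grow together as load increases.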

NickCraver commented 3 months ago

It looks like the problem in this case is likely a very large payload (I see ~19MB on the inbound pipe) - how big are the individual payloads here?

TannerBrunscheon commented 3 months ago

The individual payload is around 500 KB.

[Image: output of MEMORY USAGE entity_tree]

NickCraver commented 1 month ago

@TannerBrunscheon How many of these are being sent per interval? e.g. how many per second? And what bandwidth do you have to/from Redis?
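To make the bandwidth question concrete, here is a back-of-the-envelope sketch in Python. Only the payload size (about 500 KB, reading "500kb" as kilobytes) and the 19344 KiB inbound figure come from this thread; the request rate and link speed are hypothetical placeholders to be replaced with measured values, and both function names are invented for this example.

```python
def required_mbit_per_s(payload_bytes, replies_per_second):
    """Inbound bandwidth needed to carry the replies, in megabits per second."""
    return payload_bytes * 8 * replies_per_second / 1_000_000


def drain_seconds(backlog_bytes, link_mbit_per_s):
    """Time to transfer a backlog of bytes over a link of the given speed."""
    return backlog_bytes * 8 / (link_mbit_per_s * 1_000_000)


PAYLOAD_BYTES = 500_000          # "around 500kb" from the thread, read as kilobytes
INBOUND_BACKLOG = 19344 * 1024   # inbound=19344KiB from the timeout message

ASSUMED_REPLIES_PER_SECOND = 100  # hypothetical: substitute the real rate
ASSUMED_LINK_MBIT_PER_S = 1000    # hypothetical: substitute the real bandwidth

if __name__ == "__main__":
    needed = required_mbit_per_s(PAYLOAD_BYTES, ASSUMED_REPLIES_PER_SECOND)
    drain = drain_seconds(INBOUND_BACKLOG, ASSUMED_LINK_MBIT_PER_S)
    print(f"needed: {needed:.0f} Mbit/s of {ASSUMED_LINK_MBIT_PER_S} Mbit/s available")
    print(f"time to move the inbound backlog alone: {drain:.3f} s")
```

If the needed figure approaches the available bandwidth, replies will queue on the connection no matter how many threads the client has, which would be consistent with a large qs and in.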