StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

Timeout issue during medium thread load on .NET project #2586

Open Bencly opened 9 months ago

Bencly commented 9 months ago

Hi all,

I'm running into an issue when reading from Redis during a load test (medium load). We sometimes get this message:

Message: Timeout performing MGET (2000ms), next: MGET, inst: 1, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 131072, serverEndpoint: xxx, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxx, IOCP: (Busy=20,Free=980,Min=256,Max=1000), WORKER: (Busy=1,Free=32766,Min=256,Max=32767)

This one looks weird: nothing is queuing up, all the managers are available, and the number of busy threads is way lower than Min. We also checked the load on the Redis server and everything looks okay; CPU and memory usage are not even high at this point.

Could we get some insight about this issue please?

Thank you!

NickCraver commented 8 months ago

We see the command here is an MGET which can add up - how large is the payload we're trying to get here?

This also looks like a very old version given the message - we always recommend updating to latest as we continually make improvements to both performance and logging to help figure things out.

Bencly commented 8 months ago

Hi @NickCraver

Thanks for responding! Yes, we are using MGET to fetch 220+ keys at a time. Each key should be 20-40 characters long, and the returned value for each key should contain 6000+ characters (so 220*6000 in total). This is during a medium-load situation where we have 100 threads running and hitting Redis.

We are using 2.6.7 and cannot change it for several reasons.

Could we get some more insight into how to debug this issue?

Thank you!

Bencly commented 8 months ago

Hi @NickCraver

When you mentioned that MGET can add up, what does that mean? If there are tons of MGET calls going at the same time, will the time likely stack up? How do we check whether this is the case? Our Redis metrics look healthy during the load test.

Thank you!

NickCraver commented 7 months ago

Overall, that's grabbing a 1.3 MB+ payload per call (going by your minimums: 220 keys * 6000 characters each), which adds up quite a bit if you're bandwidth starved. I'd suggest either fetching in smaller batches, or letting the multiplexer do the work for you by getting each key ASAP with its own async call, depending on how much of a load vs. latency increase that is.
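
Both suggestions could look roughly like the sketch below. It assumes you already have an `IDatabase` from a connected `ConnectionMultiplexer`. The class name `RedisFetch`, the method names, and the batch size of 50 are illustrative and not from this thread, so tune the batch size against your own load test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

// Hypothetical helper class; the names here are for illustration only.
public static class RedisFetch
{
    // Option 1: several smaller MGETs instead of one large one.
    // Batches are awaited one after another, so this trades extra
    // round trips for smaller individual responses.
    public static async Task<RedisValue[]> GetInBatchesAsync(
        IDatabase db, IReadOnlyList<RedisKey> keys, int batchSize = 50)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var results = new List<RedisValue>(keys.Count);
        for (int i = 0; i < keys.Count; i += batchSize)
        {
            RedisKey[] batch = keys.Skip(i).Take(batchSize).ToArray();
            // The RedisKey[] overload of StringGetAsync issues an MGET.
            RedisValue[] values = await db.StringGetAsync(batch).ConfigureAwait(false);
            results.AddRange(values);
        }
        return results.ToArray();
    }

    // Option 2: one GET per key, all issued without waiting in between.
    // The multiplexer pipelines them on the shared connection, and
    // Task.WhenAll returns the values in the same order as the keys.
    public static Task<RedisValue[]> GetIndividuallyAsync(
        IDatabase db, IReadOnlyList<RedisKey> keys)
    {
        return Task.WhenAll(keys.Select(key => db.StringGetAsync(key)));
    }
}
```

Usage would be along the lines of `var values = await RedisFetch.GetInBatchesAsync(db, keys);`. Which option behaves better depends on your payload sizes and network, so it's worth measuring both under the same load test.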

The main problem here looks to be the inbound payload, maybe coming back as one huge batch with a delay, which is very spiky vs. a smooth pipeline, and we want the latter. And of course we recommend upgrading, so I hope you're able to remove those blockers :)