StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/
Other
5.87k stars 1.51k forks source link

Timeout error while performing INCR and MGET #2545

Open ranpariyachetan opened 1 year ago

ranpariyachetan commented 1 year ago

Hi,

I am using StackExchange.Redis library in my web application. Following is the setup.

Following are a few examples of errors.

Operation: MultiGet : KeyCount :2; Message: Timeout performing MGET (5000ms), next: MGET {SomeKey}:0449-0450, inst: 4, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: id-redis-rl-main-cache-0002-002.id-redis-rl-main-cache.u8dlwf.use1.cache.amazonaws.com:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: IP-0AC62558(SE.Redis-v2.6.66.47313), PerfCounterHelperkeyHashSlot: 12665, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=13,Free=32754,Min=8,Max=32767), v: 2.6.66.47313 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Operation: Increment; Message: Timeout performing INCRBYFLOAT (5000ms), next: INCRBYFLOAT {SomeKey}:0450-0451, inst: 4, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: id-redis-rl-main-cache-0002-001.id-redis-rl-main-cache.u8dlwf.use1.cache.amazonaws.com:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: IP-0AC62558(SE.Redis-v2.6.66.47313), PerfCounterHelperkeyHashSlot: 12665, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=20,Free=32747,Min=8,Max=32767), v: 2.6.66.47313 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Operation: Increment; Message: Timeout performing INCRBYFLOAT (5000ms), next: INCRBYFLOAT {somekey}:0614-0615, inst: 10, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: id-redis-rl-main-cache-0005-001.id-redis-rl-main-cache.my7vre.use1.cache.amazonaws.com:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: IP-0AC2B294(SE.Redis-v2.6.66.47313), PerfCounterHelperkeyHashSlot: 15309, IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=26,Free=32741,Min=8,Max=32767), v: 2.6.66.47313 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Around 30 such errors reported at after application started to use Redis. Since then no timeout errors reported. As of now I am using Redis only to perform Rate-Limit hence only INCR, EXPIRE and MGET are the only operations being performed. But I am planning to use the same Redis cluster to store business objects too which will have higher throughput of cache operations compared to the Rate-Limit usage.

I am really surprised to see the Timeout error for INCR and EXPIRE operations. Even I am using CommandFlags.FireAndForge to perform EPXPIRE after INCR.

var result = RedisDatabase.StringIncrement(key, delta, CommandFlags.FireAndForget); RedisDatabase.KeyExpire(key, validFor, CommandFlags.PreferMaster & CommandFlags.FireAndForget);

var result = RedisDatabase.StringGet(redisKeys, CommandFlags.PreferReplica);

I want to understand what could be the reason of timeout issues and fix them before I expand the usage of Redis for business object.

99% of the solutions online suggest to setup proper Worker Pool etc. But I wonder if that can be the only reason of this issue. I am interested to know what other factors can contribute to this such as Application throughput, EC2 instance size, Version of .NET Framework, size of cache key and/or cache data.

I will be also helpful to know if there is any formula or technique to know the optimal number of worker threads based on server size, number of cores, number of shards in the Redis cluster etc.

I would also like to understand why these simple operations take more than 2ms 30% of the times. image

I am really looking forward to hints/approaches to understand and solve this issue. I can share more information about the issue I am facing if needed.

Thanks.

ranpariyachetan commented 11 months ago

Update.

I changed the WorkerThread minimum count as per the suggestion using following code in start of the application.

ThreadPool.SetMinThreads(Environment.ProcessorCount * 25, 8);

My application is running on 8 CPU machine, hence total value of minimum worker threads becomes 200 at runtime.

With the above change, TimeOut errors are reduced at a great extent not reduced to zero. The strange part is the Timeout errors does not show they symptoms of worker thread starvation.

Example of latest Timeout error is as follow.

Operation: Get; Message: Timeout performing GET (5000ms), next: GET {somekey}, inst: 0, qu: 0, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: id-redis-rl-main-cache-0005-002.id-redis-rl-main-cache.u8dlwf.use1.cache.amazonaws.com:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: IP-0AC624E3(SE.Redis-v2.6.66.47313), PerfCounterHelperkeyHashSlot: 11274, IOCP: (Busy=0,Free=1000,Min=16,Max=1000), WORKER: (Busy=9,Free=32758,Min=200,Max=32767), v: 2.6.66.47313 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts

In the above message Busy workers (9) are much less than the minimum available workers (200). So Redis operation is timed out even though there are enough workers available.

Any explanation for this behavior?