StackExchange / StackExchange.Redis

General purpose Redis client
https://stackexchange.github.io/StackExchange.Redis/

Redis timeout issue in .NET Web Api #2587

Closed jimi-pz closed 8 months ago

jimi-pz commented 8 months ago

Hi all,

We're experiencing timeout issues as load on our APIs, which use a Redis instance, increases. As an important note, we make heavy use of Redis pub/sub: a pub/sub channel is created for almost every HTTP request because of a long-polling requirement.
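Roughly, that per-request pattern looks like the sketch below; the channel naming, timeout handling, and wiring are illustrative rather than our exact code:

```csharp
// Minimal sketch of the per-request long-polling pattern described above.
// The channel naming, timeout handling, and DI wiring are illustrative,
// not the actual application code.
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public class LongPollService
{
    private readonly ConnectionMultiplexer _muxer;

    public LongPollService(ConnectionMultiplexer muxer) => _muxer = muxer;

    // Each HTTP request subscribes to its own channel and waits for one message.
    public async Task<string?> WaitForNotificationAsync(string requestId, TimeSpan timeout)
    {
        RedisChannel channel = $"task_queue:hector:{requestId}"; // hypothetical naming
        var sub = _muxer.GetSubscriber();
        var tcs = new TaskCompletionSource<string?>(TaskCreationOptions.RunContinuationsAsynchronously);

        // One subscription per HTTP request, as described in the issue.
        await sub.SubscribeAsync(channel, (_, message) => tcs.TrySetResult(message));
        try
        {
            var completed = await Task.WhenAny(tcs.Task, Task.Delay(timeout));
            return completed == tcs.Task ? await tcs.Task : null; // null => poll timed out
        }
        finally
        {
            await sub.UnsubscribeAsync(channel);
        }
    }
}
```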

Could you take a look at it and share your opinions?

Timeout awaiting response (outbound=17KiB, inbound=0KiB, 15020ms elapsed, timeout is 15000ms), command=XADD, next: PUBLISH task_queue:hector:new_tasks_notification_channel, inst: 0, qu: 0, qs: 143, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 48, cur-in: 0, sync-ops: 116424, async-ops: 286883, serverEndpoint: 11.22.3.17:6379, conn-sec: 9278.94, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: hector-deployment-784647f98-jbc4l(SE.Redis-v2.6.104.40210), IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=69,Free=32698,Min=2,Max=32767), POOL: (Threads=69,QueuedItems=832,CompletedItems=2044906), v: 2.6.104.40210 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

StackExchange.Redis library version 2.6.104

Redis server info

# Server
redis_version:7.0.10
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:fb1c06838ae8ded8
redis_mode:standalone
...

# Clients
connected_clients:283
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:32768
client_recent_max_output_buffer:20504
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:410145424
used_memory_human:391.15M
used_memory_rss:461250560
used_memory_rss_human:439.88M
used_memory_peak:622762848
used_memory_peak_human:593.91M
used_memory_peak_perc:65.86%
used_memory_overhead:5777120
used_memory_startup:863360
used_memory_dataset:404368304
used_memory_dataset_perc:98.80%
allocator_allocated:410423456
allocator_active:447377408
allocator_resident:461381632
total_system_memory:16766758912
total_system_memory_human:15.62G
...

# Stats
total_connections_received:11739
total_commands_processed:1734250091
instantaneous_ops_per_sec:1198
total_net_input_bytes:824180360694
total_net_output_bytes:2176599003302
total_net_repl_input_bytes:0
total_net_repl_output_bytes:1474519525660
instantaneous_input_kbps:349.67
instantaneous_output_kbps:1033.43
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:565.48
rejected_connections:0
sync_full:2
sync_partial_ok:0
sync_partial_err:0
expired_keys:23612702
expired_stale_perc:3.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:576270
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:1319540934
keyspace_misses:10802597
pubsub_channels:1982
...

# Replication
role:master
connected_slaves:2
slave0:ip=...
slave1:ip=...
...

# Errorstats
errorstat_ERR:count=31018
errorstat_NOAUTH:count=6522
errorstat_NOMASTERLINK:count=101

# Keyspace
db0:keys=6390,expires=6255,avg_ttl=856601

supjohn1 commented 8 months ago

This seems similar to the issue I opened a while ago on the same version, #2565. I created a test that reproduces one symptom of the issue. Can you take a look and see whether it matches what you're seeing? That would hopefully give the StackExchange.Redis contributors more information and move us toward a solution.

https://github.com/StackExchange/StackExchange.Redis/issues/2565

NickCraver commented 8 months ago

Looking at WORKER: (Busy=69,Free=32698,Min=2,Max=32767), POOL: (Threads=69,QueuedItems=832,CompletedItems=2044906), and the sync op count, this definitely looks like thread exhaustion leading to stalls in your application. You'll want to make async calls anywhere you can for long operations like remote I/O; otherwise that stack eats threads, which exhausts the pool and leaves other work (like the command here) waiting too long to be serviced.
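A minimal before/after sketch of that change, assuming a typical helper class around the multiplexer (the key and channel names are placeholders, not the reporter's code):

```csharp
// Minimal sketch of the sync-to-async change suggested above: awaiting the
// async Redis APIs instead of blocking a thread-pool thread on the sync ones.
// Key/channel names and the class shape are placeholders.
using System.Threading.Tasks;
using StackExchange.Redis;

public class TaskNotifier
{
    private readonly IDatabase _db;
    private readonly ISubscriber _sub;

    public TaskNotifier(ConnectionMultiplexer muxer)
    {
        _db = muxer.GetDatabase();
        _sub = muxer.GetSubscriber();
    }

    // Blocking version: each call parks a thread-pool thread while Redis responds,
    // which drives WORKER Busy up and queues other work behind it.
    public void NotifySync(string payload)
    {
        _db.StringSet("hector:last_task", payload);
        _sub.Publish("task_queue:hector:new_tasks_notification_channel", payload);
    }

    // Async version: the thread returns to the pool while the I/O is in flight.
    public async Task NotifyAsync(string payload)
    {
        await _db.StringSetAsync("hector:last_task", payload);
        await _sub.PublishAsync("task_queue:hector:new_tasks_notification_channel", payload);
    }
}
```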

jimi-pz commented 8 months ago

We could not pin down the exact cause, but converting our sync calls to async seems to be helping.
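A related stopgap discussed in the Timeouts article linked from the error message (note the WORKER Min=2 in the exception text) is raising the thread-pool minimums so bursts of work don't stall behind the pool's slow thread injection; a minimal sketch, with placeholder values rather than a tuned recommendation:

```csharp
// Illustrative only: raise thread-pool minimums at application startup
// (e.g. in Program.cs) so bursts of blocking work don't queue while the
// pool slowly injects new threads. The values below are placeholders.
using System.Threading;

ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);
```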