StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

getting timeouts on redis server intermittently. #2538

Open · Ekim1987 opened this issue 1 year ago

Ekim1987 commented 1 year ago

We recently did an update and are getting a timeout error every now and then (intermittently); we're not sure what is causing it. We had our DevOps team check the machine and everything looks fine on the server. We also restarted the server and still get this error.

On Portainer we get the following error:

Timeout performing GET (5000ms), next: GET EFUnknownsCacheDependency, inst: 3, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 5, in-pipe: 0, out-pipe: 0, serverEndpoint: redis-server:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 6025cb594dc5, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=14,Free=32753,Min=4,Max=32767), v: 2.1.28.64774 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

VMGOnline.SharedLibrary.Contracts.Exceptions.ApiGeneralException: Timeout performing GET (5000ms), next: GET EFUnknownsCacheDependency, inst: 3, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 5, in-pipe: 0, out-pipe: 0, serverEndpoint: redis-server:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 6025cb594dc5, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=14,Free=32753,Min=4,Max=32767), v: 2.1.28.64774 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
---> StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms), next: GET EFUnknownsCacheDependency, inst: 3, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 5, in-pipe: 0, out-pipe: 0, serverEndpoint: redis-server:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 6025cb594dc5, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=14,Free=32753,Min=4,Max=32767), v: 2.1.28.64774 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2624
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
   at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 2374

NickCraver commented 1 year ago

I would recommend first upgrading to the latest version - there have been a lot of tweaks in the last 3 years, and much better diagnostic information has been added to the error messages.
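For reference (a minimal sketch, assuming an SDK-style project and the dotnet CLI; adjust to your own package management), the client can be bumped with:

dotnet add package StackExchange.Redis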

Ekim1987 commented 1 year ago

Hi @NickCraver

Thanks for the speedy reply!

I've just updated Redis to 7.2 and set MinThreads (in the runtime config) to 50, and I'm still getting these errors:

Timeout performing GET (5000ms), next: UNLINK EFUnknownsCacheDependency, inst: 17, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 9, in-pipe: 0, out-pipe: 0, serverEndpoint: redis-server:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 6025cb594dc5, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=50,Free=32717,Min=50,Max=32767), v: 2.1.28.64774 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

VMGOnline.SharedLibrary.Contracts.Exceptions.ApiGeneralException: Timeout performing GET (5000ms), next: UNLINK EFUnknownsCacheDependency, inst: 17, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 9, in-pipe: 0, out-pipe: 0, serverEndpoint: redis-server:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 6025cb594dc5, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=50,Free=32717,Min=50,Max=32767), v: 2.1.28.64774 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
---> StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms), next: UNLINK EFUnknownsCacheDependency, inst: 17, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 9, in-pipe: 0, out-pipe: 0, serverEndpoint: redis-server:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 6025cb594dc5, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=50,Free=32717,Min=50,Max=32767), v: 2.1.28.64774 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2624
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
   at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 2374
   at EasyCaching.Redis.DefaultRedisCachingProvider.BaseGet[T](String cacheKey)
   at EasyCaching.Core.EasyCachingAbstractProvider.Get[T](String cacheKey)
   at EFCoreSecondLevelCacheInterceptor.EFEasyCachingCoreProvider.clearDependencyValues(String rootCacheKey)
   at EFCoreSecondLevelCacheInterceptor.EFEasyCachingCoreProvider.<>c__DisplayClass8_0.b__0()
   at EFCoreSecondLevelCacheInterceptor.ReaderWriterLockProvider.TryWriteLocked(Action action, Int32 timeout)
   at EFCoreSecondLevelCacheInterceptor.EFCacheDependenciesProcessor.InvalidateCacheDependencies(DbCommand command, DbContext context, EFCachePolicy cachePolicy)
   at EFCoreSecondLevelCacheInterceptor.DbCommandInterceptorProcessor.ProcessExecutedCommands[T](DbCommand command, DbContext context, T result)
   at EFCoreSecondLevelCacheInterceptor.SecondLevelCacheInterceptor.NonQueryExecutedAsync(DbCommand command, CommandExecutedEventData eventData, Int32 result, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Diagnostics.RelationalLoggerExtensions.CommandNonQueryExecutedAsync(IDiagnosticsLogger`1 diagnostics, IRelationalConnection connection, DbCommand command, DbContext context, Guid commandId, Guid connectionId, Int32 methodResult, DateTimeOffset startTime, TimeSpan duration, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteNonQueryAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteNonQueryAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteNonQueryAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.ExecuteSqlRawAsync(DatabaseFacade databaseFacade, String sql, IEnumerable`1 parameters, CancellationToken cancellationToken)
   at VMGOnline.SharedLibrary.EFCore.Services.SqlBulkOperations.Helpers.PostgreSqlUpdateBulkService`1.UpdateFromTempTableAsync(QueryDefinition`1 queryDefinitions)
   at VMGOnline.SharedLibrary.EFCore.Services.SqlBulkOperations.Helpers.PostgreSqlUpdateBulkService`1.ExecuteAsync(IEnumerable`1 entities, BulkOperationOptions options)
   --- End of inner exception stack trace ---
   at VMGOnline.SharedLibrary.EFCore.Services.SqlBulkOperations.Helpers.PostgreSqlUpdateBulkService`1.ExecuteAsync(IEnumerable`1 entities, BulkOperationOptions options)
   at VMGOnline.SharedLibrary.EFCore.Services.SqlBulkOperations.SqlBulkOperationsService`1.UpdateAsync(IReadOnlyCollection`1 entities, BulkOperationOptions options)

NickCraver commented 1 year ago

@Ekim1987 Sorry, I should have been clearer there: please upgrade to the latest version of StackExchange.Redis - you're on quite an old version at this point and a lot of improvements have been made :)

vigneshmsft commented 1 year ago

We have been getting these timeouts as well for quite a while and have not managed to find the root cause. I have configured the min and max thread counts to an appropriate number and also use a Private Link connection to the Redis service from the App Service. Any pointers to fully resolve this would be highly appreciated.

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=32KiB, 2012ms elapsed, timeout is 2000ms), command=HMGET, next: HMGET 1cf27fa332b6fbb050b8, inst: 0, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, last-in: 0, cur-in: 0, sync-ops: 1, async-ops: 21293486, serverEndpoint: ****.redis.cache.windows.net:6380, conn-sec: 357587.14, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: 23601c5fbc80(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=400,Max=1000), WORKER: (Busy=4,Free=32763,Min=400,Max=32767), POOL: (Threads=14,QueuedItems=48,CompletedItems=167975059,Timers=38), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

vigneshmsft commented 1 year ago

WORKER: (Busy=50,Free=32717,Min=50,Max=32767)

@Ekim1987 It appears you'll benefit from increasing the min thread count to a higher number, likely 100 or more, to suit your traffic: it looks like you've hit the 50-thread limit, and the timeouts may be related to spinning up new threads. Also, as Nick suggested, you're better off upgrading to a more recent version.
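To illustrate that suggestion, here is a minimal sketch of raising the thread-pool floor at startup; the value of 100 is an assumption to tune against your own traffic, and the same thing can alternatively be set via the System.Threading.ThreadPool.MinThreads property in runtimeconfig.json:

```csharp
using System;
using System.Threading;

// Raise the minimum worker and IOCP thread counts before Redis traffic starts, so
// bursts are not throttled by the thread pool's slow thread-injection rate.
// 100 is an assumed starting point, not a universal recommendation.
ThreadPool.GetMinThreads(out int workerMin, out int iocpMin);
ThreadPool.SetMinThreads(Math.Max(workerMin, 100), Math.Max(iocpMin, 100));
```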

NickCraver commented 1 year ago

@vigneshmsft How big is the HMGET there? It seems like you're getting a large payload perhaps, and that's exceeding your 2 second window - we see inbound but not enough to complete. I'd check SLOWLOG on the server but also just assess payload sizes on the multi get op.
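For anyone reproducing this, a couple of standard server-side checks along those lines (the key name is a placeholder; run them against the instance that timed out):

redis-cli SLOWLOG GET 10
redis-cli MEMORY USAGE <your-hash-key>
redis-cli HLEN <your-hash-key>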

vigneshmsft commented 1 year ago

@NickCraver The actual payload should be roughly 150-300 bytes max. It's also worth mentioning that I use the Microsoft.Extensions.Caching.StackExchangeRedis abstraction, which underneath uses a hash instead of simple keys to store the data plus absolute/sliding expiration timestamps.

Had one again this morning. There was no corresponding entry in the SLOWLOG; the nearest entries were about 30 mins before/after the timeout.
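One way to sanity-check the real stored size of one of those hash entries from the client side (a rough sketch only: the key is taken from the timeout above, the endpoint and password are placeholders, and the fields are whatever the distributed-cache provider actually wrote, i.e. the data plus the expiration timestamps mentioned above):

```csharp
using System;
using StackExchange.Redis;

// Inspect a single cache entry and print the size of each hash field.
var muxer = await ConnectionMultiplexer.ConnectAsync(
    "<your-cache>.redis.cache.windows.net:6380,ssl=true,password=<access-key>");
var db = muxer.GetDatabase();

foreach (var field in await db.HashGetAllAsync("1cf27fa332b6fbb050b8"))
{
    var size = ((byte[])field.Value)?.Length ?? 0;
    Console.WriteLine($"{field.Name}: {size} bytes");
}
```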

NickCraver commented 1 year ago

@vigneshmsft How sure are you about that payload size? Asking because I see a 32KiB inbound buffer, which would be roughly 100x a 300-byte payload, yet we only see 1 outstanding command in the queues, not many. That indicates one large payload, which doesn't match the description in play. Perhaps something not quite expected is being fetched here.

slorello89 commented 1 year ago

@NickCraver - worth noting that 32KiB could just be the buffer limit (I get suspicious when log2(inbound) is an integer) - so that could just be the tip of the iceberg - as you said, sounds like a rather large payload and a timeout/payload/bandwidth mismatch.

slorello89 commented 1 year ago

@vigneshmsft - you can check your Redis instance for large keys with:

redis-cli --bigkeys

and

redis-cli --memkeys

vigneshmsft commented 1 year ago

For redis-cli --bigkeys it appears I have to enable the non-SSL port for redis-cli to connect to the Azure Redis instance; I'll have to look at ways to do this for our prod instance.

@NickCraver I am pretty sure about our payload size: it should not exceed 300 bytes, and even 300 bytes is at the top end for this particular key. It should usually be in the 70-150 byte range for the key type that is timing out.

slorello89 commented 1 year ago

FWIW the redis-cli can be TLS enabled (granted enabling the non-SSL port might just be easier)
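For example (host and access key are placeholders, and this assumes a redis-cli build with TLS support, i.e. 6.0 or later):

redis-cli -h <your-cache>.redis.cache.windows.net -p 6380 -a <access-key> --tls --bigkeys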

NickCraver commented 1 year ago

@vigneshmsft The buffer doesn't match what you expect to happen; remember that anything in the pipe may be causing that buffer expansion, not just the command you're currently awaiting. Because it's a multiplexed pipeline, that's a shared piece of flow.
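A contrived sketch of that sharing (the key names are hypothetical, not from this issue): a large reply for one caller travels over the same socket and read buffer as everyone else's small requests, so a small read can see the inflated inbound buffer and even time out behind it.

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

var muxer = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
IDatabase db = muxer.GetDatabase();

// Both replies come back over the one multiplexed connection. If "big-report" holds a
// multi-megabyte value, its bytes occupy the shared inbound buffer and read loop, and
// the tiny "feature-flag" read queued behind it waits on that same flow.
Task<RedisValue> big  = db.StringGetAsync("big-report");
Task<RedisValue> tiny = db.StringGetAsync("feature-flag");
await Task.WhenAll(big, tiny);
```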

vigneshmsft commented 1 year ago

This is what I got before the connection timed out; I didn't want to risk running the command any further on a Friday evening against a production system.

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest hash   found so far '2dafc44b65652db2e6b1' with 3 fields // this is the same key type as the one that timed out

slorello89 commented 1 year ago

@vigneshmsft - how about --memkeys?