StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

Backlog growing but nothing sent #2559

Closed AqlaSolutions closed 9 months ago

AqlaSolutions commented 1 year ago

We are trying to benchmark our app, which does 80000 StringSet + HashSet fire-and-forget requests per second. After writing 667758 records (around 600k each time), nothing is added to Redis anymore (as RedisInsight shows) and the PhysicalBridge backlog queue grows until out of memory.

The debugger shows that the backlog is in the flushing state, with a task awaiting in PhysicalConnection.FlushAsync_Awaited; the send loop of SocketConnection is also awaiting.

There is enough memory available for Redis. I tried to install Redis to WSL2 directly without docker but achieved the same result.

Any help?

NickCraver commented 1 year ago

We would need to see your benchmark code here to advise. All I can see at the moment is 19 million messages queued up in your debugger; anything we'd say is a total guess. If you add the code that's doing this, we can give a better idea.

AqlaSolutions commented 1 year ago

I changed FireAndForget to normal. Here is the code which should be run 80000 times per second:

SomeRecord ev = new ...
SomeRecord2 ev2 = new ...
TimeSpan expiry = TimeSpan.FromMinutes(60);
RedisKey IndexKey = ...
Guid id = Guid.NewGuid();
IDatabase _db = ...
IDatabase _index = _db.WithKeyPrefix(...);

// ForgotTask() - awaits the task and logs any exception

using (var ms = new MemoryStream())
{
    Serializer.Serialize(ms, ev);
    var idBytes = id.ToByteArray();
    _db.StringSetAsync(idBytes, ms.ToArray(), expiry, false, When.Always).ForgotTask();

    ms.SetLength(0);
    Serializer.Serialize(ms, ev2);
    _index.HashSetAsync(IndexKey, idBytes, ms.ToArray(), When.Always).ForgotTask();
}

This logs:

RedisTimeoutException: Timeout awaiting response 

 (outbound=6KiB, inbound=7KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 2, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1463, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=126,CompletedItems=101976,Timers=414), v: 2.6.122.38350 (
 (outbound=14KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=94,CompletedItems=101976,Timers=414), v: 2.6.122.38350 (
 (outbound=14KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=SETEX, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=90,CompletedItems=101976,Timers=414), v: 2.6.122.38350 
 (outbound=14KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=91,CompletedItems=101976,Timers=414), v: 2.6.122.38350 (
 (outbound=14KiB, inbound=0KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 0, qs: 85, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 7900, in-pipe: 0, out-pipe: 0, last-in: 1, cur-in: 0, sync-ops: 1460, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=85,CompletedItems=101956,Timers=414), v: 2.6.122.38350 (Please take a look at 
 (outbound=14KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=SETEX, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=93,CompletedItems=101976,Timers=414), v: 2.6.122.38350 
 (outbound=14KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 0, qs: 85, aw: False, bw: SpinningDown, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1461, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=86,CompletedItems=101975,Timers=413), v: 2.6.122.38350 (Please take a
 (outbound=6KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=SETEX, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=100,CompletedItems=101976,Timers=414), v: 2.6.122.38350 
 (outbound=6KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=SETEX, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=101,CompletedItems=101976,Timers=414), v: 2.6.122.38350 
 (outbound=6KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=103,CompletedItems=101976,Timers=414), v: 2.6.122.38350 (
 (outbound=6KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=102,CompletedItems=101976,Timers=414), v: 2.6.122.38350 (
 (outbound=14KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=97,CompletedItems=101976,Timers=414), v: 2.6.122.38350 (
 (outbound=14KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=SETEX, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=96,CompletedItems=101976,Timers=414), v: 2.6.122.38350 
 (outbound=6KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=HSET, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 1, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1462, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=104,CompletedItems=101976,Timers=414), v: 2.6.122.38350 
 (outbound=6KiB, inbound=8KiB, 5016ms elapsed, timeout is 5000ms), command=SETEX, next: HSET redacted:index:rd:1316:meta, inst: 0, qu: 2, qs: 85, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 7953, out-pipe: 0, last-in: 1, cur-in: 5310, sync-ops: 1463, async-ops: 5553, serverEndpoint: localhost:6379, conn-sec: 265.09, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=46,Free=32721,Min=12,Max=32767), POOL: (Threads=61,QueuedItems=105,CompletedItems=101976,Timers=414), v: 2.6.122.38350

And later

(outbound=0KiB, inbound=0KiB, 18766ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX redacted:book:31I_DDS:пїЅпїЅпїЅ\u0001\u0004\nпїЅe)UMпїЅпїЅпїЅпїЅz, inst: 0, qu: 870790, qs: 142617, aw: True, bw: Flushing, rs: ReadAsync, ws: Flushing, in: 0, in-pipe: 0, out-pipe: 524401, last-in: 1, cur-in: 0, sync-ops: 103238, async-ops: 2377910, serverEndpoint: localhost:6379, conn-sec: 454.98, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=12,Max=32767), POOL: (Threads=101,QueuedItems=113,CompletedItems=9705335,Timers=6130), v: 2.6.122.38350 (
(outbound=0KiB, inbound=0KiB, 18766ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX redacted:book:31I_DDS:пїЅпїЅпїЅ\u0001\u0004\nпїЅe)UMпїЅпїЅпїЅпїЅz, inst: 0, qu: 870792, qs: 142617, aw: True, bw: Flushing, rs: ReadAsync, ws: Flushing, in: 0, in-pipe: 0, out-pipe: 524401, last-in: 1, cur-in: 0, sync-ops: 103238, async-ops: 2377912, serverEndpoint: localhost:6379, conn-sec: 454.98, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=12,Max=32767), POOL: (Threads=101,QueuedItems=114,CompletedItems=9705336,Timers=6130), v: 2.6.122.38350 (
(outbound=0KiB, inbound=0KiB, 18766ms elapsed, timeout is 5000ms), command=HSET, next: SETEX redacted:book:31I_DDS:пїЅпїЅпїЅ\u0001\u0004\nпїЅe)UMпїЅпїЅпїЅпїЅz, inst: 0, qu: 870792, qs: 142617, aw: True, bw: Flushing, rs: ReadAsync, ws: Flushing, in: 0, in-pipe: 0, out-pipe: 524401, last-in: 1, cur-in: 0, sync-ops: 103238, async-ops: 2377912, serverEndpoint: localhost:6379, conn-sec: 454.98, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=12,Max=32767), POOL: (Threads=101,QueuedItems=115,CompletedItems=9705336,Timers=6130), v: 2.6.122.38350 (P
(outbound=0KiB, inbound=0KiB, 18766ms elapsed, timeout is 5000ms), command=HSET, next: SETEX redacted:book:31I_DDS:пїЅпїЅпїЅ\u0001\u0004\nпїЅe)UMпїЅпїЅпїЅпїЅz, inst: 0, qu: 870794, qs: 142617, aw: True, bw: Flushing, rs: ReadAsync, ws: Flushing, in: 0, in-pipe: 0, out-pipe: 524401, last-in: 1, cur-in: 0, sync-ops: 103238, async-ops: 2377914, serverEndpoint: localhost:6379, conn-sec: 454.98, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=12,Max=32767), POOL: (Threads=101,QueuedItems=114,CompletedItems=9705338,Timers=6130), v: 2.6.122.38350 (P
(outbound=0KiB, inbound=0KiB, 18766ms elapsed, timeout is 5000ms), command=HSET, next: SETEX redacted:book:31I_DDS:пїЅпїЅпїЅ\u0001\u0004\nпїЅe)UMпїЅпїЅпїЅпїЅz, inst: 0, qu: 870898, qs: 142617, aw: True, bw: Flushing, rs: ReadAsync, ws: Flushing, in: 0, in-pipe: 0, out-pipe: 524401, last-in: 1, cur-in: 0, sync-ops: 103238, async-ops: 2378026, serverEndpoint: localhost:6379, conn-sec: 454.99, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=10,Free=32757,Min=12,Max=32767), POOL: (Threads=101,QueuedItems=137,CompletedItems=9705523,Timers=6130), v: 2.6.122.38350 (
(outbound=0KiB, inbound=0KiB, 18766ms elapsed, timeout is 5000ms), command=HSET, next: SETEX redacted:book:31I_DDS:пїЅпїЅпїЅ\u0001\u0004\nпїЅe)UMпїЅпїЅпїЅпїЅz, inst: 0, qu: 870794, qs: 142617, aw: True, bw: Flushing, rs: ReadAsync, ws: Flushing, in: 0, in-pipe: 0, out-pipe: 524401, last-in: 1, cur-in: 0, sync-ops: 103238, async-ops: 2377916, serverEndpoint: localhost:6379, conn-sec: 454.98, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=12,Max=32767), POOL: (Threads=101,QueuedItems=116,CompletedItems=9705339,Timers=6130), v: 2.6.122.38350 (P
(outbound=0KiB, inbound=0KiB, 18766ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX redacted:book:31I_DDS:пїЅпїЅпїЅ\u0001\u0004\nпїЅe)UMпїЅпїЅпїЅпїЅz, inst: 0, qu: 870910, qs: 142617, aw: True, bw: Flushing, rs: ReadAsync, ws: Flushing, in: 0, in-pipe: 0, out-pipe: 524401, last-in: 1, cur-in: 0, sync-ops: 103238, async-ops: 2378030, serverEndpoint: localhost:6379, conn-sec: 454.99, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: (SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=11,Free=32756,Min=12,Max=32767), POOL: (Threads=101,QueuedItems=139,CompletedItems=9705547,Timers=6130), v: 2.6.122.38350 

(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts) 

There is no SynchronizationContext.

NickCraver commented 1 year ago

@AqlaSolutions We'd need a full repro here. From the exceptions I see, for example, sync-ops: 103238, async-ops: 2378030, which indicates you also have synchronous operations (which aren't covered by that code); you're also creating a ton of tasks here and exhausting the thread pool with your (I'm assuming) continuation approach.

We can't reasonably pry information piece by piece to help here, but if you have a small full repro that has the issue, then we could advise.

AqlaSolutions commented 1 year ago

Here is the full repro:

using System.Diagnostics;
using StackExchange.Redis;
using StackExchange.Redis.KeyspaceIsolation;

Task.Run(async () =>
{
    var sm = new SemaphoreSlim(10, 10);
    while (true)
    {
        await sm.WaitAsync();
        Task.Run(() =>
        {
            // simulate work that actively uses ThreadPool
            var sw = Stopwatch.StartNew();
            while (sw.Elapsed.TotalMilliseconds < 50)
            {
                _ = Environment.TickCount.GetHashCode();
            }

            sm.Release();
        }).ForgotTask();
    }
}).ForgotTask();

var mp = ConnectionMultiplexer.Connect("localhost:6379");
var db = mp.GetDatabase();
var index = db.WithKeyPrefix("index:");
var sm = new SemaphoreSlim(500, 500);
while (true)
{
    await sm.WaitAsync();
    Task.Run(async () =>
        {
            var idBytes = Guid.NewGuid().ToByteArray();
            db.StringSetAsync(idBytes, new byte[1024 * 5], TimeSpan.FromHours(1), false, When.Always).ForgotTask();
            index.HashSetAsync("meta", idBytes, new byte[1024], When.Always).ForgotTask();
            await Task.Delay(15);
            sm.Release();
        })
        .ForgotTask();
}

public static class MyExtensions
{
    public static async void ForgotTask(this Task task)
    {
        try
        {
            await task;
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
        }
    }
}

It doesn't represent the exact load type, but it still yields the same situation: after some time nothing gets written to Redis at all.

NickCraver commented 1 year ago

@AqlaSolutions Overall: the problems are in the benchmark, which is, to be blunt, completely murdering the thread pool and the underlying CPU at the same time. The test is effectively "what happens if we already peg 10 cores?", which is a test, but I'm not sure what it's accomplishing, or on what hardware. Anything with fewer than 11 cores (minimum) is going to fall over pretty much instantly, and it's also allocating like crazy with the arrays, so that's going to cause a ton of GC pauses/issues.

If we take this down to "what is the client doing?" (which is what we can help with), it looks something like this:

void Main()
{
    var mp = ConnectionMultiplexer.Connect("localhost:6379");
    var db = mp.GetDatabase();
    var index = db.WithKeyPrefix("index:");

    var sw = Stopwatch.StartNew();
    long count = 0;
    var stringVal = new byte[1024 * 5];
    var hashVal = new byte[1024];

    Parallel.ForEach(Enumerable.Range(1, 8), _ =>
    {
        while (count < 20_000_000)
        {
            var idBytes = Guid.NewGuid().ToByteArray();
            db.StringSet(idBytes, stringVal, TimeSpan.FromHours(1), false, When.Always, CommandFlags.FireAndForget);
            index.HashSet("meta", idBytes, hashVal, When.Always, CommandFlags.FireAndForget);
            var newCount = Interlocked.Increment(ref count);
            if (newCount % 1_000_000 == 0)
            {
                Console.WriteLine(DateTime.UtcNow.ToString("u") + ": " + newCount);
            }
        }
    });
    sw.Stop();
    Console.WriteLine("Total time (ms): " + sw.Elapsed.TotalMilliseconds);
    Console.WriteLine("Per second: " + (double)count / sw.Elapsed.TotalSeconds);
}

That's doing 20 million of your operations, but with fire-and-forget, reused value buffers, and a fixed set of threads instead of an unbounded pile of Tasks.

This is a quick run on my laptop (Ryzen 9 5900HX - 8C/16T):

2023-09-23 13:36:28Z: 1000000
2023-09-23 13:36:29Z: 2000000
2023-09-23 13:36:30Z: 3000000
2023-09-23 13:36:31Z: 4000000
2023-09-23 13:36:31Z: 5000000
2023-09-23 13:36:32Z: 6000000
2023-09-23 13:36:33Z: 7000000
2023-09-23 13:36:34Z: 8000000
2023-09-23 13:36:34Z: 9000000
2023-09-23 13:36:35Z: 10000000
2023-09-23 13:36:36Z: 11000000
2023-09-23 13:36:36Z: 12000000
2023-09-23 13:36:37Z: 13000000
2023-09-23 13:36:38Z: 14000000
2023-09-23 13:36:39Z: 15000000
2023-09-23 13:36:40Z: 16000000
2023-09-23 13:36:41Z: 17000000
2023-09-23 13:36:41Z: 18000000
2023-09-23 13:36:42Z: 19000000
2023-09-23 13:36:43Z: 20000000
Total time (ms): 15393.5377
Per second: 1299246.956078199

Notes:

Overall: I don't see a client issue in the benchmark, but there are lots of problems that will overwhelm a system using Tasks like that. Your issue is more to do with .NET and processor utilization patterns.

AqlaSolutions commented 1 year ago

We use the ProtoActor framework, which is heavily based on tasks and gRPC serialization. And yes, the GC pauses are pretty bad. You are right that these conditions affect processing time and timeouts, and I don't mind that it works slower under load. But the main issue is that after some time no requests reach Redis at all. It looks like the client is not sending anything to Redis while it waits for some old response that didn't arrive in time and was already discarded.

I think that GC pauses and ThreadPool starvation break some internal state of the client, so it doesn't send any requests anymore.

NickCraver commented 1 year ago

When your system is overloaded, the CPU literally cannot do more work: yes, almost everything will time out, because it doesn't even get to the sending phase before it's past its maximum time. Your benchmark approach pegs the CPU unless you're on a 16C+ system - with that, I'm not sure what to say other than "yes, that's really bad - don't do that".

The approach you're taking overall, besides the inefficiencies enumerated above, piles up Tasks to a massive degree instantly, rather than in any smooth curve. With the constraints of limited CPU and bandwidth, you can only handle a certain quantity per time period before requests get less and less far into the pipe before hitting their maximum time.

When you add in the Task and GC overhead problems I enumerated above, it all gets many times worse. I've demonstrated above how to handle this efficiently with fire-and-forget, and that's what I'd recommend for your flow, as there's tremendously less overhead involved.

Overall, I think you should back up and look at what's happening here more generally. You're getting a timeout. A timeout exists primarily for one reason above all others: it's letting you know the system is overloaded and unable to keep up with the volume of work. That's due to exhaustion of some resource (memory, CPU, bandwidth, etc.). Your approach doesn't care about that signal at all; it just keeps piling on tasks indefinitely, and yeah... that's going to be trouble. It's just going to accelerate the spiral of death that timeouts are a guard against in the first place. My best advice there is: don't do that.
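The "respect the timeout signal" advice above can be sketched as simple backpressure: cap the number of in-flight operations so the producer stalls when the pipe can't keep up, instead of piling up Tasks unboundedly. This is a minimal, self-contained illustration, not code from this thread - the Redis call is simulated with Task.Delay, and the slot count is an arbitrary choice:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BackpressureSketch
{
    // Issue 'total' simulated async calls while never allowing more than
    // 'maxInFlight' of them to be pending at once. Returns completed count.
    public static async Task<int> RunAsync(int total, int maxInFlight)
    {
        var slots = new SemaphoreSlim(maxInFlight, maxInFlight);
        int completed = 0;

        async Task IssueOneAsync()
        {
            try
            {
                await Task.Delay(1);             // stand-in for StringSetAsync etc.
                Interlocked.Increment(ref completed);
            }
            finally
            {
                slots.Release();                 // free the slot whatever happens
            }
        }

        for (int i = 0; i < total; i++)
        {
            await slots.WaitAsync();             // producer stalls when saturated
            _ = IssueOneAsync();
        }

        // Drain: reacquire every slot, so every in-flight call has finished.
        for (int i = 0; i < maxInFlight; i++)
            await slots.WaitAsync();

        return completed;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync(10_000, 256)); // prints 10000
    }
}
```

The key property is that the issuing loop itself slows down under load, which is exactly the feedback the benchmark in this thread discards.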

AqlaSolutions commented 1 year ago

Fire-and-forget by itself doesn't change anything if used in my original benchmark. It was originally written this way; I switched it to tasks only to see the exceptions.

In a real-world application we can't just use 8 threads to write to Redis. We have lots and lots of tasks doing different stuff, not just writing to Redis in a loop. There are 100000 concurrent "actors", each executing its own request queue on the ThreadPool and utilizing all available cores. It looks like your client doesn't work well when so many parallel requests are enqueued without any delay on timeouts, in combination with delays under load. In this case we have to add an additional queue between the app code and the Redis client that adds delays when it sees too many outgoing requests waiting for a response. I don't understand why the Redis client doesn't do this already, so we have to implement that kind of throttling ourselves.

NickCraver commented 1 year ago

@AqlaSolutions What is it you want to accomplish here? I was only correcting the benchmark to demonstrate usage well beyond what you're asking to accomplish; of course it's not just like your application, because I have no idea what that is. I'm only demonstrating how to issue the messages, wherever they are coming from on your existing threads. If you're doing more blocking and excessive Task creation outside of that, that's an application problem rather than a StackExchange.Redis client problem - we handle input from crazy amounts of threads as fast as the pipe can handle it.

Your benchmark is completely ignoring all results, so what kind of throttling do you imagine is possible here? Any feedback we give would be ignored. We're handling the load to the max already; when you go past that you may experience timeouts, or you can increase the max timeout if that's the additional buffer behavior you want during spikes - that's totally up to you. It sounds from all of your descriptions that there are deeper issues with so many threads and tasks happening, though. If you're using it as I've demonstrated above, we're not adding to your Task load in most cases - we're just tossing some data on the pipe synchronously (and sending it async later). You can listen to the error events on the multiplexer to observe any timeouts happening from fire-and-forgets, if that's the mismatch here.
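The error events mentioned above are part of the public ConnectionMultiplexer API. A sketch of wiring them up - the handler bodies are illustrative, and this obviously needs a reachable server to do anything:

```csharp
using System;
using StackExchange.Redis;

class ObserveMultiplexerEvents
{
    static void Main()
    {
        var mp = ConnectionMultiplexer.Connect("localhost:6379");

        // Server-reported errors, including some from fire-and-forget commands.
        mp.ErrorMessage += (s, e) =>
            Console.WriteLine($"Redis error from {e.EndPoint}: {e.Message}");

        // Connection drops/restores - useful to correlate with backlog growth.
        mp.ConnectionFailed += (s, e) =>
            Console.WriteLine($"Connection failed: {e.FailureType} {e.Exception?.Message}");
        mp.ConnectionRestored += (s, e) =>
            Console.WriteLine($"Connection restored: {e.EndPoint}");

        // Unexpected library-internal errors.
        mp.InternalError += (s, e) =>
            Console.WriteLine($"Internal error in {e.Origin}: {e.Exception}");

        // ... the application keeps running; handlers fire as events occur.
    }
}
```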

AqlaSolutions commented 1 year ago

Is there a way to limit backlog?
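For what it's worth, one related knob does exist in the library versions referenced in this thread: ConfigurationOptions.BacklogPolicy. It does not cap the backlog's size, but the FailFast preset stops commands from queueing while the connection is down, failing them immediately instead. A hedged config sketch:

```csharp
using StackExchange.Redis;

var options = ConfigurationOptions.Parse("localhost:6379");

// FailFast: don't queue commands while disconnected; fail them immediately
// instead of letting a backlog build up. Note this governs behavior while
// the connection is down - it is not a hard size limit on the backlog.
options.BacklogPolicy = BacklogPolicy.FailFast;

var mp = ConnectionMultiplexer.Connect(options);
```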

AqlaSolutions commented 1 year ago

I've found a way to limit the backlog to 10 items: if there are more, all our requests wait with Task.Delay before using the Redis client. After a few minutes I see 0% CPU load, no tasks, no work being done. It just hangs waiting for the backlog to empty, but that never happens. It looks like there is a bug somewhere in the thread synchronization inside the Redis client.

I use reflection and expression trees to get BacklogMessagesPending from GetBridgeStatus.

NickCraver commented 1 year ago

@AqlaSolutions I'm not sure how to help here. There are limits to what we can help debug in your application as OSS maintainers. If you have a minimal repro that correctly uses the client and still fails like that, we'd be super happy to take a look - I've done so above with the original example and explained what was happening/wrong there. If there's a more accurate repro of your application load that still does this, we can go through it once again, but without more info/data here I'm not sure how to advise further.