StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/
Other
5.85k stars 1.5k forks source link

Transaction task never completes after network outage #2630

Closed jlao closed 5 months ago

jlao commented 6 months ago

We had a short network outage and after it recovered, we noticed in logs that a background thread that talks to Redis wasn't emitting logs anymore. When I took a process dump and found that it was stuck await a Redis transaction (await transaction.ExecuteAsync()).

From looking through past issues it sounds a lot like #2412 which was fixed in v2.6.104. The problem is... We were running v2.6.104. Here's a log from the mux to prove it:

SocketFailure (ReadSocketError/TimedOut, last-recv: 20) on 10.10.70.231:6003/Subscription, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 79s ago, last-write: 18s ago, unanswered-write: 18s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 8 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.6.104.40210

The other noteworthy thing is that that there was a null ref from inside the library which I mentioned in #2576.

Any ideas or additional information I can provide?

NickCraver commented 6 months ago

In this, I imagine the null ref prevented the task completion, which will be resolved in the next release (fix merged this morning). I have a lot on the plate but will try and cut that maybe tomorrow. It's available on MyGet right now (see README) if you need the new version immediately.

jlao commented 6 months ago

Thanks for the quick response! I'll keep an eye for that.

The one last thing I will mention is that the tasks from the operations created off the transaction all seem to be cancelled. It seems odd to me that those would be cancelled but not the task for the transaction itself. Does it seem reasonable that a null ref could leave it in a state like that?

NickCraver commented 5 months ago

Heads up: 2.7.17 is out on NuGet :)

NickCraver commented 5 months ago

I'm going to close this out to tidy up as we believe this to be resolved, but if not after 2.7.17 please comment and we'll re-open.

jlao commented 3 months ago

Hi again,

We saw another instance where a Redis transaction is hung forever after a connection blip. The version we are using is 2.7.17.27058.

In the logs, I see this sequence of events:

07:20:22.719 - <log line right before we do the transaction>
07:20:22.990 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 4) on 10.10.70.233:6001/Interactive, Flushed/Faulted, last: GET, origin: ReadFromPipe, outstanding: 2, last-read: 4s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:23.012 - ConnectionRestored with args: 10.10.70.233:6001
07:20:26.558 - ConfigurationChanged with endpoint: 10.10.70.233:6001
07:20:30.259 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 39) on 10.10.70.232:6001/Interactive, Idle/Faulted, last: HGET, origin: ReadFromPipe, outstanding: 1, last-read: 4s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.233:6003/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.232:6002/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.231:6003/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.233:6002/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.269 - ConnectionRestored with args: 10.10.70.233:6003
07:20:30.269 - ConnectionRestored with args: 10.10.70.232:6002
07:20:30.269 - ConnectionRestored with args: 10.10.70.231:6003
07:20:30.270 - ConnectionRestored with args: 10.10.70.232:6001
07:20:30.270 - ConnectionRestored with args: 10.10.70.233:6002
07:20:30.275 - StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 39) on 10.10.70.232:6001/Interactive, Idle/Faulted, last: HGET, origin: ReadFromPipe, outstanding: 1, last-read: 4s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
     ---> Pipelines.Sockets.Unofficial.ConnectionResetException: An existing connection was forcibly closed by the remote host.
     ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
       at Pipelines.Sockets.Unofficial.Internal.Throw.Socket(Int32 errorCode) in /_/src/Pipelines.Sockets.Unofficial/Internal/Throw.cs:line 59
       at Pipelines.Sockets.Unofficial.SocketAwaitableEventArgs.GetResult() in /_/src/Pipelines.Sockets.Unofficial/SocketAwaitableEventArgs.cs:line 74
07:20:30.292 - ConfigurationChanged with endpoint: 10.10.70.232:6001

This is what the transaction code looks like:

ITransaction transaction = Database.CreateTransaction();
transaction.AddCondition(
    Condition.StringEqual(key1, 124));

Task<bool> task1 = transaction.KeyExpireAsync(
    key1,
    TimeSpan.FromMinutes(30));

Task<bool> task2 = transaction.KeyExpireAsync(
    key2,
    TimeSpan.FromMinutes(30));

Task<bool> task3 = transaction.KeyExpireAsync(
    key3,
    TimeSpan.FromMinutes(30));

Task<bool> task4 = transaction.KeyExpireAsync(
    key4,
    TimeSpan.FromMinutes(30));

if (!await transaction.ExecuteAsync())
{
    throw new Exception();
}

bool exp1 = await task1;
bool exp2 = await task2;
bool exp3 = await task3;
bool exp4 = await task4;

_logger.Verbose("{0} | {1} | {2} | {3}", exp1, exp2, exp3, exp4);

I took a process dump and when I open it I see:

Some notes:

jlao commented 3 months ago

@NickCraver any thoughts?