dotnet / dotNext

Next generation API for .NET
https://dotnet.github.io/dotNext/
MIT License
1.6k stars 119 forks source link

[bug] AsyncAutoResetEvent deadlocks #82

Closed jasonswearingen closed 2 years ago

jasonswearingen commented 2 years ago

Run this test. On my computer, it deadlocks after about 6 seconds

    [TestMethod]
    public async Task DotNextAsyncAutoResetEvent()
    {
        DotNext.Threading.AsyncAutoResetEvent autoResetEvent = new(false);

        var loopCount = 0;
        var setCount = 0;
        var consumedCount = 0;
        var timer =Stopwatch.StartNew();
        var lastSecondReported = 0;
        var producerTask = Task.Run(() => {

            while (true)
            {
                loopCount++;
                var didSet = autoResetEvent.Set();
                if (didSet)
                {
                    setCount++;
                }

                if (timer.Elapsed > TimeSpan.FromSeconds(lastSecondReported))
                {
                    var tup1 = new { loopCount };
                    var tup = new { loopCount, setCount, consumedCount };
                    Console.WriteLine($"t={lastSecondReported}sec:  {new { loopCount, setCount, consumedCount }}");
                    lastSecondReported++;
                }

                if (timer.Elapsed > TimeSpan.FromSeconds(30))
                {
                    break;
                }
            }
        });

        var consumerTask = Task.Run(async () => {

            while (true)
            {
                var success = await autoResetEvent.WaitAsync(TimeSpan.FromMilliseconds(1));
                if (success)
                {
                    consumedCount++;
                }
                if (producerTask.IsCompleted)
                {
                    break;
                }
            }
        });

        await producerTask;
        autoResetEvent.Set();
        await consumerTask;
    }

using DotNext 4.0.0-beta.8 from Nuget

jasonswearingen commented 2 years ago

using await autoResetEvent.WaitAsync(); instead of the TimSpan overload does not seem to deadlock

sakno commented 2 years ago

@jasonswearingen , thanks for the report! Regardless of potential bug, I'm trying to figure out what happening in your test. I would appreciate if you can help me with that. I see the following:

  1. Logical thread trying to set the event to signaled state during 30 seconds (Producer thread)
  2. Another thread trying to wait and reset the event automatically (Consumer thread). If Producer thread is finished, then finalize Consumer thread as well
  3. The test method waits for Producer thread only

I see that Producer thread has no async code (no await operators). Moreover, the test waits for Producer completion only. So I'm not sure how AsyncAutoResetEvent can block the entire test. Even if it is bugged, it's not observable by the test. I think, you miss await consumerTask.

jasonswearingen commented 2 years ago

yes you are correct regarding the test workflow.

And yes, there is a small bug in the test that would leave the consumer task stalled if the deadlock did not exist. The last line should have:

await producerTask;
autoResetEvent.Set();

I have updated the test code accordingly.

sakno commented 2 years ago

Also, there is an interesting bug in .NET itself that was fixed: dotnet/runtime#60182. It is observable for very small timeouts only. 1 milliseconds is enough to blow up on that mine. Unfortunately, the fix is not included in .NET 6 RC2. Timeout tracking in all asynchronous primitives rely on CancellationTokenSource and its ability to reset the state. The bug unpredictably crashes the internal state of DotNext.Threading.Tasks.ValueTaskCompletionSource<T> that might be a source of deadlock.

sakno commented 2 years ago

@jasonswearingen , maybe

await producerTask;
autoResetEvent.Set();
await consumerTask;

?

jasonswearingen commented 2 years ago

sure that would be better :)

btw when I ran it, the producer thread is hung inside the .Set() call, and further up the callstack it is stuck inside the ValueTaskCompletionSource. So your theory seems valid.

sakno commented 2 years ago

Unfortunately, we have to wait for final release of .NET 6 and then check again.

sakno commented 2 years ago

@jasonswearingen , .NEXT RC1 has been released. It uses GA version of .NET 6 so you can try tests again.

jasonswearingen commented 2 years ago

The bug is still there. deadlocks in same location. tested with dotnet6.0 release and dotNext 4.0.0-rc.1

sakno commented 2 years ago

I found the root cause:

  1. Thread # 1: ManualResetCompletionSource.CancellationRequested is trying to enter monitor infinitely
  2. Thread # 2: ManualResetCompletionSource.OnCompleted is trying to enter monitor infinitely
  3. Thread # 3: ManualResetCompletionSource.StopTrackingCancellation stuck on timeoutTracker.Dispose() method

timeoutTracker is of type CancellationTokenRegistration. I think that Dispose waits for completion of cancellation callback. In the same time, cancellation callback stuck on Thread # 1. So Dispose is called within monitor and cancellation callback too. As a result, we observe a dead lock.

sakno commented 2 years ago

According to sources in .NET repo, Dispose call is blocking while Unregister not.

sakno commented 2 years ago

@jasonswearingen , many thanks for reporting this bug! The potential fix is in develop branch. The related commit is specified above.