dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.1k stars 4.7k forks source link

BackgroundService appears to create tasks that will never complete #36380

Closed Zenexer closed 2 years ago

Zenexer commented 5 years ago

This line appears to create a task that will most likely never complete: https://github.com/aspnet/Extensions/blob/aa7fa91cfc8f6ff078b020a428bcad71ae7a32ab/src/Hosting/Abstractions/src/BackgroundService.cs#L65

Based on my own understanding of BackgroundService and the description of its use of cancellation in dotnet/extensions#1245, I don't expect the CancellationToken to StartAsync or StopAsync to be marked as cancelled under normal operation. (This is why BackgroundService exists: it provides a more useful CancellationToken to ExecuteAsync than the one provided to StartAsync, which is helpful for tasks that run indefinitely.) That means the task created with this expression is likely to never complete:

Task.Delay(Timeout.Infinite, cancellationToken)

To resolve this, a new CancellationTokenSource could be created for the current scope:

using var stopCts = new CancellationTokenSource();
using var _ = cancellationToken.Register(stopCts.Cancel);

await Task.WhenAny(_executingTask, Task.Delay(Timeout.Infinite, stopCts.Token));

I can submit a PR if this fix makes sense.

analogrelay commented 5 years ago

This line appears to create a task that will most likely never complete:

This basically is just a way to "await" the signalling of a cancellation token. The timeout is never intended to fire.

I don't expect the CancellationToken to StartAsync or StopAsync to be marked as cancelled under normal operation.

The token from IHost.StopAsync is being flowed in to allow the calling code to stop waiting for shutdown. The intent of this code is to block StopAsync until the service finishes shutting down or the caller decides to stop waiting.

That means the task created with this expression is likely to never complete:

It won't complete if the app finishes shutting down on time, but the impact of that is very low since it's just a single entry in the timer queue that never gets fired.

To resolve this, a new CancellationTokenSource could be created for the current scope:

I don't see how that would fix anything... You're still triggering cancellation based on the same token, just in a more complicated way, right? You're cancelling the stopCts when the cancellationToken is cancelled, which is just a round-about way of just using cancellationToken, as far as I can tell.

Zenexer commented 5 years ago

@anurse

I don't see how that would fix anything... You're still triggering cancellation based on the same token, just in a more complicated way, right? You're cancelling the stopCts when the cancellationToken is cancelled, which is just a round-about way of just using cancellationToken, as far as I can tell.

There is a difference, and it revolves around this point:

It won't complete if the app finishes shutting down on time, but the impact of that is very low since it's just a single entry in the timer queue that never gets fired.

It is theoretically possible for the service to be shut down independent of the rest of the program, and for a service to be started and stopped numerous times before the application exits--for example, an application that self-restarts all its services if it encounters an unrecoverable issue. There has been some talk lately of developing applications that actually favor this over attempting to handle every possible exception condition--I'm not saying I think it's ideal, but it is a valid recovery pattern.

In this scenario, my suggested replacement will clean up the task. The existing code will leave an abandoned entry until the application exits. It might not be huge, but it's still a leak, and it only takes a few lines to resolve.

analogrelay commented 5 years ago

In this scenario, my suggested replacement will clean up the task.

I see, because you are disposing this alternate CTS.

It might not be huge, but it's still a leak, and it only takes a few lines to resolve.

Do you have measurements you can share with us? I'm certainly not opposed to taking a fix that has data showing it reduces memory usage.

Zenexer commented 5 years ago

Do you have measurements you can share with us? I'm certainly not opposed to taking a fix that has data showing it reduces memory usage.

I don't, I'm afraid. I assume it's quite minimal. The only time I can see it making a difference is when an application has another bug that causes it to restart frequently, or if a program is designed to start and stop background services as they're needed. If the application remains running for months or years, maybe it would cause some problems.

I'm mostly interested in fixing it because I see BackgroundService used as an example or reference from time to time; it's a simple service implementation that's easy to understand. I'm concerned that people reading and potentially copying the pattern on line 65 won't understand that it's likely to leak a task entry and isn't really correct usage, even if it's unlikely to have a significant impact in this particular scenario.

It may be worth checking whether this same pattern is used elsewhere in official codebases where it could potentially have a more significant impact.

Since it's easy to fix that shouldn't break anything, I'd be happy to submit a PR.

analogrelay commented 5 years ago

FYI: We aren't accepting new PRs for 3.0 right now and there isn't a branch for 3.1 (the next release after 3.0) yet. I would expect one to be up and running sometime next week though. Just want to set the expectations. We're starting to lock down for 3.0's release later in September.

I think a PR would be OK (though would want input from @Tratcher and @davidfowl). I think that using a separate "stop the timeout" token would be preferred rather than relying on the fact that chaining the tokens like you did above means that when the method ends the timeout will cancel (because the CTS is disposed).

using var cancelTimeoutCts = new CancellationTokenSource();
using var linkedSource = CancellationTokenSource.CreateLinkedTokenSource(cancelTimeoutCts.Token, cancellationToken);

await Task.WhenAny(_executingTask, Task.Delay(Timeout.Infinite, linkedSource.Token));
cancelTimeoutCts.Cancel(); // Unconditionally stop the Task.Delay, if it's still running.
Zenexer commented 5 years ago

I just realized I omitted the Cancel call from my example, which is probably why it wasn't clear what the change was supposed to do.

Should that be wrapped in try-finally? It's not unconditional because it won't run if _executingTask throws. Does disposing a CTS actually cancel it?

Zenexer commented 5 years ago

Looks like this would work:

using var cancelTimeoutCts = new CancellationTokenSource(cancellationToken);

try
{
    await Task.WhenAny(_executingTask, Task.Delay(Timeout.Infinite, cancelTimeoutCts.Token));
}
finally
{
    cancelTimeoutCts.Cancel();
}

It doesn't seem to be necessary to pass two tokens to CreateLinkedTokenSource. There's an internal overload that accepts only one token. Although it's not exposed directly since it's marked as internal, it's called if only one token is passed to the public overload. It's basically going to do the same thing as my first example, but with UnsafeRegister instead of Register. I don't think it makes sense to create an extra CTS as in your example, but I may be missing something.

analogrelay commented 5 years ago

Should that be wrapped in try-finally? It's not unconditional because it won't run if _executingTask throws.

Yep, should be in a try-finally.

Does disposing a CTS actually cancel it?

I'm not actually 100% sure :). Clearly I assumed it did by how I read your initial example, but it may not.

It doesn't seem to be necessary to pass two tokens to CreateLinkedTokenSource. There's an internal overload that accepts only one token. Although it's not exposed directly since it's marked as internal, it's called if only one token is passed to the public overload.

That seems reasonable. I think you're right you don't need the other token. Creating a linked source doesn't restrict the ability of the CTS to be cancelled.

analogrelay commented 5 years ago

So basically this then:

using var linkedSource = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);

await Task.WhenAny(_executingTask, Task.Delay(Timeout.Infinite, linkedSource.Token));
linkedSource.Cancel();
Zenexer commented 5 years ago

With a try-finally, yes. As far as I can tell, disposing a CTS doesn't cancel it. Interestingly, it looks like there could still be a leak if cancellationToken is canceled via a timer.

Looking at the source for Task and CancellationTokenSource, it seems awfully convoluted to be using Task.WhenAny and Task.Delay this way; it looks as though it's prone to race conditions, and it utilizes a timer where it really doesn't need to. There's anecdotal evidence scattered around sites like Stack Overflow suggesting that it is possible to crash an application with poorly managed timers and CTSes. Is there a better pattern that can be used? Perhaps it makes more sense to make a proposal for a solution in CoreCLR before continuing here.

Zenexer commented 5 years ago

I tested with the following code:

using System;
using System.Threading;
using System.Threading.Tasks;

namespace TaskTest
{
    public static class Program
    {
        public static async Task Main()
        {
            using var exitCts = new CancellationTokenSource();

            Console.CancelKeyPress += (sender, e) => exitCts.Cancel();

            while (!exitCts.IsCancellationRequested)
            {
                await Test1Async(exitCts.Token);
            }
        }

        private static async Task Test1Async(CancellationToken ct)
        {
            await Task.WhenAny(Task.CompletedTask, Task.Delay(Timeout.Infinite, ct));
        }

        private static async Task Test2Async(CancellationToken ct)
        {
            using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
            await Task.WhenAny(Task.CompletedTask, Task.Delay(Timeout.Infinite, linkedCts.Token));
        }

        private static async Task Test3Async(CancellationToken ct)
        {
            using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct);

            try
            {
                await Task.WhenAny(Task.CompletedTask, Task.Delay(Timeout.Infinite, linkedCts.Token));
            }
            finally
            {
                linkedCts.Cancel();
            }
        }
    }
}

Thus, if StopAsync is passed a cancellation token that can be cancelled, it will leak. It won't be significant unless it happens a lot, but it could add up, especially if this same pattern is used elsewhere.

Relying on CancellationTokenSource.Dispose without calling Cancel appears to be sufficient. I don't understand why this is the case, so I can't determine whether it's ideal.

Zenexer commented 5 years ago

A quick search reveals that this pattern is widespread, despite leaking: https://github.com/search?q=%22Task.WhenAny%22+%22Task.Delay(Timeout.Infinite%22&type=Code

Edit: A good number of those seem to be forks containing BackgroundService.cs, but there are also a lot of non-forks that contain classes based heavily on BackgroundService, so it is being used as an example and should have correct form.

analogrelay commented 5 years ago
  • Test 1 causes a very fast, linear increase in memory usage (on my Surface Book, about 20 MiB/sec).

Let's also remember the context. This is during shutdown and you wouldn't expect a huge number of background services. I don't disagree that a leak is bad, but 20MiB/sec when in a tight loop like this may translate to a million or more calls to Task.Delay which isn't really a close analog to the real situation.

Running a similar sample myself indicated closer to 125 bytes per invocation, which is still worth fixing, but unless the user has thousands (or more) of BackgroundService instances it won't get up to 20MiB.

Test 3 seems the most effective way of those options.

Another option to evaluate would be to use a TaskCompletionSource, like so:

// Pseudo-code not run through a compiler ;)
using var tcs = new TaskCompletionSource<object>();
using var registration = cancellationToken.Register(() => cts.TrySetCompleted(null));
await Task.WhenAny(_exitTask, tcs.Task);

Basically, instead of using Task.Delay(Timeout.Infinite, ct) as a hack to await a CTS, just await a CTS "directly" by having it signal a TCS.

Have you considered running these through BenchmarkDotNet to get more statistically-accurate data?

Zenexer commented 5 years ago

Let's also remember the context. This is during shutdown and you wouldn't expect a huge number of background services. I don't disagree that a leak is bad, but 20MiB/sec when in a tight loop like this may translate to a million or more calls to Task.Delay which isn't really a close analog to the real situation.

It's certainly not an issue in the most common use case of BackgroundService itself. I see this mostly being an issue when people copy the pattern into code that doesn't just run on shutdown.

Have you considered running these through BenchmarkDotNet to get more statistically-accurate data?

I ran my tests while without internet connectivity, so I worked from scratch. I'm not sure that obtaining accurate numbers is important here, though; it's either leaking, or it's not. We're not really going for speed, rather trying to minimize race conditions that could result in a leak. In that regard, a simple loop for testing is probably a better choice.

Basically, instead of using Task.Delay(Timeout.Infinite, ct) as a hack to await a CTS, just await a CTS "directly" by having it signal a TCS.

That seems to work:

        private static async Task Test4Async(CancellationToken ct)
        {
            var tcs = new TaskCompletionSource<object>();
            using var registration = ct.Register(() => tcs.SetCanceled());
            await Task.WhenAny(Task.CompletedTask, tcs.Task);
        }
analogrelay commented 5 years ago

it's either leaking, or it's not

Fair point.

That seems to work:

Sounds good. Probably the best option since it's clear as to what's happening.

Zenexer commented 5 years ago

I'm now using the following extensions in my code:

public static class TaskExtension
{
    public static async Task WithCancellation(this Task task, CancellationToken ct)
    {
        var tcs = new TaskCompletionSource<object>();
        using var registration = ct.Register(() => tcs.SetCanceled());
        await await Task.WhenAny(task, tcs.Task);
    }

    public static async Task<T> WithCancellation<T>(this Task<T> task, CancellationToken ct)
    {
        var tcs = new TaskCompletionSource<T>();
        using var registration = ct.Register(() => tcs.SetCanceled());
        return await await Task.WhenAny(task, tcs.Task);
    }
}

Would it be worth proposing to add this to corefx? The caveat is that if ct is cancelled, WithCancellation will throw an appropriate exception, but task will continue running. This may not be what people expect, so it could encourage misuse, resulting in hard-to-pinpoint bugs. It might be possible to mitigate this with a more descriptive name for the methods.

analogrelay commented 5 years ago

Would it be worth proposing to add this to corefx?

Like this? https://github.com/dotnet/corefx/issues/33007 :)

analogrelay commented 5 years ago

In fact, you may want to consider using the existing implementation in Microsoft.VisualStudio.Threading. Despite the name, this library does not apply only to Visual Studio extensions. It is a standalone threading utilities component developed by the Visual Studio team to help with issues they've found as they build a fairly complicated multi-threaded app ;).

They also have great Roslyn analyzers in Microsoft.VisualStudio.Threading.Analyzers that help identify issues and propose solutions.

If you can't justify referencing the package, you can grab their source from: https://github.com/microsoft/vs-threading/blob/a25ce38cd972ab15fabcc055dc0aa7fb8bf0c344/src/Microsoft.VisualStudio.Threading/ThreadingTools.cs#L105 (license is MIT, but I'm not a lawyer so talk to one if you have concerns about using it: https://github.com/microsoft/vs-threading/blob/a25ce38cd972ab15fabcc055dc0aa7fb8bf0c344/LICENSE)

Zenexer commented 5 years ago

Thanks, that'll be quite handy.

Looking at the implementation, it's using TaskCompletionSource<bool>. It's a more convoluted way than the fix we've proposed here, but it seems as though the effect will be identical to the code here as long as we change await to await await. (Task.WhenAny<T> returns Task<Task<T>>, not Task<T>. This actually seems to obviate the need for the extra logic in the Microsoft.VisualStudio.Threading implementation.)

Can I go ahead and make a PR for this, or should I wait?

analogrelay commented 5 years ago

I think the solutions in your extension method are sufficient for our code, you can create a PR with that. I agree that the stuff in Microsoft.VisualStudio.Threading is a little more complicated than we need here (it does a lot of work to avoid allocating extra async state machines).

StephenCleary commented 4 years ago

It might be possible to mitigate this with a more descriptive name for the methods.

I use WaitAsync(CancellationToken), in an attempt to make it clear that it is the wait that is cancelled and not the task.

ghost commented 4 years ago

As part of the migration of components from dotnet/extensions to dotnet/runtime (https://github.com/aspnet/Announcements/issues/411) we will be bulk closing some of the older issues. If you are still interested in having this issue addressed, just comment and the issue will be automatically reactivated (even if you aren't the author). When you do that, I'll page the team to come take a look. If you've moved on or workaround the issue and no longer need this change, just ignore this and the issue will be closed in 7 days.

If you know that the issue affects a package that has moved to a different repo, please consider re-opening the issue in that repo. If you're unsure, that's OK, someone from the team can help!

ghost commented 4 years ago

Tagging subscribers to this area: @tarekgh Notify danmosemsft if you want to be subscribed.

moonrockfamily commented 4 years ago

It might be possible to mitigate this with a more descriptive name for the methods.

I use WaitAsync(CancellationToken), in an attempt to make it clear that it is the wait that is cancelled and not the task.

@StephenCleary Your API designs are very 'clear'. 👍 Looking forward to using https://github.com/StephenCleary/AsyncEx to solve my HostedService BackgroundService recoverability. Currently, critical exceptions are vaporized, leaving the Host running, but useless! I don't want to build sophisticated recovery scenarios in my 'worker', it already has a reasonable implementation of 'starting up', why build another; let the host reinstantiate it! Simple, leverages DI without work-around re-init methods etc.

StephenCleary commented 4 years ago

@moonrockfamily I blogged about that recently - part of a 3-part blog series. I use all three techniques to create a "critical BackgroundService" base type that avoids startup delay (Task.Run), catches ExecuteAsync exceptions, and logs and then exits the host if a critical background service fails.

maryamariyan commented 4 years ago

Can I go ahead and make a PR for this, or should I wait?

@Zenexer the source code has moved to dotnet/runtime since. Are you interested in creating a PR for this?

nminaya commented 3 years ago

Not sure if this issue is related but I just realized that the CancellationToken passed to BackgroundService.StartAsync is not the same as the one received in ExecuteAsync method. So when the CancellationToken is cancelled, ExecuteAsync is not getting notified through the CancellationToken and keeps running.

Nick-Stanton commented 2 years ago

@Zenexer thanks for the catch. A fix has been merged to resolve this behavior