dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.14k stars 4.71k forks source link

CancellationTokenSource is hard to use correctly #29970

Closed NinoFloris closed 3 years ago

NinoFloris commented 5 years ago

CancellationTokenSource is hard to use correctly, especially in concurrent scenarios.

Most CancellationTokenSource methods are booby-trapped with ThrowIfDisposed, especially tricky with .Token and .Cancel(). As for both cases there's something to say for silently returning a None token or not Cancelling, but also something to say for being explicit about it and throwing if it happens.

The problem with that last option though is that there's no way to actually check if the CancellationTokenSource is already disposed, now checking could obviously race but there are no Try## variants of these methods either. This leads to over cautious behavior, wrapping each call in a try block.

Concurrent success and cancellation/failure cases are also fun as Cancel throws if the CTS was already disposed. But for timer based CTS's you really do want to Dispose once some piece of code sucessfully completed. This is prone to races between timeouts that try to cancel and success that tries to dispose. This is 'fixable' by locking on the CTS but it all feels brittle.

The current Cancel api is in my opinion the most egrigious, TryCancel would be a valuable api to add.

Lastly there's the matter of having to Dispose a CTS — but really only in some cases. When the CTS is timer based, to keep the timer queue clean. Failing to do so isn't easy to diagnose and I believe it's a pervasive problem in any corporate codebase not familiar with all the details. Having some diagnostic help like UnobservedTaskException but for unobserved timer based CTS's would be amazing.

Right now CancellationTokenSource feels far from the pit of success while it really doesn't have to.

scalablecory commented 5 years ago

Would you say there's an underlying difficulty with tracking the lifetime of a CancellationTokenSource?

Can you give some sample code where this would be useful?

Clockwork-Muse commented 5 years ago

...it sounds like you're disposing of your CTS too early (and possibly from multiple places).

Concurrent success and cancellation/failure cases are also fun as Cancel throws if the CTS was already disposed. But for timer based CTS's you really do want to Dispose once some piece of code sucessfully completed. This is prone to races between timeouts that try to cancel and success that tries to dispose. This is 'fixable' by locking on the CTS but it all feels brittle.

You don't need a lock, but you do need a way to communicate that you're done with your CTS. Often, this can be managed just by verifying that all other tasks with a direct reference to it or its token have completed (with or without error). Things get trickier if something you call creates merged CTSs, but generally they're not going to hide the lifetime of what they're doing.

Lastly there's the matter of having to Dispose a CTS — but really only in some cases. When the CTS is timer based, to keep the timer queue clean. Failing to do so isn't easy to diagnose and I believe it's a pervasive problem in any corporate codebase not familiar with all the details. Having some diagnostic help like UnobservedTaskException but for unobserved timer based CTS's would be amazing.

Generally speaking, if something implements IDisposable, it needs to be disposed. I would not be at all surprised if there was an analyzer that would check for such a thing. However, sometimes you might have to delay disposing until some other point in your code.

NinoFloris commented 5 years ago

@scalablecory Exactly :)

@Clockwork-Muse yes during our app startup this might happen because of many different reasons and can come from any request thread while the success path is still running. Aspnet Core has similar troubles with their CTS lifetime managment for RequestAborted and has had multiple patches for it. TryEntering the object would merely be an atomic signal for any other code running that it's disposed and not to be called anymore, essentially tainting the object without introducing a wrapper type around CTS. There are many other and definitely better ways, but this was the most straight forward fix.

And yes we have merged CTS's, a timer CTS running, registered callbacks from other tokens, and the token is stored together with the Task it acts on in a shared cache so it could be cancelled + disposed by any of the threads waiting for that work or actively trying to purge the cache (which also calls dispose), all of that could race with the success path disposing the CTS after the work is done.

To be somewhat blunt I'm not looking for development advice. I know how to deal with the flaws, It'd just be great if there were less which is what this issue hopes to address, to start a discussion and have that become reality.

sandersaares commented 5 years ago

Can you outline some specific code patterns that cause the issue? I encountered the quirks of CTS behavior under concurrent load and the techniques I applied to overcome it did not require any try-catching nor would they benefit from "default to CT.None after Dispose" style behavior. Therefore, I wonder what is different in our approaches to solve it. As @Clockwork-Muse suggested, it might be that the patterns you apply to fix it could be improved.

NinoFloris commented 5 years ago

CTS.Token is fine if you read it directly and store it before handing out the CTS to any other threads that could dispose the CTS. CTS.Cancel however cannot make use of that same technique and is absolutely terrible for the reasons I outlined.

Again I'm not sure why my skills are immediately and repeatedly questioned, yet you can all find yourselves in the issues outlined. That our solutions differ shouldn't change the desire to try and fix the issues.

sharwell commented 5 years ago

@NinoFloris keep in mind that an undisposed CancellationTokenSource will only leak resources if it was created with CreateLinkedTokenSource (this is not a guarantee of the API, but it holds in the current implementations). If you are having trouble with a particular CTS and it was not created in this manner, you can mitigate the problems in the short term by not disposing of it.

At a broader level, dotnet/roslyn has addressed questions of multiply-owned objects in concurrent code using ReferenceCountedDisposable<T>. This type allows multiple references to a disposable object across concurrent and/or asynchronous operations with unknown lifetimes, with a guarantee of proper single disposal when the last reference is disposed.

NinoFloris commented 5 years ago

@sharwell thanks for that link! afaik and that's how David Fowler is preaching it if CTS's are timer based it should also be disposed to remove pressure on the timer queue.

And looking at the CTS and referenced code that's correct, TimerHolder is the only one that has a finalizer that will save you if you don't dispose iff that CTS isn't rooted anymore.

scalablecory commented 5 years ago

To be somewhat blunt I'm not looking for development advice. I know how to deal with the flaws, It'd just be great if there were less which is what this issue hopes to address, to start a discussion and have that become reality.

None the less, thank you for humoring the discussion. New APIs take a fair effort to bring in, and part of that is understanding what a new API solves.

@sharwell's link seems like a decent solution in the mean time.

wazzamatazz commented 4 years ago

Is there a particular reason that CancellationTokenSource does not cancel its token when it is disposed? I've been caught out by this a couple of times when I have a class that kicks off one or more background tasks that should gracefully exit when the class is disposed. The class has a CancellationTokenSource that is used to provide a CancellationToken to the background tasks. When it comes to implementing Dispose(), it feels strange that I need to explicitly cancel the CancellationTokenSource instead of this being done automatically when it is disposed.

stephentoub commented 4 years ago

Primarily because Dispose is only supposed to be used with objects no longer in use, and if the implementation needs to cancel, it's because it's still in use. So the cancellation is left to the caller, who cancels and can then dispose something no longer in use.

wazzamatazz commented 4 years ago

I understand where you're coming from, but by that logic, you could argue that a DbConnection should stay open after disposing it unless you explicitly call Close on it too. Either way, I think it would be helpful if the documentation for CancellationTokenSource clearly stated that disposing the object does not cancel the token.

stephentoub commented 4 years ago

but by that logic, you could argue that a DbConnection should stay open after disposing it unless you explicitly call Close on it too

In the DbConnection case, you wouldn't call Dispose on it if some other code was still using it; that would be a problematic race condition. Similarly, the ideal is you shouldn't be calling Dispose on a CancellationTokenSource while someone else is still using it, which is what's happening if code elsewhere is registered with its CancellationToken, polling its CancellationToken, etc.

Either way, I think it would be helpful if the documentation for CancellationTokenSource clearly stated that disposing the object does not cancel the token.

Sure. The docs are open source. Please consider submitting a PR to https://github.com/dotnet/dotnet-api-docs. Thanks!

wazzamatazz commented 4 years ago

Will do, thanks!

NN--- commented 3 years ago

@stephentoub I understand that disposing CancellationTokenSource is not recommended unless created using CreateLinkedTokenSource, correct ?

stephentoub commented 3 years ago

I understand that disposing CancellationTokenSource is not recommended unless created using CreateLinkedTokenSource, correct ?

No. It's good to dispose of disposable things in general. But sometimes, especially when dealing with asynchronous code, or when dealing with code where ownership or lifetimes isn't clear, that such a general guideline needs to be weighed for its benefits / risks. With CTS, it's most important to Dispose of it when created with CreateLinkedTokenSource, and secondarily when it's used with a timeout.

NN--- commented 3 years ago

So the issue is just asking how to improve usage of potentially disposed CancellationToken. Correct ?

stephentoub commented 3 years ago

So the issue is just asking how to improve usage of potentially disposed CancellationToken. Correct ?

That is my understanding of the request.

NinoFloris commented 3 years ago

Closing since TryReset made it into 6.0