dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.02k stars 4.67k forks source link

Resetable CancellationTokenSource registrations #4694

Closed benaadams closed 4 years ago

benaadams commented 8 years ago

CancellationTokenSource is quite a heavyweight object and its not normally cancelled; however it can't be pooled or reused because its registrations cannot be cleared.

Its composed of the following objects:

ManualResetEvent m_kernelEvent
SparselyPopulatedArray<CancellationCallbackInfo>[] m_registeredCallbacksLists
int m_state
int m_threadIDExecutingCallbacks
bool m_disposed
CancellationTokenRegistration [] m_linkingRegistrations
CancellationCallbackInfo m_executingCallback
Timer m_timer

A ClearRegistrations() or Reset() function would be very helpful.

davidfowl commented 8 years ago

/cc @stephentoub

stephentoub commented 8 years ago

A ClearRegistrations() or Reset() function would be very helpful.

One of the key aspects of CTS / CancellationToken is that it's one-directional, only ever transitioning from non-canceled to canceled, never the reverse. Lots of code depends on this, otherwise it's extremely prone to race conditions. It also seems particularly dangerous to "clear registrations"... if someone's still registered, why is it safe to ignore those registrations? If you know it's safe, then you're also typically in a position to unregister because you were the one to register, at which point you can choose to pool/reuse the CTS if it's not yet been canceled.

If you have suggestions for making the implementation more lightweight under the existing surface area and wanted to submit PRs for that, we'd be very happy to review. I don't believe we should add the proposed surface area, however.

benaadams commented 8 years ago

The use case wouldn't need to undo the cancel; but remove the registrations, i.e. it is still a one way cancel.

The token is passed on to external code so the owner of the CancellationSource doesn't have access to the CancellationTokenRegistration objects.

Use case:

Connection has a CancellationSource; handles many Requests which invoke user request handling code; when the connection is disconnected or the request is aborted the CancellationSource is triggered to cancel anything registered during the request.

use

However, if the connection hasn't been terminated; at the end of the request all the request and user code registrations need to be dumped as they are now out of scope and the CancellationSource prepared for the next request. The current way to do this is to dispose the CancellationSource and create another one.

In Kestrel's Frame.cs the Reset function disposes the CancellationToken and it is then recreated on the next loop ready for registrations.

The issue with this is that the CancellationSource is one of the highest allocations that happens in the loop.

So am looking for a lower GC pressure way of being able use a CancellationToken for a longer scoped object (Connection); but with a resetting scope for the registrations (Requests).

stephentoub commented 8 years ago

The issue with this is that the CancellationSource is one of the highest allocations that happens in the loop.

How much of that is the CTS itself, and how much are the allocations associated with the registrations?

at the end of the request all the request and user code registrations need to be dumped as they are now out of scope and the CancellationSource prepared for the next request.

At that point, why is there anything still registered with the CTS? Doesn't that imply someone isn't disposing of their CancellationTokenRegistration?

benaadams commented 8 years ago

How much of that is the CTS itself, and how much are the allocations associated with the registrations?

In the plaintext techEmpower performance benchmark; its all the CTS as there are no registrations.

At that point, why is there anything still registered with the CTS?

Its library code; can't enforce the user of the library not to make mistakes; and try to gracefully handle them rather than continuously build up references for over every request.

However, would you recommendation be to keep using the same CTS and assume that the registrations are correctly disposed?

stephentoub commented 8 years ago

would you recommendation be to keep using the same CTS and assume that the registrations are correctly disposed?

No, I see what you're saying.

At the same time, I'm struggling with the notion of simply dropping registrations. You've given out a CancellationToken, and code has registered with that CancellationToken. At some point the thing that provided the CancellationToken says "that's it, I'm done"... if anything is still listening, e.g. someone kicked off a cancelable timer or had some pending database request that can now be canceled and so on, shouldn't the CTS be Cancel'd rather than dropped? And once it's canceled, it's transitioned from non-canceled to canceled, and it's no longer reusable.

benaadams commented 8 years ago

Because its not cancelled, it completed. So the CancellationToken is for an out of band situation where it needs to be cancelled mid processing.

I think the standard approach would be for the connection to have its own CancellationTokenSource which is long lived and then use CreateLinkedTokenSource for each request with a new CancellationTokenSource which it can then dispose; but that's just frightening in number of allocations (e.g. when doing 1M+ requests per second)

stephentoub commented 8 years ago

Because its not cancelled, it completed. So the CancellationToken is for an out of band situation where it needs to be cancelled mid processing.

What's the mechanism by which work I've kicked off gets canceled when the request completes then? Or the assumption is "just don't do that"? This is obviously up to will of whoever is handling out the CT, but to me, it makes perfect sense to say "I'm completing now, my job is done, if anyone out there related to my work is still going, you can cancel now because I'm outta here." So I'm not quite on board with the argument of "because it's not canceled, it completed".

With regards to clearing registrations, let's hypothetically say we decided that was a good idea and it was something we wanted to do. CancellationTokens are a thin wrapper for a CancellationTokenSource, with their only field being a reference back to the CTS. How do you propose that be handled? Someone who was given a CT and polls it will still be referencing the original CTS, which might now be used to represent a completely unrelated piece of work. We could add an Int64 field to the CTS that represents a version number, add an Int64 field to a CT that represents a version number, and say the connection between the two is only valid if those two numbers match, but now both the struct and the class are 8 bytes bigger, and every operation on the CT needs to do additional work to validate that its source is still valid. Did you have something else in mind?

benaadams commented 8 years ago

What's the mechanism by which work I've kicked off gets canceled when the request completes then?

Its a serial pipeline so the request only completes when the work is done; however if the network connection is dropped, so there is nothing to receive the result produced at the end of the pipeline; then the pipeline needs to be stopped (if someone chooses to register for cancellation; or use the cancellation token in someway - which they may not do).

let's hypothetically say ...

Sounds quite good :+1: An extra 8 bytes for the lifetime of the connection is nothing compared to the 72 bytes per request, which for 4k requests is 288000 bytes (below)

cts-allocations

For 2M requests a second its 144 MB per second or 8.64 GB a minute that needs to be GC'd just for the CancellationTokenSource. The push for cutting memory is that GC is currently the main activity of the server (shown below); and that's with a lot of pooling already going on.

gc-percentage

Could have different flavours of CancellationTokenSource, ones that don't have the timer for example; or even a CancellationTokenSourceSlim that doesn't have much but has protected or public properties so it could be implemented in different ways via derived classes.

Though it would need a function that creates the CancellationToken's since its constructor is currently internal and only takes a CancellationTokenSource?

stephentoub commented 8 years ago

As noted earlier, if the concern is about the size of CTS, I'd be happy to look at proposals for making the implementation better. For example, if we really believe that the most common case in general (and not just on one specific benchmark) is no registrations, then we could consider putting all of the CTS state into a lazy-allocating subobject. Or maybe there are ways of eliminating some fields or consolidating them, etc.

stephentoub commented 8 years ago

An extra 8 bytes for the lifetime of the connection is nothing compared to the 72 bytes per request, which for 4k requests is 288000 bytes (below)

You're focused on one very specific benchmark. There are many other use cases. And increasing the size of this object by 10% should not be taken lightly when the concern to begin with is its size. I'm also not entirely sure my off-the-cuff comment would even work, e.g.with the synchronization necessary to prevent such a token from registering with its old CTS.

Its a serial pipeline so the request only completes when the work is done

How is the serial nature enforced? You're saying it's not possible in ASP.NET to schedule concurrent operations? Or you're talking about this specific benchmark?

benaadams commented 8 years ago

The CTS isn't particularly big in the scheme of things; its the multiplication of its size that the main problem; because currently the only safe approach it to ditch it each time. Even a single new object (12 bytes?) done on every request is 24 MB/s of garbage.

You're saying is not possible in ASP.NET to schedule concurrent operations?

Obviously its dotnet so if you wanted to spin up a background thread or split of multiple tasks there's nothing stopping you; so you can go quite happily go "off piste"; have background activities pruning a cache, posting stats etc

For the request pipeline specifically, its a pluggable where each component is called, then it calls the next item in the pipeline; so it has access to the before and after state. I believe the expectation is that outside that context the component shouldn't be trying to mess with the request or holding onto references to it.

You could kick something off on the way in and then finish it on the way out; or have multiple components at different parts of the pipeline communicate with each other; by altering the request; for example authorisation and authentication do this. But essentially when the request has been sent out across the network that's when its complete and nothing further can be done with it.

stephentoub commented 8 years ago

The CTS isn't particularly big in the scheme of things; its the multiplication of its size that the main problem;

How many objects / how much memory is currently allocated per request? What percentage is the CTS? I of course completely understand the goal of minimizing allocations, but the numbers you're citing are difficult to reason about without context.

And do you have a proposal for how to actually make it all work, assuming we did decide to expose such a reuse operation? It would need to be bulletproof with regards to the previously handed out token not being able to poll, register with, etc. the reused CTS, and it would need to not add to the size of the CTS in a meaningful way for scenarios that do create lots of instances in cases where the CTS object is one of the larger contributors to allocations, such as creating a cancelable task.

benaadams commented 8 years ago

How many objects / how much memory is currently allocated per request? What percentage is the CTS?

The in "aspnet/dev" allocations as a ratio look like this (well higher as its currently using _requestAbortCts = CancellationTokenSource.CreateLinkedTokenSource(_disconnectCts.Token);)

allocations

There are PRs hopefully to take the last two lines to more or less zero; at which point it becomes 36% of this part of pipeline. The other allocator is the input header's value strings (the names/keys don't allocate) - but I believe that's also being looked at.

The streams I'm not sure about; they mostly exist to support the stopped/aborted state for the current request so you can't continue writing to it after; and for replacement, filters, default, encyption etc Though they could probably be combined.

benaadams commented 8 years ago

And do you have a proposal for how to actually make it all work

Not entirely; though a derived CTS might be safer; or multiple derived types, timed, single use, etc and I like your versioning idea; but that would probably change contracts; and I take your point.

where the CTS object is one of the larger contributors to allocations, such as creating a cancelable task.

Needs more thought; would be good if that could also be reduced; obviously increasing the load here would be very bad; as once you've tried async you can't go back... but there is a lot going on in CTS; and though I use them a lot; other than plumming them in I don't think I've ever knowingly cancelled one.

stephentoub commented 8 years ago

Thanks for pushing on this, @benaadams. It's impressive to see how much you and others have been able to drive down those allocations, and it's good to understand the costs here in perspective.

I will take a look at CTS today to see if any easy wins jump out at me.

benaadams commented 8 years ago

I've been wondering on this and the Registrations should be disposed; so if the common use case was to pass them though to other library code; for example a sql exec task; then that library should be disposing of the registrations; and hopefully if the user was actually registering things - they should know enough to dispose them - at which point the same CTS could be used for the whole connection?

Though I don't know how the aspnet team would feel about it as it is potential bleed though and a memory leak. On the other hand it is a case of not disposing of your disposables....

/cc @davidfowl @halter73 @Tratcher

Tratcher commented 8 years ago

Since this will be exposed to arbitrary 3rd party components we can't require them to always un-register, it would be un-enforceable and lead to memory leaks. Really it's a per-request token that you are trying to re-use for the next request on the same connection because you know it didn't fire yet and everyone should be done with it.

stephentoub commented 8 years ago

@benaadams, I just reread what you wrote about sizes, and noticed that you said "of this part of pipeline". I don't know enough about the system to understand what this part of the pipeline is and whether that's actually used on its own, or whether it's never used on its own and this is just a piece of a larger pie that's always used. Can you clarify? How does this impact the entire request/response for example, assuming that's the pipeline being discussed?

benaadams commented 8 years ago

@stephentoub its most of all; was just expressing it wasn't everything so there are extra allocations in the request input for example with header strings which are the highest allocators. I did an experiment in this PR https://github.com/aspnet/KestrelHttpServer/pull/411 which also gives the breakdown of the top 5 types in current aspnet/dev

benaadams commented 8 years ago

Obviously when it comes to user code all bets are off as that will undoubtedly be the highest allocator; but the load the library presents is the maximal possible optimization level they can get to - so its about it having the lightest load possible.

benaadams commented 8 years ago

@stephentoub looking at the whole app/pipeline after a few of the non-conflicting PRs have been merged across 4k requests

cts-allocs

Items 1,2,3 have stuff in the works, taking this CTS to the second most allocated type; from its current 5th place.

benaadams commented 8 years ago

And the previous second of code for allocations now looks like for 4k requests

request-bytes-allocs

stephentoub commented 8 years ago

How about a different approach, one that lazily-allocates the CTS only when it's actually needed: https://github.com/stephentoub/KestrelHttpServer/commit/c7d7f0e57514b990e1dc04dd794912c2c79965dd

I don't know what perf test you're running, so I've not evaluated the perf impact of this and whether it provides the desired help or not. If it does, I can submit this as a PR once your changes go in, as I've branched off of your PR at https://github.com/aspnet/KestrelHttpServer/pull/408.

benaadams commented 8 years ago

Heh, that's a really good idea! Can't see the wood for the trees...

Merged it into the current PR, thank you!

@stephentoub Should this issue be closed?

stephentoub commented 8 years ago

Glad it was helpful. Yes, let's close this. We can still explore ways to reduce the size of a CTS further (it's now 64 bytes on 64-bit), but I don't see a good way to safely enable the kind of reuse you're describing, and with my commit, it sounds like the targeted use case no longer exists, or at least is significantly diminished.

benaadams commented 8 years ago

Re-run the tests and this completely clears up the allocations; thank you again.

benaadams commented 8 years ago

And I'm getting good sustained throughput

thoughput

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.50ms   20.49ms 405.61ms   93.01%
    Req/Sec    44.57k     8.62k  101.79k    93.77%
  170312239 requests in 2.00m, 20.94GB read
Requests/sec: 1418228.35
Transfer/sec:    178.53MB

1.4M requests per second

roji commented 4 years ago

Does it make sense to revisit this (along the lines of ManualResetValueTaskSourceCore)? My specific use case is waiting on a Channel with a timeout, which only seems to be possible via a cancellation token, and I'd like to avoid incurring an allocation each time. The timeout must be applied on every read so I can't see laziness being a solution here (or have I miss some other option?)

stephentoub commented 4 years ago

I'd like to avoid incurring an allocation each time

Can you not just reset the timeout and reuse the same CTS as long as it hasn't timed out / been canceled? An extra allocation when things actually time out should be much less impactful.

roji commented 4 years ago

@stephentoub yeah, that's what I'm doing - but the timeout here is possibly a frequent event, meaning the allocation would still be frequent. Am also wondering more in general, now that we allow reset for ValueTask, if there's a fundamental reason not to allow the same for cancellation tokens (via a different advanced type, with the understanding that it's the user's responsibility to make sure everything is properly managed).

stephentoub commented 4 years ago

if there's a fundamental reason not to allow the same for cancellation tokens (via a different advanced type, with the understanding that it's the user's responsibility to make sure everything is properly managed

You mean introducing an entirely new type that's not CancellationToken?

roji commented 4 years ago

I was imagining a new type that's not CancellationTokenSource, but that would be able to manage CancellationToken - but I may have a bad mental model of everything here. Otherwise an advanced, appropriately-named method on CancellationTokenSource?

stephentoub commented 4 years ago

CancellationToken has very clear semantics: once IsCancellationRequested is true, it'll never be false. And implementations rely on that. That was not the case with ValueTask<T>. We introduced IValueTaskSource<T> so soon after the original type that we felt it was acceptable to retcon the behavioral changes... and even with that short gap we've seen it result in fallout we're still dealing with. CancellationToken has been around for way, way longer.

roji commented 4 years ago

Thanks for explaining. A new type that's not CancellationToken indeed seems like a non-starter.

davidfowl commented 4 years ago

😢 This makes me sad.