dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/

Reduce thread pool pressure by batching work items if multiple timers need to fire #51362

Open davidfowl opened 3 years ago

davidfowl commented 3 years ago

I was looking at what happens when I spin up 100,000 concurrent websocket connections on the server side, all doing ping pongs. They each have their own periodic timer that fires on an interval (default 2 mins in ASP.NET Core). This spawns 100,000 thread pool work items to fire timer callbacks (well, 99,999), and I think it would be interesting to experiment with batching these callbacks the same way we do with socket continuations. The idea would be to keep a separate IThreadPoolWorkItem and a ConcurrentQueue<TimerQueueTimer> that mimics https://github.com/dotnet/runtime/blob/31c28fc379ec6f581c101f630a3a81e0f215c1ea/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs#L198-L262. In theory, this would reduce the thread pool overhead of firing lots of timers that typically fire together.

This is something I'm gonna try out, but I'd like to get your thoughts, @stephentoub and @kouvel (anyone else who wants to join in is welcome!).
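For illustration, here is a minimal sketch of what that batching could look like, loosely modeled on the SocketAsyncEngine pattern linked above rather than on any existing runtime code; the FiredTimer type and the TimerBatchDispatcher are hypothetical stand-ins, not the actual TimerQueueTimer machinery:

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for TimerQueueTimer: just the callback to fire and its state.
public sealed class FiredTimer
{
    private readonly TimerCallback _callback;
    private readonly object? _state;

    public FiredTimer(TimerCallback callback, object? state) =>
        (_callback, _state) = (callback, state);

    public void Fire() => _callback(_state);
}

// A single reusable work item drains a queue of elapsed timers, so N timers firing
// together cost one (or a few) thread pool dispatches instead of N.
public sealed class TimerBatchDispatcher : IThreadPoolWorkItem
{
    private readonly ConcurrentQueue<FiredTimer> _timers = new();
    private int _scheduled; // 1 while a drain work item is queued or running

    // Called for every timer whose due time has elapsed.
    public void Enqueue(FiredTimer timer)
    {
        _timers.Enqueue(timer);

        // Schedule at most one drain work item, no matter how many timers fired.
        if (Interlocked.CompareExchange(ref _scheduled, 1, 0) == 0)
        {
            ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
        }
    }

    void IThreadPoolWorkItem.Execute()
    {
        while (true)
        {
            while (_timers.TryDequeue(out FiredTimer? timer))
            {
                timer.Fire();
            }

            // Allow a new drain to be scheduled, then re-check for timers that
            // raced in while the flag was still set.
            Interlocked.Exchange(ref _scheduled, 0);
            if (_timers.IsEmpty || Interlocked.CompareExchange(ref _scheduled, 1, 0) != 0)
            {
                return;
            }
        }
    }
}
```

In the real timer code the queue would presumably be fed from TimerQueue's firing path, and per-callback concerns (ExecutionContext, exception handling) would still apply to each Fire().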

kouvel commented 3 years ago

How about making the timer work items local? That might be more efficient, and it would also allow a timer tracking an in-flight work item to fire and have its callback processed without first having to drain the other global work items.
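For context, the local-queue option is exposed through the preferLocal flag on ThreadPool.UnsafeQueueUserWorkItem; below is a tiny hypothetical call site (not the actual timer code) just to show the difference:

```csharp
using System.Threading;

static class TimerQueueing
{
    // Hypothetical helper: queue a timer callback to the current thread's local
    // work-stealing queue instead of the global queue. Other threads can still
    // steal the work item if this thread stays busy.
    public static void QueueLocally(TimerCallback callback, object? state) =>
        ThreadPool.UnsafeQueueUserWorkItem(s => callback(s), state, preferLocal: true);
}
```

With preferLocal: false the same call would push the work item to the global queue, which is roughly the situation described in the opening comment.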

davidfowl commented 3 years ago

> it would also allow a timer tracking an in-flight work item to fire and have its callback processed without first having to drain the other global work items

My only concern with using local queues is the work stealing scaling problem we currently have, but this does sound interesting. Is ordering a concern at all (LIFO vs FIFO)?

kouvel commented 3 years ago

I think the work stealing scaling problem is more prominent when the queues are frequently depleted (undesirable scaling from every thread checking and finding that there is no work), and it exists even when local queues aren't being used. I was experimenting with a fix for that issue but haven't completed it yet. I don't think using the local queues would make the scaling worse.

Regarding ordering, I don't think the order matters between the timer callbacks in a batch that have passed their due time: new timers are already inserted/processed in LIFO order, and I don't think we make any guarantee about the order in which they run. It's possible that while some timer callbacks are being processed, timers with a later due time tick, and due to the LIFO order of the local work items that newer batch of callbacks would be processed before the callbacks queued from the earlier due time. I think that's a very slim chance since there's a decent amount of time (at least 1 ms) between batches, and it's not dissimilar from delaying the first batch until the later due time (such as if the callback that schedules the callbacks is delayed), so even then the order shouldn't matter.

davidfowl commented 2 years ago

@kouvel Should we try this in .NET 7?

kouvel commented 2 years ago

Marked it for 7.0 for now, will take a look

kouvel commented 2 years ago

Actually, you might be right about work stealing being a problem. If only one or a few threads queue a whole bunch of timers locally and the thread pool is mostly idle, then all of the other threads would have to rely on work stealing to get that work. Batching may be more efficient when the thread pool is idle, so maybe that's a better solution for this case. But we also have proc-count TimerQueues, so it may work out. Will try it out.
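To make that last point concrete, here is a rough sketch (hypothetical names, reusing the TimerBatchDispatcher and FiredTimer types sketched earlier in the thread) of combining batching with proc-count partitioning, so elapsed timers fan out across several dispatchers instead of piling up behind a single thread's queue:

```csharp
using System;
using System.Threading;

public static class PartitionedTimerBatcher
{
    // One dispatcher per processor, mirroring the proc-count TimerQueues mentioned above.
    private static readonly TimerBatchDispatcher[] s_dispatchers = CreateDispatchers();

    private static TimerBatchDispatcher[] CreateDispatchers()
    {
        var dispatchers = new TimerBatchDispatcher[Environment.ProcessorCount];
        for (int i = 0; i < dispatchers.Length; i++)
        {
            dispatchers[i] = new TimerBatchDispatcher();
        }
        return dispatchers;
    }

    public static void OnTimerFired(FiredTimer timer)
    {
        // Route the elapsed timer to the dispatcher for the current processor so the
        // resulting batches are produced (and consumed) on more than one thread.
        int index = (int)((uint)Thread.GetCurrentProcessorId() % (uint)s_dispatchers.Length);
        s_dispatchers[index].Enqueue(timer);
    }
}
```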