Tracking issue for some thread pool experiments (.NET 8)

kouvel commented 1 year ago

This issue tracks some thread pool experiments/investigations that were proposed.

[ ] Polling for IO on worker threads
- There can be some tradeoffs between using dedicated IO poller threads and polling for IO on worker threads. Issues observed in some cases with dedicated IO pollers include thread scheduling latency, the difficulty of choosing the number of IO pollers on machines of various sizes, and the extra thread hop in processing IO events.
- [ ] Build a prototype to experiment with, and collect some performance data to understand the pros and cons. In progress.
- [x] Use one global epoll fd and determine an alternate mechanism to associate an IO event with a callback
- [ ] Experiment with some strategies for polling for IO to balance the overhead, such as frequency of polling, batch sizes, etc.
[ ] Using the Windows thread pool
- There can be some tradeoffs in using the Windows thread pool; it may be beneficial in cases where other components also use the Windows thread pool. The goal is to experiment with using it and collect some data to understand some of the tradeoffs.
- [ ] Experiment with using the Windows thread pool in coreclr and measure perf. In progress.
- [ ] Investigate regressions and determine if they can be reasonably fixed
[x] Processing IO events at higher priority on Unixes
- IO events are processed in the same order as global work items, so in cases where the global queues are heavily backed up, in-progress requests could be delayed by processing of new requests. The goal is mainly to try it out and to understand any perf regressions.
- [x] Build System.Net.Sockets against CoreLib, queue processing of IO events at high priority, and gather some perf data
- [x] Investigate regressions and determine if they can be reasonably fixed. Most of the ASP.NET benchmarks resulted in regressions. After some investigation and experimentation, the regressions appeared to reduce in magnitude on some tests but were still present. Further investigation is needed to determine whether this can reasonably be done on Unixes without perf regressions.
[x] Disabling hill climbing
- This is a quick experiment to measure the current perf effects of disabling hill climbing. Hill climbing adds some costs, and it has been observed not to help in several kinds of apps.
- [x] Gather some current perf data to understand the effects of disabling hill climbing. Perf in ASP.NET benchmarks appears to be mostly similar or slightly better. Perf metrics in a large internal service did not change, though the service ran with fewer worker threads on average.
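
For readers unfamiliar with the "one global epoll fd" idea in the first item, here is a minimal sketch of the general pattern. It is written in Python for brevity and is not the runtime's implementation. It assumes a single shared selector (epoll-backed on Linux), uses the selector's `data` slot as the mechanism that associates an event with a callback, and lets any calling thread poll a bounded batch. The names `GLOBAL_SELECTOR`, `register`, `poll_once`, and `demo` are hypothetical.

```python
import selectors
import socket

# One process-wide selector (backed by epoll on Linux) shared by all sockets.
# Hypothetical name; stands in for the "one global epoll fd" in the list above.
GLOBAL_SELECTOR = selectors.DefaultSelector()


def register(sock, callback):
    """Associate a callback with a socket through the selector's data slot."""
    sock.setblocking(False)
    GLOBAL_SELECTOR.register(sock, selectors.EVENT_READ, data=callback)


def poll_once(timeout=0.0, max_events=16):
    """Poll from the calling thread (for example a worker thread) and run up
    to max_events ready callbacks inline, with no hop to another thread.
    Returns the number of callbacks dispatched."""
    dispatched = 0
    for key, _mask in GLOBAL_SELECTOR.select(timeout):
        if dispatched >= max_events:
            break  # level-triggered: leftover events show up on the next poll
        key.data(key.fileobj)
        dispatched += 1
    return dispatched


def demo():
    """Register one socket, make it readable, and poll once."""
    received = []
    a, b = socket.socketpair()
    register(b, lambda s: received.append(s.recv(1024)))
    a.send(b"ping")
    count = poll_once(timeout=1.0)
    GLOBAL_SELECTOR.unregister(b)
    a.close()
    b.close()
    return count, received
```

The `timeout` and `max_events` parameters correspond loosely to the polling frequency and batch size knobs mentioned in the list; how to tune them is exactly what the experiment is meant to find out.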

Some leftover work items are tracked in https://github.com/dotnet/runtime/issues/52701.
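
As an illustration of the "processing IO events at higher priority" item, the sketch below shows the basic ordering effect under simplifying assumptions: a single-threaded, two-level queue where high-priority items are always dequeued first. It is written in Python and is not the thread pool's actual data structure. `TwoLevelQueue` and `drain_order` are hypothetical names, and the sketch is not thread-safe.

```python
from collections import deque


class TwoLevelQueue:
    """Hypothetical two-level work queue: high-priority items run first."""

    def __init__(self):
        self._high = deque()
        self._normal = deque()

    def enqueue(self, item, high_priority=False):
        (self._high if high_priority else self._normal).append(item)

    def try_dequeue(self):
        """Return the next item, preferring the high-priority queue,
        or None when both queues are empty."""
        if self._high:
            return self._high.popleft()
        if self._normal:
            return self._normal.popleft()
        return None


def drain_order():
    """Queue two new requests, then an IO event, and report dequeue order."""
    q = TwoLevelQueue()
    q.enqueue("new-request-1")
    q.enqueue("new-request-2")
    q.enqueue("io-event-for-in-progress-request", high_priority=True)
    order = []
    while (item := q.try_dequeue()) is not None:
        order.append(item)
    return order
```

With a single FIFO queue the IO event would wait behind both new requests; here it is dequeued first, which is the behavior the experiment set out to measure.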

ghost commented 1 year ago

Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.

Issue Details


Author:	kouvel
Assignees:	kouvel, eduardo-vp
Labels:	`area-System.Threading`
Milestone:	8.0.0

davidfowl commented 1 year ago

> Using the Windows thread pool

I'm for this experiment but I would say that no matter what we find, we shouldn't default to this.

davidfowl commented 1 year ago

How come some of these are checked? Are those experiments that were already run?

kouvel commented 1 year ago

> How come some of these are checked? Are those experiments that were already run?

Yeah, some of these are complete, with some leftover work items tracked in https://github.com/dotnet/runtime/issues/52701 for now.

jkotas commented 1 year ago

> I'm for this experiment but I would say that no matter what we find, we shouldn't default to this.

We should wait for what we find before making calls like this one. For example, what if we find that Windows thread pool is superior in all dimensions?

davidfowl commented 1 year ago

> We should wait for what we find before making calls like this one. For example, what if we find that Windows thread pool is superior in all dimensions?

To me, it feels like that decision would fly in the face of attempting to build a consistent platform. Even though we can't do this 100%, as platform differences always come through in some APIs, our goal should be to make the platform behavior as consistent as possible across platforms.

Even if it was better on all dimensions, we should default to our managed components over OS ones. We have more control and can provide a more consistent experience. That isn't just about behavior, but also about configuration.

This is why ASP.NET Core has HTTP.sys and Kestrel server implementations but defaults to the managed one (amongst other reasons). We don't need to educate people about configuring registry keys to tweak server behavior because we made a decision to use a Windows component that relies on those behaviors.

When I think about the changes we made to the threadpool to work around blocking APIs, I think about the lack of control we would have if we delegated elsewhere. We'd need to wait for a new Windows version to get this behavior. Seems like a non-starter IMHO.

jkotas commented 1 year ago

Yes, it is a tradeoff. We often take advantage of platform-specific capabilities in the runtime and libraries implementation to maximize the .NET platform value. For example, async I/O works very differently on Windows vs. Linux, and these differences come through in some APIs. It would not make sense to limit async I/O implementation choices to least common denominator in the name of consistent platform.

jkotas commented 1 year ago

> When I think about the changes we made to the threadpool to work around blocking APIs,

It is not just that. The Windows threadpool is used as an implementation detail for a number of Windows subsystems. It means that there are two threadpools running in typical .NET apps. I would expect that switching to the Windows threadpool reduces our memory footprint by eliminating the threadpool duplication, especially for smaller apps.

davidfowl commented 1 year ago

> For example, async I/O works very differently on Windows vs. Linux, and these differences come through in some APIs. It would not make sense to limit async I/O implementation choices to least common denominator in the name of consistent platform.

Right, I mentioned this, but we do a good job unifying how they work, which got even better when we moved the Windows IO polling code to managed code in .NET 7. That reduced the differences between the OS implementations. This is similar to what libuv does, and really any modern platform that does IO. Now if you want to eke the last drop of performance out of the platform, then you can opt into that specific platform implementation and maximize the performance.

I'm just talking about defaults; as a principle, I think we should bias heavily towards managed by default, giving users the option to opt into OS-specific implementations.

NinoFloris commented 1 year ago

@kouvel what would "Polling for IO on worker threads" look like from where we are today?

dotnet / runtime

Tracking issue for some thread pool experiments (.NET 8) #77665