dotnet / runtime


ThreadPool unfriendly to background processes #103812

Open AdamFrisby opened 3 months ago

AdamFrisby commented 3 months ago

Description

When running a load that is highly burstable - i.e. one that creates lots of thread pool workers which aren't used persistently - the thread pool itself uses a lot of CPU time waiting for jobs, which can interfere with background processes (outside the application).

The codebase makes extensive use of async/await, primarily for low-level networking operations.

Configuration

.NET 6.0 (LTS), Linux (Ubuntu 22.04), AWS EC2 t2.medium host

Data

Measured using a combination of htop and JetBrains' dotTrace: when 17 thread pool workers are created, system CPU usage is around 75% +/- 15% (averaging both cores).

dotTrace (sampling mode) reports that 35.4% of the CPU usage is spent in the following stack trace:

System.Threading.LowLevelLifoSemaphore.WaitNative(SafeWaitHandle, Int32)
System.Threading.LowLevelLifoSemaphore.WaitForSignal(Int32)
System.Threading.LowLevelLifoSemaphore.Wait(Int32, Boolean)
System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
System.Threading.Thread.StartCallback()

Another 45% or so is spread among the following:

10.5% System.Threading.Monitor.Wait(Object, Int32)

9.8% System.IO.FileSystemWatcher+RunningInstance.TryReadEvent(out NotifyEvent)

8.7% in the following stack:

System.Threading.WaitHandle.WaitOneNoCheck(Int32)
System.Threading.PortableThreadPool+GateThread.GateThreadStart()
System.Threading.Thread.StartCallback()

5.3% System.Threading.Thread.Sleep(Int32)

4.9% Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)

All of these appear to be roughly symptomatic of the same underlying behaviour, even if they differ in the specifics.

Analysis

Across a range of areas (see the stack traces above), the runtime appears to be spin-waiting for jobs rather than using interrupt-driven (blocking) waits, which may be better in this scenario where some latency in picking up a job is acceptable (i.e. I'm fine waiting 5ms for the job to start).

This would allow the CPU to idle more, lowering environmental impact (both in the real world and within the OS instance), better managing things such as AWS CPU Credits, and providing more accurate reporting of the real system load in system-level analytics.

dotnet-policy-service[bot] commented 3 months ago

Tagging subscribers to this area: @mangod9. See info in area-owners.md if you want to be subscribed.

EgorBo commented 3 months ago

You can control this behaviour via DOTNET_ThreadPool_UnfairSemaphoreSpinLimit, e.g. set it to 0 to prefer sched_yield over spin locks. It's a known issue that our heuristics need some tuning: they're focused mainly on quick, short-lived tasks and lead to unnecessary CPU usage in other scenarios due to long spin waits.
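For illustration, a minimal sketch of setting the variable when launching the app from a Linux shell (./MyApp is a placeholder for the actual binary; a systemd unit or container environment block would work just as well):

```sh
# Tell the thread pool not to spin-wait on its worker semaphore
export DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0

# Launch the application in the same environment (placeholder name)
./MyApp
```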

AdamFrisby commented 3 months ago

Brilliant, thank you @EgorBo - we'll give that a try.

Can that be changed via code, or via the environment only?

EgorBo commented 3 months ago

> Brilliant, thank you @EgorBo - we'll give that a try.
>
> Can that be changed via code, or via the environment only?

Either the environment variable DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 or the runtime config setting System.Threading.ThreadPool.UnfairSemaphoreSpinLimit in runtimeconfig.json; it can't be set via code.
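For illustration, a rough sketch of the runtime config approach, assuming the property goes under configProperties in the application's *.runtimeconfig.json (the surrounding structure below is the usual generated layout, not something specific to this setting):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Threading.ThreadPool.UnfairSemaphoreSpinLimit": 0
    }
  }
}
```

The environment variable has the advantage of taking effect without rebuilding or editing the deployed config file.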

filipnavara commented 3 months ago

Note that calls like LowLevelLifoSemaphore.WaitNative, WaitHandle.WaitOneNoCheck, and in most cases Monitor.Wait are not really consuming CPU time. If you run a sampling profiler, which is a common configuration for dotTrace, you will inevitably see time attributed to those methods, but it doesn't mean they are consuming CPU resources.

EgorBo commented 3 months ago

> Note that calls like LowLevelLifoSemaphore.WaitNative, WaitHandle.WaitOneNoCheck, and in most cases Monitor.Wait are not really consuming CPU time. If you run a sampling profiler, which is a common configuration for dotTrace, you will inevitably see time attributed to those methods, but it doesn't mean they are consuming CPU resources.

It's usually much worse on arm64 due to the lack of a pause alternative.

filipnavara commented 3 months ago

> It's usually much worse on arm64 due to the lack of a pause alternative.

These are not spin waits though, at least not predominantly so. They are OS event waits that should completely block the thread from being scheduled.

EgorBo commented 3 months ago

Oops, right, I didn't notice that it's WaitForSignal that is hot here, so it's not related to spin waits.

AdamFrisby commented 3 months ago

> Note that calls like LowLevelLifoSemaphore.WaitNative, WaitHandle.WaitOneNoCheck, and in most cases Monitor.Wait are not really consuming CPU time. If you run a sampling profiler, which is a common configuration for dotTrace, you will inevitably see time attributed to those methods, but it doesn't mean they are consuming CPU resources.

This is what I'd expect, but monitoring via top/htop does seem to show the process actually using CPU time too.

filipnavara commented 3 months ago

> This is what I'd expect, but monitoring via top/htop does seem to show the process actually using CPU time too.

I'd expect most of these wait calls to be red herrings. Not sure if dotTrace is the best profiler for this, but perhaps there's something else in the Hot Spots view?

The file system watcher may be relevant.

CPU usage in WaitForSocketEvents is also not odd for server-side apps, but there could be some issues around the network request processing that stress it more.

EgorBo commented 3 months ago

WaitForSocketEvents is effectively an infinite loop over epoll_wait, where the epoll_wait syscall may itself use spin-waiting or sched_yield internally.

AdamFrisby commented 3 months ago

> I'd expect most of these wait calls to be red herrings. Not sure if dotTrace is the best profiler for this, but perhaps there's something else in the Hot Spots view?

I have considered this - I tried running it in tracing mode as well, but unfortunately it spits out broken trace files that cannot be opened. Is there another profiler you'd recommend trying?

AdamFrisby commented 3 months ago

Just as some quick unscientific results - DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 did help.

The overall CPU usage went down by about 25% (from ~75% into the 55-60% range) - not enough to completely fix the issue, but it certainly helped. I'm going to spend a bit more time with various profilers to see if I can improve things from the code side.

Thank you both for your detailed comments!