Open AdamFrisby opened 3 months ago
Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.
You can control this behaviour via DOTNET_ThreadPool_UnfairSemaphoreSpinLimit, e.g. set it to just 0 to prefer sched_yield over spin locks. It's a known issue that our heuristics need some tuning; they're focused mainly on quick, short-lived tasks and lead to unnecessary CPU usage in other scenarios due to long spin waits.
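For example, on Linux the variable can be set for a single run like this (a sketch; MyApp.dll is a placeholder for your application):

```shell
# Disable the unfair semaphore spin limit so waiting thread pool
# workers prefer sched_yield over spinning.
export DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0

# Launch the app with the variable in its environment.
dotnet MyApp.dll
```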
Brilliant, thank you @EgorBo - we'll give that a try. Can that be changed via code, or via the environment only?
Either the environment variable DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 or the runtime config JSON property System.Threading.ThreadPool.UnfairSemaphoreSpinLimit; it can't be set via code.
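As a sketch of the runtime-config route, the property can go in the app's runtimeconfig.json (or runtimeconfig.template.json) under configProperties:

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Threading.ThreadPool.UnfairSemaphoreSpinLimit": 0
    }
  }
}
```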
Note that calls like LowLevelLifoSemaphore.WaitNative, WaitHandle.WaitOneNoCheck, and in most cases Monitor.Wait are not really consuming CPU time. If you run a sampling profiler, which is the common configuration of dotTrace, you will inevitably see time spent in those methods, but that doesn't mean they're consuming CPU resources.
It's usually much worse on arm64 due to the lack of a pause alternative.
These are not spin waits though, at least not predominantly. They are OS event waits that should completely block the thread from being scheduled.
Oops, right - I didn't notice that it's WaitForSignal that is hot here, so it's not related to spin waits.
This is what I'd expect, but monitoring via top/htop does seem to show the process actually using CPU time too.
I'd expect most of these wait calls to be red herrings. I'm not sure dotTrace is the best profiler for this, but perhaps there's something else in the Hot Spots view?
The file system watcher may be relevant.
WaitForSocketEvents CPU usage is also not odd for server-side apps, but there could be some issues around the network request processing that stress it more.
WaitForSocketEvents is effectively an infinite loop over epoll_wait, and the epoll_wait syscall may itself either spin-wait or sched_yield internally.
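That loop can be sketched roughly as follows - a simplified illustration of the pattern using Python's selectors module (which uses epoll on Linux), not the runtime's actual implementation; a pipe stands in for a socket to keep it self-contained:

```python
import os
import selectors

# Sketch of the WaitForSocketEvents pattern: a dedicated thread blocks
# in the OS readiness syscall (epoll_wait on Linux) until a registered
# descriptor becomes ready, then hands the events off for processing.
read_fd, write_fd = os.pipe()
sel = selectors.DefaultSelector()
sel.register(read_fd, selectors.EVENT_READ)

os.write(write_fd, b"x")            # simulate data arriving
events = sel.select(timeout=1.0)    # the blocking wait
ready = [key.fd for key, _ in events]

sel.close()
os.close(read_fd)
os.close(write_fd)
```

While blocked in select(), the thread consumes no CPU; time attributed to it by a sampling profiler is wall-clock wait, not CPU work.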
I have considered this - I tried running it in tracing mode as well, but unfortunately it spits out broken trace files that cannot be opened. Is there another profiler you'd recommend trying?
Just as some quick, unscientific results: DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 did help. The overall CPU usage went down by about 25% (from ~75% into the 55-60% range) - not enough to completely fix the issue, but it certainly helped. I'm going to spend a bit more time with various profilers to see if I can improve things from the code side.
Thank you both for your detailed comments!
Description
When running a load that is highly burstable - i.e. one that creates lots of thread pool workers that aren't used persistently - the thread pool itself uses a lot of CPU time waiting for jobs, which can interfere with background processes (outside the application).
The codebase extensively uses async/await, primarily for low-level networking operations.
Configuration
.NET Core LTS 6.0, Linux (Ubuntu 22.04), AWS EC2 t2.medium host
Data
Measured using a combination of htop and JetBrains' dotTrace: when 17 thread pool workers are created, system CPU usage is around 75% +/- 15% (averaging both cores).
dotTrace (sampling mode) reports that 35.4% of the CPU usage is spent in the following stack trace:
Another 45% or so is spread among:
10.5% System.Threading.Monitor.Wait(Object, Int32)
9.8% System.IO.FileSystemWatcher+RunningInstance.TryReadEvent(out NotifyEvent)
8.7%
5.3% System.Threading.Thread.Sleep(Int32)
4.9% Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)
All of which appear to be symptoms of roughly the same thing, even if the specifics differ.
Analysis
Across a range of areas (see the stack traces above) the runtime appears to be spin-locking while waiting for jobs, rather than using interrupt-driven (blocking) waits, which may be better in this scenario because some latency in picking up a job is acceptable (i.e. I'm fine waiting 5ms for the job to start).
This would allow the CPU to idle more, lowering environmental impact (both real-world and within the OS instance), making things such as AWS's CPU Credits easier to manage, and giving system-level analytics a more accurate picture of the real system load.
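The requested behaviour - block until work arrives instead of spinning - can be sketched in Python with a condition variable (a general illustration of blocking waits, not of the runtime's internals; the worker/queue names are made up for the example):

```python
import threading

# A worker that blocks on a condition variable consumes essentially no
# CPU while idle; a spinning worker would burn cycles re-checking the
# queue until a job showed up.
work = []
results = []
cond = threading.Condition()

def worker():
    with cond:
        while not work:        # sleeps in the kernel until notified
            cond.wait()
        results.append(work.pop())

t = threading.Thread(target=worker)
t.start()

with cond:
    work.append("job")
    cond.notify()              # wake the worker; no polling involved
t.join()
```

The trade-off is exactly the one discussed above: waking a blocked thread goes through the scheduler and adds a few microseconds to milliseconds of latency, which spinning avoids at the cost of CPU time.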