Profiling one of our services (not lighthouse) seems to imply that the DedicatedThreadPool and DotNetty are the main culprits:
More details on the DedicatedThreadPool:
The ChannelExecutor mentioned in that PR might be a good candidate - I owe @Zetanova a re-review of it.
https://github.com/akkadotnet/akka.net/pull/5390 might also help - alters the "waiting" mechanism used by the DedicatedThreadPool.
@markusschaber You can use my https://github.com/Zetanova/Akka.Experimental.ChannelTaskScheduler, but you need to downgrade to Akka 1.4.21: because of some improvements to Ask, the cluster-startup extensions are very racy at startup in later versions.
I made a few PRs to fix the cluster startup and hope that it is fixed in 1.4.29.
My ChannelTaskScheduler does not reduce idle CPU all the way to zero (as it ideally should), but it removes the scaling issue completely. I currently run 16 nodes on k8s and every one of them idles at between 20m and 50m.
The other option would be to switch the dispatcher to the built-in but unused TaskPoolDispatcher, but then the cluster will down itself under heavy workload, because the cluster packets are processed too late.
Can you test with the latest nightly to verify these are fixed?
As for the DotNetty CPU issues - can’t fix those without replacing the transport, which we are planning on doing but it’s a ways out.
@Aaronontheweb https://github.com/Zetanova/PNet.Mesh - UDP traffic, fully encrypted; not fully tested yet, and NAT+TURN is still missing.
In the unit tests, traffic already goes through fully encrypted with crypto routing.
One other option we could start to think about is the multi-home problem in akka.cluster, akka.remote and akka.discovery, found here: https://github.com/akkadotnet/akka.net/discussions/4993
Currently Akka.Cluster even has a problem with normal DNS names; it will already break the cluster
if some nodes are using: akka.tcp://node1:2334
and others: akka.tcp://node1.myNamespace.cluster.local:2334
and the rest: akka.tcp://node1.myNamespace:2334
@Zetanova I'll try to test on Monday. We're in the European time zone :-)
I tried the nightly build 1.4.29-betaX now, and it did fix the cluster startup problems.
@Zetanova this looks pretty nice, it looks like it might be able to support TCP as well with some work? (Just thinking about environments where UDP might be an issue).
FWIW, building a Transport is deceptively simple, with one important caveat. @Aaronontheweb can correct my poor explanation here, but there's a point during handshaking where some of the inbound flows need to remain 'corked' while the AssociationHandle is being created. In the DotNetty transport this is handled by setting the channel's AutoRead to false and then back to true.
@to11mtm UDP fits very well for mesh/VPN and encryption. TCP is good for streaming and not for single packets; that's why IKEv2, OpenVPN, WireGuard and P2P protocols use UDP or work best over UDP. There is nearly no place where UDP is not supported.
The idea with PNet.Mesh is to have a simple UDP socket with no other OS requirements, plus crypto routing/addressing (not relying on IPv4 addresses).
WireGuard itself would be optimal for Akka too, but it has usage restrictions: it's not easy (or even possible) to use WireGuard in Kubernetes, whereas a simple UDP socket is easy.
Akka is a purely message-based system and doesn't really need a persistent connection between nodes; it only needs connectivity between them. Maybe we can abstract that away; I wrote up an idea on it here: https://github.com/akkadotnet/akka.net/discussions/4993
Hmm. When I build our real services against 1.4.29-beta637735681605651069, the load per service seems to be a bit lower, but still around 190 mCPU compared to 210 mCPU with 1.4.27. So either my build went wrong, or the nightly does not help as much as I hoped.
(Are there nightly builds of lighthouse I could use to nail down where the difference is?)
I could not try the experimental ChannelTaskScheduler yet. It seems there's no NuGet package available, and our policy forbids copy-pasting 3rd-party code into our projects, so I'll need to package it and host it on our internal NuGet feed, which takes some time (busy with other work right now...).
@markusschaber ah, my comment was for @Zetanova to resolve his startup issue with the ChannelDispatcher
Ah, I see... And it seems there's quite some jitter in the mCPU usage; after some time, I also get phases with about 210 mCPU with the nightly...
After hacking together a solution using the https://github.com/Zetanova/Akka.Experimental.ChannelTaskScheduler with 1.4.29-beta637735681605651069, it got considerably better. Running the same services with the default hocon for the ChannelTaskScheduler, the CPU usage is down to 60-90 mCPU, so this is around 1/2 to 1/3 of the original CPU usage.
The times I've tested that, there have been some throughput tradeoffs - but on balance that might be the better trade for your use case.
In terms of replacing the DotNetty transport - I'd be interested in @Zetanova's ideas there and I have one of my own (gRPC transport - have some corporate users who rolled their own and had considerably higher throughput than DotNetty) that we can try in lieu of Artery, which is a much bigger project.
Thanks for your efforts. I'm looking forward to an official solution, which can be used in production code without bending compliance rules. :-)
Naturally - if @Zetanova is up for sending in a PR with the upgraded ChannelDispatcher as part of v1.4.29, I'd be happy to merge that in and make it an "official" dispatcher option even if it's not set as the default. Meaning, we'll accept and triage bug reports for it.
As for some alternative transports, I'd need to write up something lengthier on that in a separate issue but I'm open to doing that as well - even prior to Akka.NET v1.5 and Artery.
Thank you very much!
As far as I can see, the main issue with the schedulers is the busy loops; things like Thread.Sleep(0) in tight loops seem to burn most of the CPU in our case. I might try to look into that on my own, and submit a pull request if anything valuable comes out.
If at all possible, I'd like to get to something more like 10-20 mCPU per service when there's no traffic...
As far as I can see, the main issue with the schedulers is the busy loops; things like Thread.Sleep(0) in tight loops seem to burn most of the CPU in our case. I might try to look into that on my own, and submit a pull request if anything valuable comes out.
"Expensive waiting" is a tricky problem - that and scaling the DedicatedThreadPool
without reinventing the hill-climbing algorithm used by the managed thread pool go hand-in-hand. That's what the ChannelDispatcher does well: it solves the same mutually exclusive scheduling problem the DedicatedThreadPool solves, but by adding different priority work queues on top of the managed ThreadPool, so we still benefit from the hill-climbing algorithm's scaling without suffering from the usual starvation problems that occur when everything runs on the same thread pool.
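As a very rough sketch of that idea (hypothetical code, not the actual ChannelExecutor implementation): separate System.Threading.Channels queues per priority, drained by workers that are themselves just tasks on the managed ThreadPool, so the pool's hill-climbing algorithm still decides how many threads actually run.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch only - the real ChannelExecutor is more involved.
sealed class PriorityWorkQueues
{
    private readonly Channel<Action> _high = Channel.CreateUnbounded<Action>();
    private readonly Channel<Action> _normal = Channel.CreateUnbounded<Action>();

    public void Schedule(Action work, bool highPriority) =>
        (highPriority ? _high : _normal).Writer.TryWrite(work);

    // Each worker is just a Task on the managed ThreadPool, so the pool's
    // hill-climbing algorithm still controls how many threads exist.
    public void StartWorker() => Task.Run(async () =>
    {
        while (true)
        {
            // Always drain high-priority work (e.g. /system, remoting) first.
            while (_high.Reader.TryRead(out var urgent)) urgent();
            if (_normal.Reader.TryRead(out var item)) { item(); continue; }

            // Nothing queued: park instead of spinning until either queue has work.
            await Task.WhenAny(
                _high.Reader.WaitToReadAsync().AsTask(),
                _normal.Reader.WaitToReadAsync().AsTask());
        }
    });
}
```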
I'm not sure whether busy waiting actually brings enough benefits compared to just using a lock / SemaphoreSlim or similar primitives that rely on the OS scheduler. (As far as I know, "modern" primitives like SemaphoreSlim already use optimized mechanisms like futexes and fine-tuned spinning under the hood.) As far as I know, the main purpose of busy looping is to reduce the overhead and latency introduced by context switches in case another CPU fulfils the condition we're waiting for. However, Thread.Sleep(0) by definition introduces context switches. To my knowledge, OS schedulers are nowadays rather good at solving things like starvation and priority inversion, so trying to outsmart the OS might not be the optimal solution in all cases.

Checking the Wait(TimeSpan) implementation in the UnfairSemaphore, I'm not convinced that spinning 50 times through Thread.Sleep(0) on several threads/CPUs in parallel is actually better than falling back to the SemaphoreSlim after 1 or 2 tries. Maybe the UnfairSemaphore could be improved to fine-tune the number of looping threads with the actual load, or it could just be replaced by a SemaphoreSlim directly for some workloads.
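For illustration, here is a hedged, hypothetical side-by-side of the two waiting strategies being compared above; neither snippet is the actual UnfairSemaphore code:

```csharp
using System;
using System.Threading;

// Hypothetical comparison of the two strategies - not the UnfairSemaphore code.
static class WaitStrategies
{
    // Spin-based waiting: each idle worker polls and calls Thread.Sleep(0),
    // which yields its time slice but still costs a context switch per iteration
    // and keeps the core busy while nothing is happening.
    public static bool SpinWait(Func<bool> workAvailable, int maxSpins = 50)
    {
        for (var i = 0; i < maxSpins; i++)
        {
            if (workAvailable()) return true;
            Thread.Sleep(0);
        }
        return false;
    }

    // Blocking wait: SemaphoreSlim spins briefly itself, then parks the thread
    // on an OS wait handle, so the kernel only wakes it when work is signalled.
    public static bool BlockingWait(SemaphoreSlim workSignal, TimeSpan timeout) =>
        workSignal.Wait(timeout);
}
```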
Independently, one could argue that any starvation when using the normal thread pool is either a misconfiguration of the thread pool (not enough minimum threads) or a misuse of the thread pool (long-running tasks should go to a dedicated thread, blocking I/O should be replaced by async, etc...). Whether that kind of reasoning is acceptable to your users is an entirely different question, and apparently, minds much smarter than mine have had to fight tricky thread starvation problems (see https://ayende.com/blog/177953/thread-pool-starvation-just-add-another-thread or https://github.com/StephenCleary/AsyncEx/issues/107#issuecomment-328355768 for examples...) - there's a reason one of our services had a line like ThreadPool.SetMinThreads(500, 500); in the startup code for some time... (Btw, according to Microsoft's documentation, those 500 threads are still created "on demand" (just instantly when there's no free thread available), so if 20 threads are enough to saturate the workload, no more threads will ever be created.)
Independently, one could argue that any starvation when using the normal thread pool is either a misconfiguration of the thread pool (not enough minimum threads) or a misuse of the thread pool (long-running tasks should go to a dedicated thread, blocking I/O should be replaced by async, etc...)
In our case, the issue is simple: /system tasks, such as Akka.Cluster heartbeats, have real-time processing requirements - i.e. they fail if not responded to within N seconds. Large ThreadPool work queues that don't allow workload prioritization natively make it difficult for us to uphold those across busy systems where /user workloads are application-dependent and unknown to us. Therefore, we needed a generalizable solution for prioritizing some workloads over others that would work across hundreds of thousands of different use cases. Starvation occurs at the "task in queue" level - given the other items queued for execution, the system wasn't able to service that task in time for it to meet its requirements and keep the cluster available.
Of the solutions we tried years ago (i.e. Akka.NET 1.0-1.1), separating the workloads at the thread level was what offered the highest throughput in exchange for the least amount of total complexity. Fine-tuning how that DedicatedThreadPool does its job, either in terms of how it scales (it's statically allocated based on vCPU counts now) or in terms of how it waits for work when idle, would certainly be of interest.
Our job itself isn't so simple - the prioritization has to be handled somewhere; delegating everything to the ThreadPool without it has historically yielded poor results in busy systems. Additionally, the idle CPU vs. throughput tradeoff has historically been won by "throughput" in terms of "what do users care most about?", so that's primarily what's driven our development efforts there; as a result there is probably a lot of low-hanging fruit that could be picked to help optimize it. Offering choices for different use cases (i.e. the ChannelDispatcher for users that prefer low resource consumption) or simply putting in the work to reduce idle CPU (see https://github.com/akkadotnet/akka.net/issues/4031) are both good options for mitigating the issue.
Hmm, having a closer look at the DedicatedThreadPool, it says:
It prefers to release threads that have more recently begun waiting, to preserve locality.
Maybe we could just solve this problem with some kind of "stack" of SemaphoreSlim instances, so we only wake up one thread at a time, the most recent waiter being the one on top of the stack (see the sketch below). On the other hand, I'm not really sure whether the implied definition of "locality" really fits modern "big iron" hardware, which requires NUMA awareness etc. for best results. I see a contradiction between "the more CPUs we have, the bigger the chance that another CPU will queue some work while we poll" and "the more CPUs we have, the less likely the thread which most recently began waiting is actually on the right CPU (or close to it in the NUMA sense)."
Of course, this usually does not apply to "small" machines like single-socket desktop machines, but on those, it's also less likely that another CPU can queue other work when all CPUs are busy polling on the UnfairSemaphore. ;-)
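A minimal sketch of that "stack of waiters" idea, purely illustrative and ignoring the bookkeeping (e.g. a pending-work counter) a real DedicatedThreadPool would need:

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Illustrative only: wake the most recently parked thread first (LIFO),
// releasing exactly one waiter per work item.
sealed class LifoWaiterStack
{
    private readonly ConcurrentStack<SemaphoreSlim> _waiters =
        new ConcurrentStack<SemaphoreSlim>();

    // Called by an idle worker: park on a private one-shot semaphore.
    public void Wait()
    {
        var gate = new SemaphoreSlim(0, 1);
        _waiters.Push(gate);
        gate.Wait();
    }

    // Called when work is enqueued: release only the top-of-stack waiter.
    public void WakeOne()
    {
        if (_waiters.TryPop(out var gate))
            gate.Release();
    }
}
```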
I'm not sure whether busy waiting actually brings enough benefits, compared to just using a lock / SemaphoreSlim or similar primitives using the OS scheduler. (As far as I know, "modern" primitives like SemaphoreSlim already use optimized mechanisms like futexes and fine-tuned spinning under the hood.)
I bet we could parameterize https://github.com/akkadotnet/akka.net/blob/dev/src/benchmark/Akka.Benchmarks/Actor/PingPongBenchmarks.cs to switch between the DedicatedThreadPool and the default ThreadPool, so you could measure the impact of these DedicatedThreadPool changes on throughput.
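Something along these lines might work; this is a hypothetical shape only (the real PingPongBenchmarks class is structured differently), using a BenchmarkDotNet [Params] to switch the dispatcher executor via HOCON. The executor names used here are assumptions based on the standard Akka.NET dispatcher configuration:

```csharp
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Configuration;
using BenchmarkDotNet.Attributes;

// Hypothetical shape only - "fork-join-executor" (DedicatedThreadPool) vs.
// "default-executor" (managed ThreadPool) are assumed HOCON executor names.
public class DispatcherPingPongBenchmarks
{
    private class EchoActor : ReceiveActor
    {
        public EchoActor() => ReceiveAny(msg => Sender.Tell(msg));
    }

    [Params("fork-join-executor", "default-executor")]
    public string Executor { get; set; }

    private ActorSystem _system;
    private IActorRef _echo;

    [GlobalSetup]
    public void Setup()
    {
        var config = ConfigurationFactory.ParseString(
            $"akka.actor.default-dispatcher.executor = {Executor}");
        _system = ActorSystem.Create("bench", config);
        _echo = _system.ActorOf(Props.Create(() => new EchoActor()));
    }

    [GlobalCleanup]
    public void Cleanup() => _system.Terminate().Wait();

    [Benchmark]
    public Task<string> PingPong() => _echo.Ask<string>("ping");
}
```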
How would you create a benchmark to measure idle CPU? I've wondered about that in the past, but without firing up an external system like Docker and collecting system metrics on a Lighthouse instance I'm not sure how to automate that.
Maybe the Performance Counter APIs could help on Windows (available for .NET Core via the Platform Extensions, and natively for .NET Framework). On Linux, using external processes seems to be state of the art, although I think one could use the same sources (/proc and /sys) from within the process.
https://github.com/devizer/Universe.CpuUsage also looks helpful, including Task/Async support.
As far as I can see, it should be possible to spawn several Akka actor systems as cluster members within the same process (as everything is nicely encapsulated, no static variables), then let them run and take some samples of CPU usage after a few seconds of warmup and after a defined benchmark time (e. g. one minute).
As far as I can see, it should be possible to spawn several Akka actor systems as cluster members within the same process (as everything is nicely encapsulated, no static variables), then let them run and take some samples of CPU usage after a few seconds of warmup and after a defined benchmark time (e. g. one minute).
I think this would work:
We could write something that doesn't use Benchmark.NET but does use https://github.com/devizer/Universe.CpuUsage
Although I did pester one of the Benchmark.NET maintainers on how to do this using their library: https://twitter.com/Aaronontheweb/status/1465374882129133574
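As a minimal sketch of that approach, assuming plain System.Diagnostics.Process counters are precise enough (Universe.CpuUsage or the cycle-time APIs discussed later in this thread would be more exact), the measurement itself could look roughly like this:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Sketch: measure the average CPU fraction this process consumes while the
// in-process cluster is idle. A result of 0.05 corresponds to roughly 50 mCPU.
static async Task<double> MeasureIdleCpuFractionAsync(TimeSpan warmup, TimeSpan window)
{
    await Task.Delay(warmup);                    // let the cluster form and settle
    var process = Process.GetCurrentProcess();
    var cpuBefore = process.TotalProcessorTime;
    var wallClock = Stopwatch.StartNew();

    await Task.Delay(window);                    // the "idle" measurement window

    process.Refresh();                           // re-read the process counters
    var cpuUsed = process.TotalProcessorTime - cpuBefore;
    return cpuUsed.TotalMilliseconds / wallClock.Elapsed.TotalMilliseconds;
}
```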
@Aaronontheweb The tradeoff is not between idle CPU and throughput but between idle CPU and latency. The reason is that a maximally used system (100% load) has 0% idle time and should consume zero idle CPU.
Yes, we can measure an idle Akka system via the difference in CPU used over a fixed period. Under Windows it should be easy, but I have never done it programmatically under Linux.
Besides throughput and idle-CPU measurements, we would need some latency/reaction test too, meaning how fast an idle system can react to a received message. Both the idle-cpu test and the latency test would be required together.
Or else the following extremes would be "optimal":
best for idle-cpu: Thread.Sleep(infinite)
best for latency: while(true) { ... }
A scenario that requires real-time processing (best latency) would be a microcontroller like an Arduino, and those nearly always work with live-locks (busy loops).
The Akka target for idle CPU should be to be equal to or less than what mssql or eventstore servers use in an idle state, and it should not scale with the cluster size. Idling nodes in a cluster of 5 nodes should have nearly the same idle CPU as in a cluster of 20 nodes.
@markusschaber If you have more than 60m with the ChannelTaskScheduler and it increases with the node count, then something went wrong. Maybe you forgot to set the akka config?
The ChannelTaskScheduler will log:
[11:03:06 DBG] Launched Dispatcher [akka.remote.default-remote-dispatcher] with Priority[High]
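For reference, the configuration in question looks roughly like the following; the keys are taken from how the channel-executor ended up being configured in Akka.NET and may differ slightly for the experimental package:

```csharp
using Akka.Actor;
using Akka.Configuration;

// Approximate configuration: route the key dispatchers through the channel-based
// executor with different priorities. Keys are assumptions based on the later
// official ChannelExecutor docs and may differ for the experimental package.
var config = ConfigurationFactory.ParseString(@"
    akka.actor.default-dispatcher.executor = channel-executor
    akka.actor.default-dispatcher.channel-executor.priority = normal

    akka.actor.internal-dispatcher.executor = channel-executor
    akka.actor.internal-dispatcher.channel-executor.priority = high

    akka.remote.default-remote-dispatcher.executor = channel-executor
    akka.remote.default-remote-dispatcher.channel-executor.priority = high

    akka.remote.backoff-remote-dispatcher.executor = channel-executor
    akka.remote.backoff-remote-dispatcher.channel-executor.priority = low
");
var system = ActorSystem.Create("CasActorSystem", config);
```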
On my cluster with 16 nodes, all idling, they each use around 25-55m. As a comparison, mssql needs 12m and eventstore 29m.
The remaining idle CPU comes from the AkkaScheduler and from Akka.Remote and Akka.Cluster. I saw a few places where events get scheduled with a period under 100ms, even if they have produced no work for a long time. That would be a place for improvements, such as adding a backoff strategy.
The "new" slim wait handles like SemaphoreSlim trade CPU for latency. They have much better latency because for the first few ms (100ms or so) they use a "normal" live-lock while(true) { ... }, and only then fall back to the OS wait handles used by the old, non-Slim types.
I will make a PR for the ChannelTaskScheduler.
@Aaronontheweb The tradeoff is not between idle CPU and throughput but between idle CPU and latency. The reason is that a maximally used system (100% load) has 0% idle time and should consume zero idle CPU.
I just meant that, historically, our benchmarks for Akka.Remote and its infrastructure have been primarily geared around increasing throughput - idle-cpu and even latency have taken a backseat, although latency has become an area of focus recently with https://github.com/akkadotnet/akka.net/issues/5203
The Akka target for idle CPU should be to be equal to or less than what mssql or eventstore servers use in an idle state, and it should not scale with the cluster size.
Agree.
Idling nodes in a cluster of 5 nodes should have nearly the same idle CPU as in a cluster of 20 nodes.
The "idle" workload for a cluster should level out once you hit the watched-by factor on heartbeats, which defaults to 8 I think. But yes, I agree. These are good targets.
@Zetanova I'm not sure what I'm doing wrong. I get the log messages:
[DEBUG][30.11.2021 08:50:52][Thread 0001][EventStream] StandardOutLogger started
[DEBUG][30.11.2021 08:50:52][Thread 0001][ChannelExecutor-[akka.actor.internal-dispatcher]] Launched Dispatcher [akka.actor.internal-dispatcher] with Priority[High]
[DEBUG][30.11.2021 08:50:52][Thread 0001][ChannelExecutor-[akka.actor.default-dispatcher]] Launched Dispatcher [akka.actor.default-dispatcher] with Priority[Normal]
[DEBUG][30.11.2021 08:50:52][Thread 0001][EventStream(CasActorSystem)] Logger log1-AkkaMicrosoftLogger [AkkaMicrosoftLogger] started
[DEBUG][30.11.2021 08:50:52][Thread 0001][EventStream(CasActorSystem)] StandardOutLogger being removed
dbug: akka[0] Logger log1-AkkaMicrosoftLogger [AkkaMicrosoftLogger] started
dbug: akka[0] StandardOutLogger being removed
dbug: akka[0] Default Loggers started
dbug: akka[0] Launched Dispatcher [akka.remote.default-remote-dispatcher] with Priority[High]
info: akka[0] Starting remoting
dbug: akka[0] Starting prune timer for endpoint manager...
info: akka[0] Remoting started; listening on addresses : [akka.tcp://CasActorSystem@192.168.56.107:5508]
info: akka[0] Remoting now listens on addresses: [akka.tcp://CasActorSystem@192.168.56.107:5508]
info: akka[0] Cluster Node [akka.tcp://CasActorSystem@192.168.56.107:5508] - Starting up...
info: akka[0] Cluster Node [akka.tcp://CasActorSystem@192.168.56.107:5508] - Started up successfully
CPU load with just the 3 seed nodes running settles around 20 mCPU per node. Starting our other services (which include 9 Akka cluster nodes, one of them started with 2 replicas in my test setup), it settles around 60-90 mCPU, with most values close to 70.
@markusschaber Before, were all 12 nodes over 250m each? And how much do services like mssql and eventstore require on your k8s? I am running on-premise with 3.2GHz+ CPUs.
With "stock" Akka 1.4.27 and 1.4.28, it was in the range of 190-200m each.
Other .NET Core services (not part of the cluster), PostgreSQL and nginx-ingress all around 0-10m, kubernetes metrics server 10-30m.
My test setup is a VM running on an Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz, so I expect somewhat higher numbers than on your machine. I'm okay with 3 seed nodes using 20m each, but I'm a bit concerned that the load rises so much when starting the other services.
I hesitate to blame the sharding, as it only spans one service (with 2 replicas), so it should not affect the other nodes, right?
Sharding won't affect any nodes that aren't actively hosting shards - and sharding only really does work when spawning entities, rebalancing shards, or routing messages to entities. In an "idle" cluster none of the above are happening.
If you're using DData to power the shard state storage system there will be a small amount of gossip up until the state is synchronized across all replicas, which will happen in a matter of seconds after startup when things are "idle."
edit: although, there are some periodic timers DData uses for pruning and re-sync in the background but it's relatively minor. Uses about the same level of overhead as Akka.Cluster itself.
@Aaronontheweb where should the ChannelTaskScheduler be placed for the PR, in a new project or in the main Akka project? It only requires System.Threading.Channels as an external lib, which has been in the default framework references since dotnet 3.1.
@markusschaber <70m looks fine to me, as long as it does not increase further with the node count. To get lower than that we would need to find and fix the next bottleneck; it's most likely in Akka.Cluster and the AkkaScheduler.
@Zetanova do it in the main repo - we need to merge in channels in order to release https://github.com/akkadotnet/akka.net/pull/4742 anyway
Edit: actually we don't need it for that Akka.Streams PR but whatever - I'm fine taking a dependency on it.
@markusschaber try setting akka.scheduler.tick-duration in the config from the default of 10ms to 16ms, 32ms or even 48ms. I tested it only 2 years ago, but this will for sure lower idle CPU.
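That suggestion amounts to overriding a single setting (10ms is the default); for example:

```csharp
using Akka.Actor;
using Akka.Configuration;

// Trade scheduler tick resolution for lower idle CPU by widening the tick interval.
var config = ConfigurationFactory.ParseString("akka.scheduler.tick-duration = 32ms");
var system = ActorSystem.Create("CasActorSystem", config);
```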
The current AkkaTaskScheduler spins away the remainder of each 10ms tick with Thread.Sleep, never skips an empty tick, and has no "idle" state. There are reasons why it cannot easily be implemented differently. Reworking the AkkaTaskScheduler has been on my todo list for a long time, and that is what I will do next. I have a few ideas and the knowledge to improve its idle CPU, memory access and scalability.
@Aaronontheweb I've now made the PR: https://github.com/akkadotnet/akka.net/pull/5403
@Zetanova are you talking about the HashedWheelTimer?
Yes, the HashedWheelTimerScheduler. Maybe it's simply the work items that get scheduled on it, but somewhere around there is where the idle CPU comes from.
It definitely has an expensive waiting problem too, per #4031
I don't have any hard evidence yet, but I think the CPU load is higher when the VM has more CPUs available. It might be a problem with the code calculating the number of polling threads from the CPU count. (Remember that all pods on a kubernetes node share the CPUs...)
@markusschaber the DedicatedThreadPool scales and pre-allocates threads according to the number of vCPUs, so yes, that would do it if the vCPU number isn't reported lower in K8s.
Let me write something up on how to adjust these settings - it's been a while.
@Arkatufus has started adding some idle CPU profiling using the technique described here to https://github.com/petabridge/lighthouse/pull/237
I can't quite replicate this problem. I spun up 21 lighthouse services in different processes and measured a single process's CPU usage over time; the maximum CPU usage I've observed is about 9.5%. Could this problem be compounded by the things that run inside the docker base image/kube?
@Arkatufus 9.5% CPU usage cumulative or per process? And on what type of hardware / environment?
9.5% on a single measured process. Ryzen 9, 12 core, 24 virtual core, 32 GB memory.
Bumped it to 41 lighthouse instances, max CPU usage was 17.5%
So that seems reasonably high to me @Arkatufus, given that there's only a background level of activity occurring. This tells me that systems like the scheduler and the DedicatedThreadPool are consuming a lot of resources while idle. "Expensively waiting" we've called it.
I think your experiment is a success at quantifying this - what do you recommend going forward for measuring any changes we make to Akka.NET in this regard? Is there a way we can incorporate that benchmark into the main Akka.NET repository or would you recommend keeping them separated?
"Expensive relative to what?"
Ideally, when Akka.NET isn't handling a /user workload we want to get those idle CPU numbers as close to immeasurably small as possible. That's the goal.
It should be simple to incorporate this benchmark into Akka.NET. I don't know how to automate it though; right now, I have to start and run the test for each cluster node count change and compare the numbers manually.
@Arkatufus The best metric would be to use "total process CPU cycles consumed". The difference between the process-cycle value at start-measurement and stop-measurement, divided by time, would be our score (idle CPU).
Windows:
```csharp
// P/Invoke declaration: QueryProcessCycleTime reports the number of CPU clock
// cycles the given process has consumed so far (Windows only).
[DllImport("kernel32.dll")]
[return: MarshalAs(UnmanagedType.Bool)]
static extern bool QueryProcessCycleTime(IntPtr ProcessHandle, out ulong CycleTime);
```
I have never tried it under Linux and Docker.
The value (14) utime %lu in /proc/self/stat looks promising; it is reported in clock ticks, so dividing it by sysconf(_SC_CLK_TCK) gives the process's user-mode CPU time under unix.
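A hedged cross-platform sketch of that idea in C#; the field offsets follow the proc(5) man page, and the 100 ticks-per-second constant is the typical sysconf(_SC_CLK_TCK) value rather than a queried one:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

static class ProcessCpuTime
{
    // Returns total CPU time consumed by the current process.
    public static TimeSpan Get()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
        {
            var stat = File.ReadAllText("/proc/self/stat");
            // Field 2 (comm) can contain spaces, so skip past the closing ')'.
            var fields = stat.Substring(stat.IndexOf(')') + 2).Split(' ');
            // utime is field 14 and stime field 15 of the stat line, which
            // puts them at offsets 11 and 12 once we start after the ')'.
            var ticks = long.Parse(fields[11]) + long.Parse(fields[12]);
            const double clockTicksPerSecond = 100; // typical sysconf(_SC_CLK_TCK)
            return TimeSpan.FromSeconds(ticks / clockTicksPerSecond);
        }

        // Elsewhere, Process.TotalProcessorTime already provides the same information.
        return Process.GetCurrentProcess().TotalProcessorTime;
    }
}
```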
Example of the Windows values, as shown in Process Explorer:
Version Information: Reproduced with 1.4.27 and 1.4.28, Akka clustering, Lighthouse.

Describe the bug: Idle Akka clusters burn too much CPU.

To Reproduce: Steps to reproduce the behavior:
watch kubectl top pods --namespace=akka-cqrs

Expected behavior: CPU load should be negligible (not exactly 0, as some cluster gossip is happening...).

Actual behavior: Even with 2 replicas, the CPU usage is rather high for an idle system. However, when increasing the number of replicas, the CPU usage per service also increases. Starting 50 replicas outright renders my kubernetes cluster unusable; kubectl commands fail with various timeout errors.

Screenshots: Output of watch kubectl top pods --namespace=akka-cqrs with 16 replicas (screenshot omitted).

Environment: Happens in different environments; the tests above were taken in a VM running Ubuntu, with 6 CPUs and 8GB RAM, running a single-node kubernetes cluster with microk8s installed via snap. kubectl versions: Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-18T02:34:11Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.3-3+9ec7c40ec93c73", GitCommit:"9ec7c40ec93c73c2281bdd2e4a75baf6247366a0", GitTreeState:"clean", BuildDate:"2021-11-03T10:17:37Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}

Additional context: This can be a serious cost factor in environments which are paid per CPU usage, like some cloud services. In our case it's some test and dev environments which are configured "smallish" and burn their "CPU burst quota" rather quickly.

This might be related to https://github.com/akkadotnet/akka.net/issues/4537.