akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

[PERF] Akka.Cluster Idle CPU on ARM #7223

Open Aaronontheweb opened 3 months ago

Aaronontheweb commented 3 months ago

Version Information

Version of Akka.NET? v1.5.21
Which Akka.NET Modules? Akka.Cluster, Akka.Remote, Akka

Describe the performance issue

From a user in our Discord - it looks like Akka.Cluster has significantly higher idle CPU on Apple Silicon ARM chips than it does on x64 chips.

Data and Specs

image

Expected behavior

Idle CPU should be less than 1% per process across all platforms.

Actual behavior

Idle CPU can be as high as 28% on ARM.

Additional context

This is mostly a .NET runtime issue, but we should keep an eye on it in case there's something we're doing to exacerbate it or something we can do to mitigate it.

Zetanova commented 3 months ago

A few years ago I posted some system APIs for reading the cycles consumed by a process on Windows/Linux. Maybe there is something newer/better in the dotnet SDK tools.

Aaronontheweb commented 3 months ago

> A few years ago I posted some system APIs for reading the cycles consumed by a process on Windows/Linux. Maybe there is something newer/better in the dotnet SDK tools.

I thought the biggest culprits for this would have been our DedicatedThreadPool, but these numbers are with those disabled - this is all using the built-in .NET ThreadPool.

Zetanova commented 3 months ago

https://github.com/akkadotnet/akka.net/issues/5400#issuecomment-1020871040
https://github.com/akkadotnet/akka.net/issues/5400#issuecomment-1021952961

Maybe we can implement something like a Stopwatch with it, but for CPU cycles. It would be useful not only inside perf tests but maybe also inside the ActorCell scheduling algorithm.

Zetanova commented 3 months ago

> A few years ago I posted some system APIs for reading the cycles consumed by a process on Windows/Linux. Maybe there is something newer/better in the dotnet SDK tools.

> I thought the biggest culprits for this would have been our DedicatedThreadPool, but these numbers are with those disabled - this is all using the built-in .NET ThreadPool.

I wasn't talking about the issue itself, but about your measurements. In k8s and other "cloud" platforms there are CPU units (m): https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

We don't need to use the same metrics. Reading the consumed CPU cycles from the OS would be optimal for unit tests and benchmarks. Maybe it's even possible to use them at runtime for workload measurement, scheduling, and health checks.

Aaronontheweb commented 3 months ago

Ah got it, you think this might just be an instrumentation issue then?

Aaronontheweb commented 3 months ago

Worth mentioning: I requisitioned all of the hardware for building a long-term Akka.NET observation lab yesterday https://x.com/Aaronontheweb/status/1797731816042049944

Going to have some experiments that are designed to run continuously for months in here, including idle CPU measurements. Bought a Raspberry Pi 5 for testing ARM support specifically.

Zetanova commented 3 months ago

The distro/kernel version used can make a difference too.

Tip: don't write the log/output to your SD card; it will wear out the card very fast.

Zetanova commented 3 months ago

I will make a demo project for the cycle measurement.

Aaronontheweb commented 3 months ago

> The distro/kernel version used can make a difference too.
>
> Tip: don't write the log/output to your SD card; it will wear out the card very fast.

Good idea - was planning on having a log-aggregator and OTEL running on a separate host (x64 instance)

Zetanova commented 3 months ago

@Aaronontheweb here is the demo cycle watch: https://github.com/Zetanova/CycleReader It is currently Windows-only; I will add Linux and other OS support in the next few days.

Zetanova commented 3 months ago

@Aaronontheweb It's not possible to read a counter that gives a "cpu-work done" value.

There are some registers on x64 and ARMv6+ that expose per-thread cycle counts, but they are very hard to read from C#, and the raw values are not useful once the TaskPool gets involved.

The best unit to measure would be "CPU units", as Linux and cloud providers do. That is: cpuUnits = processorTime / elapsedTime
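As a worked example of that ratio: the numbers below assume a 10-second measurement window and that the percentages quoted in this issue are relative to a single core. Both are assumptions, since the thread does not say how the 28% figure was measured. The "m" suffix is the Kubernetes millicpu unit, where 1 CPU unit = 1000m.

```latex
\text{cpuUnits} = \frac{T_{\text{cpu}}(t_1) - T_{\text{cpu}}(t_0)}{t_1 - t_0},
\qquad 1\ \text{CPU unit} = 1000\text{m}

% Idle target (1% of one core): 0.1 s of processor time in a 10 s window
\text{cpuUnits}_{\text{target}} = \frac{0.1\ \text{s}}{10\ \text{s}} = 0.01 = 10\text{m}

% Reported ARM idle (28% of one core): 2.8 s of processor time in a 10 s window
\text{cpuUnits}_{\text{ARM}} = \frac{2.8\ \text{s}}{10\ \text{s}} = 0.28 = 280\text{m}
```

Under those assumptions, the reported ARM idle usage is 28 times the 1% target.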

Windows and Linux provide counters for process and thread CPU time, and the System.Diagnostics.Process class can be used to read them.

It can be used in a simple integration test to measure the idle cluster CPU, or the CPU utilization of a calibration workload, to compare OS/architecture combinations.

Simplest form of an idle integration test

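// Snippet for the body of an async xUnit test method. Needs:
// using System.Diagnostics; using System.Threading.Tasks; using Xunit;
var p = Process.GetCurrentProcess();
var sw = new Stopwatch();

// sample processor time and start the wall clock
var processorTime0 = p.TotalProcessorTime;
sw.Start();

// do work or idle around
await Task.Delay(10_000);

// sample again and stop the wall clock
var processorTime1 = p.TotalProcessorTime;
sw.Stop();

// TimeSpan / TimeSpan yields a double: 1.0 means one fully busy core
var processorTime = processorTime1 - processorTime0;
var cpuUnits = processorTime / sw.Elapsed;

// was idling? (less than 1% of one core)
Assert.True(cpuUnits < 0.01);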

I put the tests in the above repo.
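The same measurement could be wrapped in a Stopwatch-like helper so it can be reused across tests. This is a minimal sketch under stated assumptions, not code from Akka.NET or the CycleReader repo: `CpuUnitsWatch`, `StartNew`, `Restart`, `CpuUnits`, and `MilliCpu` are hypothetical names chosen for illustration. It only uses `Process.TotalProcessorTime`, `Process.Refresh`, and `Stopwatch` from System.Diagnostics.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: reports CPU units (processor time / wall-clock time)
// for the current process since the last (re)start.
public sealed class CpuUnitsWatch
{
    private readonly Process _process = Process.GetCurrentProcess();
    private readonly Stopwatch _wallClock = new Stopwatch();
    private TimeSpan _cpuAtStart;

    public static CpuUnitsWatch StartNew()
    {
        var watch = new CpuUnitsWatch();
        watch.Restart();
        return watch;
    }

    // Takes a new baseline sample and restarts the wall clock.
    public void Restart()
    {
        _process.Refresh(); // defensive: discard any cached process info
        _cpuAtStart = _process.TotalProcessorTime;
        _wallClock.Restart();
    }

    // 1.0 means one fully busy core; values above 1.0 are possible
    // when several cores are busy at once.
    public double CpuUnits
    {
        get
        {
            var elapsedMs = _wallClock.Elapsed.TotalMilliseconds;
            if (elapsedMs <= 0.0) return 0.0; // avoid dividing by zero

            _process.Refresh();
            var cpuMs = (_process.TotalProcessorTime - _cpuAtStart).TotalMilliseconds;
            return cpuMs / elapsedMs;
        }
    }

    // Same value expressed in Kubernetes-style millicpu (1 CPU unit = 1000m).
    public double MilliCpu => CpuUnits * 1000.0;
}
```

In a test, this would be used as `var watch = CpuUnitsWatch.StartNew();`, followed by the idle period or workload, and then an assertion such as `Assert.True(watch.CpuUnits < 0.01);`. Reporting `MilliCpu` keeps the numbers directly comparable with the mCPU values shown on a K8s dashboard.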

Aaronontheweb commented 3 months ago

> It's not possible to read a counter that gives a "cpu-work done" value.

We're just planning on sticking it in K8s with its own namespace and measuring mCPU used over time on a Grafana chart