dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
34.9k stars 9.86k forks source link

Microsoft.AspNetCore.Hosting perf counters are slow #50412

Open EgorBo opened 10 months ago

EgorBo commented 10 months ago

We're trying to benchmark a webapp on a many-cores system and we're mainly interested in RPS metrics at the moment. We don't want to rely on client side RPS metrics (bombardier or Wrk depending on OS) because there are multiple clients and they're not started instantly at the same time. It seems that Microsoft.AspNetCore.Hosting perf counters can be used to calculate RPS on aspnet side which is what we need. Unfortunately, it seems that these counters make everything twice slower when enabled.

Basically, it can be simulated on our PerfLab with crank:

crank 
--config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml 
--config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml 
--config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml 
--config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml 
--scenario json 
--profile arm-lin-28-app 
--profile intel-load2-load 
--application.framework net8.0 
--application.collectDependencies true 
--application.options.collectCounters true 
--application.options.counterProviders "Microsoft.AspNetCore.Hosting"

the last line is the culprit. Is it possible to somehow achieve an overhead-free server-side RPS metric? The current culprit seems to be EvenPipe:

image

Tried Linux-x64, Linux-arm64 and Windows-x64 machines.

UPD: Current suspects:

Mode RPS
No counters 1M
Collect "System.Runtime" counters 1M
Collect "Microsoft.AspNetCore.Hosting" counters 0.48M
Collect "Microsoft.AspNetCore.Hosting" counters, EventPipeEnableStackwalk=0 0.65M
Collect "Microsoft.AspNetCore.Hosting" counters, EventPipeEnableStackwalk=0, without Interlocked 0.75M
Collect counters, Microsoft.AspNetCore.Hosting, EventPipeEnableStackwalk=0, without Interlocked, without lock in CounterAggregator 0.8M
EgorBo commented 10 months ago

also, cc @dotnet/dotnet-diag and @sebastienros

EgorBo commented 10 months ago

Also, can the cpu cache contention be an issue here? https://github.com/dotnet/aspnetcore/blob/main/src/Hosting/Hosting/src/Internal/HostingEventSource.cs#L55-L64

we perform 3 interlocked operations per each request

davmason commented 10 months ago

Can you set DOTNET_EventPipeEnableStackwalk=0 and try again? That will make EventPipe never try to collect managed stacks for events and should greatly increase performance. It will mean you don't get stacks for any events, but I think that is OK with your scenario from what I see

EgorBo commented 10 months ago

Can you set DOTNET_EventPipeEnableStackwalk=0 and try again? That will make EventPipe never try to collect managed stacks for events and should greatly increase performance. It will mean you don't get stacks for any events, but I think that is OK with your scenario from what I see

Thanks! It does improve things, but still slower than just System.Runtime counters (which don't affect RPS)

Mode RPS
Defaut 1M
Collect counters, System.Runtime 1M
Collect counters, Microsoft.AspNetCore.Hosting 0.48M
Collect counters, Microsoft.AspNetCore.Hosting, EventPipeEnableStackwalk=0 0.65M
Collect counters, Microsoft.AspNetCore.Hosting, EventPipeEnableStackwalk=0, without Interlocked 0.75M
Collect counters, Microsoft.AspNetCore.Hosting, EventPipeEnableStackwalk=0, without Interlocked, without lock in CounterAggregator 0.8M
sebastienros commented 10 months ago

we perform 3 interlocked operations per each request

At the level of concurrency we have on these systems (a lot of concurrent requests and 80 cores on some of the ARM64 machines) it's probably the reason.

EgorBo commented 10 months ago

So, it seems that Interlocked counters do contribute but there is still something else that slows us down here. I compiled Microsoft.AspNetCore.Hosting.dll locally and removed the counters and got +0.1M RPS (resulting in 0.75M RPS)

EgorBo commented 10 months ago

Windows-x64, Interlocked.* operations commented (so the provider reports 0 RPS) and DOTNET_EventPipeEnableStackwalk=0:

image

Traces for a random TP thread:

image

so 20% of perf is still wasted somewhere

EgorBo commented 10 months ago

Another suspect is https://github.com/dotnet/runtime/blob/9c3f8b3727d9be4de483a1d725c2bda22f956688/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/CounterAggregator.cs#L19

image

PS: yes, it's responsible for another 50k RPS

davidfowl commented 10 months ago

Are we fixing this for .NET 8?

noahfalk commented 10 months ago

What EventPipe providers do you have enabled in the Collect "Microsoft.AspNetCore.Hosting" counters case? There is both an EventSource named Microsoft.AspNetCore.Hosting as well as a Meter named "Microsoft.AspNetCore.Hosting". All Meters get reported in aggregate via the MetricsEventSource.

Its possible to enable both, but there is likely no reason to do so. Looking at the code in ASP.NET Core, you may also be paying the costs to track both sets of metrics even if you only enabled one of them: https://source.dot.net/#Microsoft.AspNetCore.Hosting/Internal/HostingApplicationDiagnostics.cs,57 https://source.dot.net/#Microsoft.AspNetCore.Hosting/Internal/HostingApplicationDiagnostics.cs,353 Its possible to check if they are enabled individually, but currently the code doesn't do that.

Given you saw improvements removing the locks in CounterAggregator, I assume you enabled MetricsEventSource. Do you know what filter arguments got passed that select which Meters and Instruments to listen to? For example if you enabled all instruments on the Microsoft.AspNetCore.Hosting Meter then you would have enabled the request duration histogram and that histogram update hits a different lock here: https://github.com/dotnet/runtime/blob/9c3f8b3727d9be4de483a1d725c2bda22f956688/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/ExponentialHistogramAggregator.cs#L176

so 20% of perf is still wasted somewhere

If something not yet accounted for is taking 20% of the time that should make ETW traces look fairly different. Is the difference being elusive and things feel stuck, or its just some time is needed to peel the next layer of the onion?

Is it possible to somehow achieve an overhead-free server-side RPS metric?

Nothing is totally free, but I think we could get a metric that has sufficiently low overhead it wouldn't be observable at the scale of this measurement. We can do some combination of:

JamesNK commented 10 months ago

Its possible to enable both, but there is likely no reason to do so. Looking at the code in ASP.NET Core, you may also be paying the costs to track both sets of metrics even if you only enabled one of them: source.dot.net/#Microsoft.AspNetCore.Hosting/Internal/HostingApplicationDiagnostics.cs,57 source.dot.net/#Microsoft.AspNetCore.Hosting/Internal/HostingApplicationDiagnostics.cs,353 Its possible to check if they are enabled individually, but currently the code doesn't do that.

Adding separate checks for metrics vs event counters in ASP.NET Core hosting isn't difficult. I'll do that and aim for .NET 8. Edit: PR https://github.com/dotnet/aspnetcore/issues/50412

However, it won't help if someone enables metrics and event counters. I don't see why someone would want both. I think there is a task here to look at what .NET telemetry libraries and tooling are doing when they asks to listen for counters. I know dotnet-counters listens to both, but people aren't going to use that in production, so it doesn't matter. What about the .NET OpenTelemetry SDK? Or other counter gathering tooling?

I don't think it is worth in investing in optimizing the HostingEventSource's Interlocked.Increment calls. We should encourage people to move towards metrics.

JamesNK commented 10 months ago

My PR to enable metrics and event counters independently didn't have any impact on the benchmark. The RPS is approximately 500k both before and after. That means application.options.counterProviders "Microsoft.AspNetCore.Hosting" is capturing event counters and metrics counters. Supporting enabling them independently is a Good Thing and the PR should go into the product, but it has no impact here.

The performance of metrics incrementing values is out of our hands in aspnetcore. Perf sensitive code is in System.Diaganostics.Metrics. There is a PR looking at it here: https://github.com/dotnet/runtime/pull/91566

Something we can impact in aspnetcore is the event counters' performance. We could update hosting event counters to use something other than Interlocked. We could implement something similar to what https://github.com/dotnet/runtime/pull/91566 is doing. Or use ThreadLocal<long>. @sebastienros I have a vague memory of you talking about a library that did this to improve ARM perf. Can you share details, or did I imagine the conversation?

danmoseley commented 10 months ago

Are all of our counters required to be exact? Where a nearly accurate value is acceptable we can use an algorithm similar to what is done here to avoid interlocked operations.

https://github.com/dotnet/runtime/pull/84427

JamesNK commented 10 months ago

There are two counters on the hot path: total requests and active requests.

It would be surprising if the total request count wasn’t exact. And it would be a regression from current behavior.

The active request count is incremented when a request starts and decremented when the request ends. It definitely needs to be exact. Otherwise, you could end up with a situation where an idle server reports there are active requests. Or even there are negative active requests.

noahfalk commented 10 months ago

Are all of our counters required to be exact? Where a nearly accurate value is acceptable we can use an algorithm similar to what is done here to avoid interlocked operations.

I think it depends what people want to use them for. In the scenario here a small error might be acceptable, in others people probably expect counts to be exact. I could easily imagine this is something that tools opt into when it is appropriate for the scenario at hand, but I think it would confuse people if we introduced error bars by default.

@EgorBo @sebastienros - can you guys help me answer some of the questions in https://github.com/dotnet/aspnetcore/issues/50412#issuecomment-1708055156? It sounds like @JamesNK answered one of them which is crank is enabling both the HostingEventSource EventCounters and the new Meters via MetricEventSource. Presumably you don't need both and when you change the tool to disable one the overhead is going to go down. If the goal is to get a good RPS measurement right now probably the EventCounters are going to have lower overhead at the moment. If the goal is improve the perf of Meters when run by customers then lets disable the EventCounters so we isolate and work on just the Meter overhead.

noahfalk commented 10 months ago

There is a PR looking at it here: https://github.com/dotnet/aspnetcore/issues/50412

Did you mean https://github.com/dotnet/runtime/pull/91566?

davidfowl commented 10 months ago

Lets make sure our counters are exact, there are several options we have here to improve the performance of how we track that (@stephentoub mentioned these techniques in the other issue on runtime)

This is an option Or something like this

We don't want to use the approximate counting algorithm here.