dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.33k stars 4.49k forks source link

[Performance] Regression on Fortunes - Windows #101936

Open sebastienros opened 3 weeks ago

sebastienros commented 3 weeks ago

Isolated to these commits: https://github.com/dotnet/runtime/compare/774bc93087ad...d688ac290cd7 Only on Windows 320K RPS to 300K RPS. There is also a visible first request regression on the same commits.

image image image

Command line

Between 9.0.0-preview.4.24223.5 and 9.0.0-preview.4.24223.6

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/database.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario fortunes --profile intel-win-app --profile intel-load-load --profile intel-db-db --application.framework net9.0 --application.options.collectCounters true --application.collectDependencies true --load.options.reuseBuild true --application.aspNetCoreVersion 9.0.0-preview.4.24218.1 --application.runtimeVersion 9.0.0-preview.4.24223.6 --application.sdkVersion 9.0.100-preview.4.24223.13
stephentoub commented 3 weeks ago

Is anything in Fortunes using Channel.CreateUnbounded to create a channel that's used on a per-request code path?

sebastienros commented 3 weeks ago

Npgsql seems to use it, @roji ?

roji commented 3 weeks ago

Yep, an unbounded channel is the internal implementation of Npgsql's connection pool (Max Pool Size is enforced in other ways, so the channel itself is unbounded).

Is there any particular concern here? Was this a regression in the Channel implementation?

/cc @NinoFloris @vonzshik

stephentoub commented 3 weeks ago

Is there any particular concern here?

Not for you :smile:

Is there any particular concern here? Was this a regression in the Channel implementation?

That's what I'm questioning. One of the changes in the diff range was a change that shouldn't have impacted throughput of the channel created by Channel.CreateUnbounded but seems like it may have.

stephentoub commented 3 weeks ago

@roji, which of these /where would end up being used on a hot path in Fortunes?

dotnet-policy-service[bot] commented 3 weeks ago

Tagging subscribers to this area: @dotnet/area-system-threading-channels See info in area-owners.md if you want to be subscribed.

vonzshik commented 3 weeks ago

@stephentoub that's probably going to be the one in the connector.

The way it works is that each connector (that's a physical connection to pg) starts their own async loop to handle query responses. Then, whenever data source receives queries, it dispatches them to a connector with the least in-flight query count and after each dispatch it attempts to write that query to connector's socket. If that write does complete synchronously, it's just going to immediately dispatch another query to the exact same connector. That loop repeats until write doesn't complete synchronously.

As for the code, here queries are received from data source. Not sure whether it's relevant, but data source write loop is single threaded, so there shouldn't be any contentions.

stephentoub commented 3 weeks ago

that's probably going to be the one in the connector.

Hmm. That one wouldn't have been affected at all by https://github.com/dotnet/runtime/commit/b4e0169bfe10cfe69f6e7a8952b8f80fdfe9e31e, as that doesn't touch the code used when SingleReader is true.

vonzshik commented 3 weeks ago

The only other one we have is the one we use for the pooling (just as @roji mentioned above).

It potentially could be quite hot since the way fortunes designed to:

  1. Get connector from the pool 1.1. Attempt to get an idle connector 1.2. If previous step fails, start waiting for the channel reader to return a connector
  2. Execute a query (which takes extremely short time)
  3. Return connector to the pool

Now, there is a catch. This particular path shouldn't ever be executed with fortunes. The reason for this is because we use multiplexing mode with TE benchmarks (as it's the most performing one), and it completely avoids connector renting, instead going through multiplexing data source (which I described above). @roji is there a chance this benchmark doesn't use multiplexing?

roji commented 1 week ago

Sorry for taking so long to answer here.

It seems like this (non-platform) benchmark also uses multiplexing, so there indeed shouldn't be a path where the unbounded channel is used. The other case of channel-usage has already been discussed above - it's not unbounded in any case...