dotnet / wcf

This repo contains the client-oriented WCF libraries that enable applications built on .NET Core to communicate with WCF services.

Updating app to .NET Core has dramatically slowed down its interaction with a WCF service #5271

Open JWardMAMG opened 1 year ago

JWardMAMG commented 1 year ago

We have a desktop application that uses a library of ours wrapping a client to a WCF service (using NetTcpBinding) that we have deployed internally. I recently updated this desktop application to .NET 7 (from .NET Framework 4.8.1). The intermediary library and the client are .NET Standard 2.0, and the service is .NET Framework 4.8.1 (I realize this is a mismatch; if that seems the likely culprit, do let me know).

The .NET Framework version of the desktop app used the service very smoothly, but since updating, when making a large number of calls, the first calls are extremely slow; once they return results, the remaining calls are reasonably quick. I've extracted the relevant code into a simple console app to try to reproduce the behaviour. This app creates 50 objects, which all asynchronously load something from the service, and each of those in turn creates other objects that load more items from the service, mimicking some behaviour in the desktop app. Simple changes, such as moving the triggering of the second set of objects/loads onto the same task that loads the first set, can increase the total time to load everything from ~4s to ~20s, and monitoring the time to process each result shows low initial throughput followed by higher throughput once the backlog clears.

I set up a batch script that repeatedly calls netstat on the service machine, filtering by my machine name, and the number of active connections is dramatically higher for the .NET 7 version than for the .NET Framework version running the same code.

Is the .NET 7 app/client simply creating higher throughput on the server and bottlenecking it? Or has the behaviour changed somewhat in the newer versions of .NET?

HongGit commented 1 year ago

@JWardMAMG Thanks for reporting the issue! Is it possible for you to provide a sample repro project in order for us to investigate the issue?

JWardMAMG commented 1 year ago

@HongGit Hi, thanks for getting back to me. I've been meaning to put together a simple service to test this without stressing our production service; I'll send that over when I've got it together, thanks!

JWardMAMG commented 1 year ago

In the meantime, rate limiting the calls the desktop client can make has helped; interestingly, the sweet spot seemed to be 9 calls (I think the default max connections is 10? Not sure if that's related). When running a test client on .NET Framework, the apparent maximum number of TCP connections made to the service looks similar to that of my rate-limited .NET 7 client.
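
The rate limiting is roughly this shape (a rough sketch; the limit of 9 is just what worked for us, and the delegate stands in for our actual client calls):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledClientCalls
{
    // Sweet spot we found empirically; tune for your own service.
    private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(9);

    // callService is a placeholder for the actual WCF client call.
    public static async Task<T> LoadAsync<T>(Func<Task<T>> callService)
    {
        await Throttle.WaitAsync();
        try
        {
            return await callService();
        }
        finally
        {
            Throttle.Release();
        }
    }
}
```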

betmix-matt commented 10 months ago

We have also experienced a similar issue. The .NET 6 runtime version has intermittent latency spikes of over 10 seconds, almost predictably (every ~100 seconds), while the .NET Framework 4 runtime showed no latency spikes.

I'll see if I can provide a simple reproduction client in both versions.

AngleOSaxon commented 6 months ago

I think I have a sample repo over here that reproduces this issue and shows it in some simple benchmarks. It certainly produces an issue where .NET 6 WCF clients are much slower than .NET Framework. It works by spinning up 500 clients and trying to perform three operations with each. The Framework benchmarks are okay and the synchronous .NET 6 benchmarks are very bad. The .NET 8 benchmarks are bad in a different way that I haven't investigated yet--I've only dug into the .NET 6 ones so far.

Here are the results from some local runs:

EDIT: I realized I made the rookie mistake of not awaiting the calls when doing the async method tests, which unsurprisingly shortened the run times on those tests (uh, whoops). I reran and updated these results--it doesn't change anything substantial about the relative performance.

.NET:

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3447/23H2/2023Update/SunValley3)
12th Gen Intel Core i9-12900K, 1 CPU, 24 logical and 16 physical cores
.NET SDK 8.0.204
  [Host]        : .NET 6.0.29 (6.0.2924.17105), X64 RyuJIT AVX2
  MonitoringJob : .NET 6.0.29 (6.0.2924.17105), X64 RyuJIT AVX2

Job=MonitoringJob  IterationCount=10  RunStrategy=Monitoring

| Method                       | Runtime  | Mean     | Error      | StdDev     | Median   | Min      | Q1       | Q3       | Max       |
|----------------------------- |--------- |---------:|-----------:|-----------:|---------:|---------:|---------:|---------:|----------:|
| TestStreamed                 | .NET 6.0 | 45.371 s | 193.1581 s | 127.7621 s |  4.904 s |  4.574 s |  4.817 s |  5.255 s | 408.987 s |
| TestBuffered                 | .NET 6.0 | 13.129 s |  42.2004 s |  27.9129 s |  4.450 s |  3.423 s |  3.772 s |  4.906 s |  92.550 s |
| TestStreamed_500Threads      | .NET 6.0 |  4.874 s |   0.5159 s |   0.3413 s |  4.721 s |  4.577 s |  4.602 s |  5.109 s |   5.594 s |
| TestBuffered_500Threads      | .NET 6.0 |  3.178 s |   1.2775 s |   0.8450 s |  2.835 s |  2.472 s |  2.549 s |  3.700 s |   4.883 s |
| TestAsyncStreamed            | .NET 6.0 |  4.975 s |   1.1048 s |   0.7307 s |  4.629 s |  4.461 s |  4.561 s |  4.886 s |   6.602 s |
| TestAsyncBuffered            | .NET 6.0 | 14.817 s |  36.1426 s |  23.9061 s |  4.212 s |  2.599 s |  2.857 s |  4.253 s |  60.214 s |
| TestAsyncStreamed_500Threads | .NET 6.0 |  4.932 s |   0.7411 s |   0.4902 s |  4.768 s |  4.669 s |  4.707 s |  4.812 s |   6.280 s |
| TestAsyncBuffered_500Threads | .NET 6.0 | 14.919 s |  36.0563 s |  23.8490 s |  4.252 s |  2.781 s |  3.040 s |  4.292 s |  60.216 s |
| TestStreamed                 | .NET 8.0 | 24.077 s |  24.7536 s |  16.3730 s | 17.231 s | 13.705 s | 14.892 s | 23.669 s |  66.932 s |
| TestBuffered                 | .NET 8.0 |  6.170 s |   3.1043 s |   2.0533 s |  6.386 s |  4.214 s |  4.306 s |  6.522 s |  10.631 s |
| TestStreamed_500Threads      | .NET 8.0 |  6.745 s |   0.5722 s |   0.3785 s |  6.672 s |  6.372 s |  6.638 s |  6.715 s |   7.766 s |
| TestBuffered_500Threads      | .NET 8.0 |  2.617 s |   0.0682 s |   0.0451 s |  2.612 s |  2.561 s |  2.591 s |  2.634 s |   2.701 s |
| TestAsyncStreamed            | .NET 8.0 |  6.641 s |   0.2564 s |   0.1696 s |  6.711 s |  6.381 s |  6.503 s |  6.732 s |   6.868 s |
| TestAsyncBuffered            | .NET 8.0 |  2.641 s |   0.0627 s |   0.0415 s |  2.624 s |  2.613 s |  2.617 s |  2.645 s |   2.747 s |
| TestAsyncStreamed_500Threads | .NET 8.0 |  6.677 s |   0.2081 s |   0.1377 s |  6.677 s |  6.443 s |  6.637 s |  6.762 s |   6.847 s |
| TestAsyncBuffered_500Threads | .NET 8.0 |  2.570 s |   0.2778 s |   0.1837 s |  2.628 s |  2.247 s |  2.575 s |  2.635 s |   2.821 s |

Framework:

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3447/23H2/2023Update/SunValley3)
12th Gen Intel Core i9-12900K, 1 CPU, 24 logical and 16 physical cores
  [Host]        : .NET Framework 4.8.1 (4.8.9232.0), X86 LegacyJIT
  MonitoringJob : .NET Framework 4.8.1 (4.8.9232.0), X86 LegacyJIT

Job=MonitoringJob  IterationCount=10  RunStrategy=Monitoring

| Method                       | Mean     | Error     | StdDev   | Min      | Q1       | Q3       | Max      |
|----------------------------- |---------:|----------:|---------:|---------:|---------:|---------:|---------:|
| TestStreamed                 |  4.856 s |  1.0514 s | 0.6954 s |  3.824 s |  4.516 s |  4.978 s |  6.539 s |
| TestBuffered                 | 21.196 s | 13.3446 s | 8.8266 s | 13.343 s | 13.469 s | 26.121 s | 38.111 s |
| TestStreamed_500Threads      |  5.002 s |  0.3371 s | 0.2230 s |  4.760 s |  4.855 s |  5.069 s |  5.546 s |
| TestBuffered_500Threads      |  3.060 s |  0.4814 s | 0.3184 s |  2.850 s |  2.909 s |  2.982 s |  3.899 s |
| TestAsyncStreamed            |  5.431 s |  0.8231 s | 0.5444 s |  4.894 s |  4.992 s |  5.647 s |  6.542 s |
| TestAsyncStreamed_500Threads |  6.183 s |  1.3688 s | 0.9054 s |  4.943 s |  5.435 s |  7.083 s |  7.396 s |

Based on those results and some debugging, I have an idea of the cause and a couple of recommendations.

Recommendations:

If at all possible, update the contract to offer async methods--this should be as simple as creating a clone of the ServiceContract, changing the methods to return Task<T>, sticking Async on the end of the method and interface names, but keeping the same ServiceContract.Name value (guide here, or example in my reproduction case here).
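
A minimal sketch of what that clone can look like (the contract and operation names here are placeholders, not from the repro):

```csharp
using System.ServiceModel;
using System.Threading.Tasks;

// Existing synchronous contract, unchanged, still implemented by the
// .NET Framework service.
[ServiceContract(Name = "ICalculatorService")]
public interface ICalculatorService
{
    [OperationContract]
    int Add(int a, int b);
}

// Client-side async clone: same ServiceContract.Name, methods renamed with an
// Async suffix and returning Task<T>, so it matches the same operations on the
// wire while giving the client awaitable methods.
[ServiceContract(Name = "ICalculatorService")]
public interface ICalculatorServiceAsync
{
    [OperationContract]
    Task<int> AddAsync(int a, int b);
}
```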

If that's not practical, see if you can increase the minimum thread pool size to be slightly larger than the number of requests you're making.
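
That can be done once at client startup, e.g. (the 520 here assumes roughly 500 concurrent requests, as in the benchmark; tune it to your workload):

```csharp
using System;
using System.Threading;

ThreadPool.GetMinThreads(out int minWorkers, out int minIocp);
ThreadPool.GetMaxThreads(out int maxWorkers, out _);

// Keep at least slightly more worker threads ready than the number of
// simultaneous synchronous calls, so blocked Open() calls don't have to wait
// for the pool's slow thread injection.
int desiredWorkers = Math.Min(520, maxWorkers);
if (!ThreadPool.SetMinThreads(desiredWorkers, minIocp))
{
    Console.WriteLine($"SetMinThreads({desiredWorkers}, {minIocp}) was rejected.");
}
```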

If you can, consider switching to TransferMode.Buffered instead of TransferMode.Streamed (this requires the server and all clients to switch as well).
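
On the client side that's just a binding change, sketched here against the hypothetical ICalculatorServiceAsync contract above (the address and message size are placeholders):

```csharp
using System.ServiceModel;

var binding = new NetTcpBinding
{
    TransferMode = TransferMode.Buffered,        // instead of TransferMode.Streamed
    MaxReceivedMessageSize = 64 * 1024 * 1024    // buffered mode holds whole messages in memory
};

var factory = new ChannelFactory<ICalculatorServiceAsync>(
    binding,
    new EndpointAddress("net.tcp://my-service-host:3001/calc"));
var client = factory.CreateChannel();
```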

Theory as to cause

Disclaimers: I'm not an expert on .NET tasks, threading, and async, and I haven't yet tested this theory by building a different implementation of CommunicationObject.Open.

I think the synchronous CommunicationObject.Open implementation in .NET Core is written in a way that blocks the current thread until a second thread is available, putting pressure on the Thread Pool. When the number of simultaneous requests exceeds the size of the Thread Pool, the client grinds to a halt until the Thread Pool expands to a point where it has more threads than it has stalled requests. The Thread Pool only adds threads every 0.5 seconds (only source I've found), so with enough requests queued this can take some time.

When an operation is called on the client object for the first time, WCF needs to open the channel. If the operation is synchronous, the channel opening will also be synchronous. This eventually takes the call down to the System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan) method, which in turn calls WaitForCompletion(), which uses .GetResult(). So this thread will be blocked, which is expected because we're making a synchronous call.

Critically, however, it also calls await TaskHelpers.EnsureDefaultTaskScheduler(). This does nothing but return a custom awaiter, DefaultTaskSchedulerAwaiter. The comments there are very helpful--basically it wants to force any code after this point to only use the default synchronization context, like ConfigureAwait(false) but with less copy/paste. The key thing is that it checks to see if you're on a custom context (we are; ServiceModelSynchronizationContext), and it fixes that by throwing the continuation to the Thread Pool. Which is full of blocked threads, because all the available threads were grabbed by the sudden burst of requests. So all the threads handling the requests are blocked waiting on the results of the task continuation, which is waiting on nothing except an available thread to continue on. So it sits there until the thread pool expands to a point that one of those continuations can get scheduled, and then suddenly everything unjams.
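
The jam is reproducible without WCF at all. This sketch (the numbers are illustrative, not taken from the WCF source) shows the same pattern: every queued "request" blocks a pool thread waiting on a continuation that itself needs a free pool thread:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThreadPoolStarvationDemo
{
    // Stand-in for the synchronous CommunicationObject.Open path: the awaited
    // Task.Yield() queues its continuation to the thread pool, and the calling
    // thread then blocks until that continuation has run.
    static void SyncOverAsyncOpen()
    {
        static async Task HopToThreadPool() => await Task.Yield();
        HopToThreadPool().GetAwaiter().GetResult();
    }

    static void Main()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        Console.WriteLine($"Min worker threads: {minWorkers}");

        // Queue far more "requests" than there are pool threads. Each request
        // blocks a pool thread waiting for a continuation that also needs a
        // free pool thread, so nothing completes until the pool slowly grows;
        // with 100 queued requests this can take a minute or more.
        var sw = Stopwatch.StartNew();
        Task[] requests = Enumerable.Range(0, 100)
            .Select(_ => Task.Run(SyncOverAsyncOpen))
            .ToArray();
        Task.WaitAll(requests);
        Console.WriteLine($"All requests completed in {sw.Elapsed}");
    }
}
```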

You can see the resulting behavior in this graph of Thread Pool counters during a run:
[graph: ThreadPool Queue Length, Thread Count, and Completed Work Items over the course of a run]

The Queue length starts out very high because my test client starts with a huge number of requests, while ThreadPool Thread Count starts low and gradually increases until it reaches the same length as the Queue. Completed Work Items stays at nearly nothing until the Thread Count crosses the Queue Length, and then suddenly climbs as all those queued items start finishing.

Switching to async methods for the client solves the whole problem because the threads are just yielded back to the pool and are then immediately available to run the continuations.
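
With the async clone from the first recommendation, the call site then looks something like this (placeholder names again):

```csharp
using System.ServiceModel;
using System.Threading.Tasks;

public static class AsyncCallSite
{
    // client is a channel created from the hypothetical ICalculatorServiceAsync
    // contract sketched above.
    public static async Task<int> AddAsync(ICalculatorServiceAsync client)
    {
        // The channel open happens as part of the async operation, so the
        // thread is yielded back to the pool instead of blocking inside
        // CommunicationObject.Open while waiting for a second pool thread.
        return await client.AddAsync(2, 3);
    }
}
```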

mconnew commented 5 months ago

@AngleOSaxon, you are 100% correct in what you stated in your analysis. I had a look at your sample repo and I think there's another change you can make to improve things. When using CoreWCF, the default when you only specify a port number for UseNetTcp is to listen on IPAddress.Any, which is IPv4 only. When using a hostname (e.g. localhost), the client will attempt to connect using IPv6 first, then, when that times out, try IPv4. If you change the server to UseNetTcp(IPAddress.IPv6Any, 3001), it will listen on both IPv4 and IPv6 and the connections will happen a little faster.
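
In the host setup that change looks roughly like this (the surrounding builder code is assumed to match the sample repo's shape):

```csharp
using System.Net;
using CoreWCF.Configuration;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;

var builder = WebApplication.CreateBuilder(args);

// UseNetTcp(3001) alone listens on IPAddress.Any (IPv4 only); listening on
// IPv6Any accepts both IPv6 and IPv4, so clients that resolve "localhost" to
// ::1 connect immediately instead of timing out and falling back to IPv4.
builder.WebHost.UseNetTcp(IPAddress.IPv6Any, 3001);
builder.Services.AddServiceModelServices();

var app = builder.Build();
// ... app.UseServiceModel(...) service and endpoint registration as in the sample repo ...
app.Run();
```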