dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.06k stars 4.69k forks source link

GetOrAllocateThreadPoolBoundHandle is too slow in certain use case #108363

Open ElectricVampire opened 6 days ago

ElectricVampire commented 6 days ago

Description

UseCase I am trying to add oracle support in query plan visualizer visual studio extension. The issue I am facing is that connection.Open() is taking 7 seconds which is timing out the debug visualizer. Exact same connection.Open take milliseconds when done in without visualizer and I am able to reproduce this all the time.

In following image(Oracle traces showing difference between fast and slow use case) you can see for slower use case ConnectIterate takes 7 seconds which seems to be simple BeginConnect call on TcpClient image

I also enabled .Net Diagnostics events which shows where those missing 7 seconds went, between slow and fast .net traces, I can see in slower case GetOrAllocateThreadPoolBoundHandle took 7 Seconds which I think went for ThreadPoolBoundHandle.BindHandle(this) but why its not executed in faster use case and why its executed in slower use case.

03:45:03.1876310[ConnectStart] address: InterNetwork:16:{5,241,127,0,0,1,0,0,0,0,0,0,0,0}
03:45:03.1880221[Info] thisOrContextObject: SafeSocketHandle#14421545, memberName: GetOrAllocateThreadPoolBoundHandle, message: calling ThreadPool.BindHandle()
03:45:10.7995167[Connected] localEp: IPEndPoint#922213090, remoteEp: (null), socketHash: 33163964

Problem: When connection.Open happens from VS debugger it times out because it has 5 second limit and for every connection.Open it takes 7 seconds because of GetOrAllocateThreadPoolBoundHandle. Need a solution by which this can be reduced.

Reproduction Steps

We should be able to reproduced it all the time. Just doing connection.Open inside your code is very past but done [via visualizer](connection.Open()), it's very slow even when its the same process. I have attached .net trace and oracle trace. Oracle trace can provide you details on thread level information.

fast_net.txt fast_oracle.txt slow_net.txt slow_oracle.txt

Sample App: Replace your oracle connection string. Now here query returns immediately. EfCoreTesting.zip

Install vsix after building https://github.com/Giorgi/EFCore.Visualizer or just install from following zip file EfCoreVisualizerExtension.zip

After installing if you inspect query in visualizer it will timeout image

Expected behavior

GetOrAllocateThreadPoolBoundHandle should return immediately as it did in fast use case

Actual behavior

GetOrAllocateThreadPoolBoundHandle takes 7 seconds

Regression?

No response

Known Workarounds

Nope

Configuration

No response

Other information

No response

MihaZupan commented 6 days ago

03:45:03.1876310[ConnectStart] 03:45:03.1880221[Info] message: calling ThreadPool.BindHandle() 03:45:10.7995167[Connected]

I can see in slower case GetOrAllocateThreadPoolBoundHandle took 7 Seconds

That's not what those logs are implying. These are merely timestamps at different places in the code, so all you can glean from them is that you've reached this line of code at some point: https://github.com/dotnet/runtime/blob/ba9b3ba83603d1f54c00c45a87344568d4277966/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SafeSocketHandle.Windows.cs#L49 and then 7 seconds later you got a connection, not that that specific call took 7 seconds.

You should be able to see how long the connection attempt took outside of .NET itself if you collect a network capture (e.g. Wireshark).

ElectricVampire commented 5 days ago

03:45:03.1876310[ConnectStart] 03:45:03.1880221[Info] message: calling ThreadPool.BindHandle() 03:45:10.7995167[Connected]

I can see in slower case GetOrAllocateThreadPoolBoundHandle took 7 Seconds

That's not what those logs are implying. These are merely timestamps at different places in the code, so all you can glean from them is that you've reached this line of code at some point:

https://github.com/dotnet/runtime/blob/ba9b3ba83603d1f54c00c45a87344568d4277966/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SafeSocketHandle.Windows.cs#L49

and then 7 seconds later you got a connection, not that that specific call took 7 seconds. You should be able to see how long the connection attempt took outside of .NET itself if you collect a network capture (e.g. Wireshark).

@MihaZupan Thanks for guidance. I took the traces between slow and fast use case. fast.pcapng.txt slow.pcapng.txt

I compared both cases and couldn't relate it to C# code on why one is always slow and another one is always fast. @MihaZupan Will really appreciate your help here image

wfurt commented 3 days ago

so the second capture has RST as the fist packet. It seems like there was some previous connection? And do you connect by name or IP address? It may be worth of trying to create small repro e.g. remove all the unnecessary layers to simplify the investigation.