dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.19k stars 4.72k forks source link

System.DirectoryServices.Protocols - Linux -> binding time #38610

Closed GrowSharp closed 4 years ago

GrowSharp commented 4 years ago

Description

So problem is, that sometimes (really randomly) the bind takes 127 seconds to complete. It doesn't matter if it's anonymous bind or not. Just sometimes the .Bind(); takes roughly 127 seconds. Did some logging around it for you to see, see Data.

Configuration

Domain joined CentOS 7.8.2003 also Ubuntu 20.04 .NET Core 3.1.301, commit: 7feb845744 .Protocols version 5.0.0-preview.5.20278.1

Regression?

On Windows it does not happen.

Data

Time to .Bind(); the connection:

Elapsed time 6 ms
Elapsed time 7 ms
Elapsed time 6 ms
Elapsed time 8 ms
Elapsed time 5 ms
Elapsed time 127317 ms
Elapsed time 127183 ms
Elapsed time 5 ms
Elapsed time 127180 ms
Elapsed time 9 ms
Elapsed time 4 ms
Elapsed time 127299 ms
ericstj commented 4 years ago

If you can repro this, it might be good to get a dump to find out what's happening. It's possible this is a server side delay too. Maybe create an app that repros it and when you see the bind taking longer than 100 ms call dotnet-dump on your own proecess: https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump

GrowSharp commented 4 years ago

Finally found time to do this. @ericstj It took some time but I manage to reproduce it (it seems to be happening pretty random). Now I have +- 250 MB dump file (Heap) and one smaller like 70 MB (Mini). I will now upload only the smaller file, let me know if you will need the bigger one.

File: core_20200721_141538.zip

danmoseley commented 4 years ago

dotnet trace collect might be more helpful - if you can get one for a good case, and for a bad case. https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace#dotnet-trace-collect

GrowSharp commented 4 years ago

All right. What I did while getting an example of a bad case scenario (bad.nettrace): Right before my .Bind() statement I placed Process.GetCurrentProcess().Id; to obtain the process ID. At .Bind(); statement I also placed breakpoint. I started the program, called the API that went into the desired block of code, read the process ID and executed the dotnet-trace collect command. After that I went for F10 to continue. After .Bind() statement was completed, I pressed F5 and let it run to the end. After that I turned off the dotnet-trace tool and finally terminated my app.

In the good case scenario (good.nettrace) I did almost the same, except the fact that I called the API twice, so at some point the trace should repeat itself, sorry.

Files: nettrace.zip @danmosemsft

danmoseley commented 4 years ago

@GrowSharp could you please check good.nettrace? When I open in PerfView (https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace#view-the-trace-captured-from-dotnet-trace) I get

Started: Opening good.nettrace
Creating ETLX file C:\Users\danmose\AppData\Local\Temp\PerfView\good_fe97c3c9.etlx from C:\temp\traces\good.nettrace
Collecting a maximum of 20,000,000 events.
[Opening a log file of size 0 MB.]
[ERROR: processing events ****]
***** Consider using /ContinueOnError to ignore the bad part of the trace.  *****
Error: Exception EventPipe conversion: System.Exception: Read past end of stream.
   at FastSerialization.IOStreamStreamReader.Fill(Int32 minimum)
   at FastSerialization.MemoryStreamReader.ReadByte()
   at FastSerialization.Deserializer.ReadObject()
   at Microsoft.Diagnostics.Tracing.EventPipeEventSource.Process()
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CopyRawEvents(TraceEventDispatcher rawEvents, IStreamWriter writer)
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.<>c__DisplayClass118_0.<FastSerialization.IFastSerializable.ToStream>b__0()
   at FastSerialization.DeferedRegion.Write(Serializer serializer, Action toStream)
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.FastSerialization.IFastSerializable.ToStream(Serializer serializer)
   at FastSerialization.Serializer.WriteObjectData(IFastSerializable obj, Tags beginTag)
   at FastSerialization.Serializer.WriteObjectRef(IFastSerializable obj, Boolean defered)
   at FastSerialization.Serializer..ctor(IStreamWriter writer, IFastSerializable entryObject)
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CreateFromEventPipeEventSources(TraceEventDispatcher source, String etlxFilePath, TraceLogOptions options)
   at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CreateFromEventPipeDataFile(String filePath, String etlxFilePath, TraceLogOptions options)
   at PerfView.EventPipePerfViewData.GetTraceLog(TextWriter log, Action`3 onLostEvents)
[Error: exception while opening EventPipe data.]
Completed: Opening good.nettrace   (Elapsed Time: 0.816 sec)

bad.nettrace opens OK and I"ll see if anything jumps out (but this is @joperezr area of expertise)

danmoseley commented 4 years ago

From bad.nettrace, the expensive path just ends in native code unfortunately:

-- ||     |                                        + module System.Private.CoreLib <<System.Private.CoreLib!AsyncTaskMethodBuilder.Start>> ||     |                                         + module Adler.Infrastructure.oAuth.WebApi <<Adler.Infrastructure.oAuth.WebApi!Adler.Infrastructure.oAuth.WebApi.Base.RouterMiddleware+<>c__DisplayClass1_0+<b0>d.MoveNext()>> ||     |                                          + module Adler.Infrastructure.oAuth.BL <<Adler.Infrastructure.oAuth.BL!ActiveDirectoryRoleService.GetClaimsAsync>> ||     |                                          |+ module System.Private.CoreLib <<System.Private.CoreLib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.Canon].Start(!!0&)>> ||     |                                          | + module Adler.Infrastructure.oAuth.BL <<Adler.Infrastructure.oAuth.BL!Adler.Infrastructure.oAuth.BL.Services.ActiveDirectoryRoleService+d__8.MoveNext()>> ||     |                                          |  + module System.DirectoryServices.Protocols <<System.DirectoryServices.Protocols!LdapConnection.Bind>> ||     |                                          |  |+ Type: External ||     |                                          |  | + Event Microsoft-DotNETCore-SampleProfiler/Thread/Sample

@noahfalk what is the best way to gather a trace that includes native frames here? I guess not dotnet trace?

noahfalk commented 4 years ago

correct dotnet trace doesn't have support for native code (not yet at least). On Linux you can use the perfcollect script here that runs a combination of perf (for native CPU sampled callstacks) + lttng (for runtime and EventSource events)

danmoseley commented 4 years ago

@GrowSharp do you want to give that a try?

Although, as they say, when a perf issue is egregious enough, you don't need a profiler, you need a debugger. Break in once or twice during the delay and inspect where the CPU is. Is that something you could try? I believe you need to use lldb. The closest I could find to instructions are here which assume you're debugging the CLR itself, but it should be much the same. @joperezr do you have more info? I assume @GrowSharp will need some -dev packages?

GrowSharp commented 4 years ago

I was trying to get it work for a few hours yesterday, but unfortunately without success.. On the other side, I started reading The Book of the Runtime and discovered that I'm finally confident enough to start learning in depth how it all goes together. I always wanted to be able to investigate these kind of issues on my own and in the process maybe become contributor. But I'm having hard time finding any good resources.. The link @danmosemsft sent me that describes how to debug CLR in lldb is not very detailed (I'm having problems with first steps). I also can't find any resources on how to run CoreFX locally so I can debug it, fix things ex cetera. My current goal is to learn about CoreFX, interop etc. I'm not planning on diving into stuff like Roslyn, CoreCLR, CoreRT right now, that's probably still a distant future for me. So if you guys could please point me into the right direction, or maybe provide some more resources I would be very, very grateful. Also, I haven't written any article / tutorial maybe since I left the MSP program, so I could maybe also write some detailed tutorial for future people like me afterwards. Thanks in advance!

joperezr commented 4 years ago

Pushing to 6.0.0 for now as this is not required for 5.0.0 but we should finish up the investigation here.

GrowSharp commented 4 years ago

As I said here. I think that the problem will be on our side, in our messy 12 years old AD system. I think we can close this one also.

Thank you very much for trying to help me everyone!