Closed GrowSharp closed 4 years ago
If you can repro this, it might be good to get a dump to find out what's happening. It's possible this is a server side delay too. Maybe create an app that repros it and when you see the bind taking longer than 100 ms call dotnet-dump on your own proecess: https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump
Finally found time to do this. @ericstj It took some time but I manage to reproduce it (it seems to be happening pretty random). Now I have +- 250 MB dump file (Heap) and one smaller like 70 MB (Mini). I will now upload only the smaller file, let me know if you will need the bigger one.
File: core_20200721_141538.zip
dotnet trace collect
might be more helpful - if you can get one for a good case, and for a bad case. https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace#dotnet-trace-collect
All right. What I did while getting an example of a bad case scenario (bad.nettrace):
Right before my .Bind()
statement I placed Process.GetCurrentProcess().Id;
to obtain the process ID.
At .Bind();
statement I also placed breakpoint.
I started the program, called the API that went into the desired block of code, read the process ID and executed the dotnet-trace collect
command. After that I went for F10 to continue. After .Bind()
statement was completed, I pressed F5 and let it run to the end. After that I turned off the dotnet-trace
tool and finally terminated my app.
In the good case scenario (good.nettrace) I did almost the same, except the fact that I called the API twice, so at some point the trace should repeat itself, sorry.
Files: nettrace.zip @danmosemsft
@GrowSharp could you please check good.nettrace? When I open in PerfView (https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace#view-the-trace-captured-from-dotnet-trace) I get
Started: Opening good.nettrace
Creating ETLX file C:\Users\danmose\AppData\Local\Temp\PerfView\good_fe97c3c9.etlx from C:\temp\traces\good.nettrace
Collecting a maximum of 20,000,000 events.
[Opening a log file of size 0 MB.]
[ERROR: processing events ****]
***** Consider using /ContinueOnError to ignore the bad part of the trace. *****
Error: Exception EventPipe conversion: System.Exception: Read past end of stream.
at FastSerialization.IOStreamStreamReader.Fill(Int32 minimum)
at FastSerialization.MemoryStreamReader.ReadByte()
at FastSerialization.Deserializer.ReadObject()
at Microsoft.Diagnostics.Tracing.EventPipeEventSource.Process()
at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CopyRawEvents(TraceEventDispatcher rawEvents, IStreamWriter writer)
at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.<>c__DisplayClass118_0.<FastSerialization.IFastSerializable.ToStream>b__0()
at FastSerialization.DeferedRegion.Write(Serializer serializer, Action toStream)
at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.FastSerialization.IFastSerializable.ToStream(Serializer serializer)
at FastSerialization.Serializer.WriteObjectData(IFastSerializable obj, Tags beginTag)
at FastSerialization.Serializer.WriteObjectRef(IFastSerializable obj, Boolean defered)
at FastSerialization.Serializer..ctor(IStreamWriter writer, IFastSerializable entryObject)
at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CreateFromEventPipeEventSources(TraceEventDispatcher source, String etlxFilePath, TraceLogOptions options)
at Microsoft.Diagnostics.Tracing.Etlx.TraceLog.CreateFromEventPipeDataFile(String filePath, String etlxFilePath, TraceLogOptions options)
at PerfView.EventPipePerfViewData.GetTraceLog(TextWriter log, Action`3 onLostEvents)
[Error: exception while opening EventPipe data.]
Completed: Opening good.nettrace (Elapsed Time: 0.816 sec)
bad.nettrace opens OK and I"ll see if anything jumps out (but this is @joperezr area of expertise)
From bad.nettrace, the expensive path just ends in native code unfortunately:
--
|| | + module System.Private.CoreLib <<System.Private.CoreLib!AsyncTaskMethodBuilder.Start>>
|| | + module Adler.Infrastructure.oAuth.WebApi <<Adler.Infrastructure.oAuth.WebApi!Adler.Infrastructure.oAuth.WebApi.Base.RouterMiddleware+<>c__DisplayClass1_0+<
@noahfalk what is the best way to gather a trace that includes native frames here? I guess not dotnet trace
?
correct dotnet trace doesn't have support for native code (not yet at least). On Linux you can use the perfcollect script here that runs a combination of perf (for native CPU sampled callstacks) + lttng (for runtime and EventSource events)
@GrowSharp do you want to give that a try?
Although, as they say, when a perf issue is egregious enough, you don't need a profiler, you need a debugger. Break in once or twice during the delay and inspect where the CPU is. Is that something you could try? I believe you need to use lldb. The closest I could find to instructions are here which assume you're debugging the CLR itself, but it should be much the same. @joperezr do you have more info? I assume @GrowSharp will need some -dev packages?
I was trying to get it work for a few hours yesterday, but unfortunately without success..
On the other side, I started reading The Book of the Runtime
and discovered that I'm finally confident enough to start learning in depth how it all goes together.
I always wanted to be able to investigate these kind of issues on my own and in the process maybe become contributor. But I'm having hard time finding any good resources.. The link @danmosemsft sent me that describes how to debug CLR in lldb is not very detailed (I'm having problems with first steps).
I also can't find any resources on how to run CoreFX locally so I can debug it, fix things ex cetera.
My current goal is to learn about CoreFX, interop etc. I'm not planning on diving into stuff like Roslyn, CoreCLR, CoreRT right now, that's probably still a distant future for me.
So if you guys could please point me into the right direction, or maybe provide some more resources I would be very, very grateful. Also, I haven't written any article / tutorial maybe since I left the MSP program, so I could maybe also write some detailed tutorial for future people like me afterwards.
Thanks in advance!
Pushing to 6.0.0 for now as this is not required for 5.0.0 but we should finish up the investigation here.
Description
So problem is, that sometimes (really randomly) the bind takes 127 seconds to complete. It doesn't matter if it's anonymous bind or not. Just sometimes the .Bind(); takes roughly 127 seconds. Did some logging around it for you to see, see Data.
Configuration
Domain joined CentOS 7.8.2003 also Ubuntu 20.04 .NET Core 3.1.301, commit: 7feb845744 .Protocols version 5.0.0-preview.5.20278.1
Regression?
On Windows it does not happen.
Data
Time to
.Bind();
the connection: