Maoni0 / realmon

A monitoring tool that tells you when GCs happen in a process and some characteristics about these GCs
MIT License
281 stars 25 forks source link

Added the ability to request call stacks for induced GCs and large object allocations #29

Closed MokoSan closed 2 years ago

MokoSan commented 2 years ago

Call stacks are now available for induced GCs and large object allocations by passing in the -s flag. These call stacks are currently only available in Windows and require admin privileges to be run. This is an effort to close #8

An example of the call stacks are: image

To ensure the successful resolution of the symbols, ensure that the _NT_SYMBOL_PATH is set correctly. While testing, I had it set to:

;SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols;SRV*C:\Symbols*https://nuget.smbsrc.net;SRV*C:\Symbols*https://referencesource.microsoft.com/symbols

Note:

Follow Ups to this PR include:

MokoSan commented 2 years ago

Working on cleaning up the formatting. Do let me know if the following format looks better than the one above:

image

Maoni0 commented 2 years ago

thanks! is it the case that this only resolves the calls from native code? it would make it much more useful if we can resolve symbols from managed code since the native portions usually the same. for formatting I think the 1st one might be better since the callstack could get very deep.

MokoSan commented 2 years ago

Thanks! Kept the previous format.

is it the case that this only resolves the calls from native code?

I should have named the method better; both the Native and Managed symbols are being resolved. Example:

image

Maoni0 commented 2 years ago

I see. do you know why the 7 frames below `JIT_NewArr1 aren't resolved?

MokoSan commented 2 years ago

Been staring at the code for a while and it seems like for cases where the address isn't resolved, the ModuleFile associated with the code address are null i.e. not available from the call stacks from the event itself. Doesn't seem like it's related to the SymbolReader, the abstraction responsible for resolving the methods as if I remove the resolution logic altogether, I am still able to resolve at least the module name.

Taking a look at an official example, it seems like there is no resolve for if the ModuleFile is null. My guess, just on the basis of reference to the rest of the call stack, is that of the Trace Event library.

MokoSan commented 2 years ago

I think I fixed the issue with a majority of the unresolved addressed!! :)

For example, comparing the first image in the PR to the one below, we are successfully resolving the one frame that seemed unresolved:

Before Fix image

After Fix image

The fix was to bump up the verbosity of the rundown events to force the resolution of the already compiled managed code.

Maoni0 commented 2 years ago

Cool! I also looked around in the perfview code base and saw that this provider is enabled with verbose level in all the places I looked at. the only 2 events that are enabled for the runtime provide on verbose level are MethodDCStartILToNativeMap/MethodDCEndILToNativeMap. and judging by the fact that the missing frames are always right before the native helper call (like coreclr!JIT_NewArr1) I guess those events are needed in order to find their caller.

with this are you able to decode the one with 7 frames right before the JIT_NewArr1 call?

MokoSan commented 2 years ago

.. with this are you able to decode the one with 7 frames right before the JIT_NewArr1 call?

Thanks for the suggestion!! This worked brilliantly and now we are resolving a lot more frames - specifically, I added more pertinent keywords and experimented with new ones the symbols were resolved. For example, for a similar callstack where we were missing 7 frames before the JIT_NewArr1 call, we are now resolving all the missing symbols:

...
coreclr!WKS::GCHeap::AllocLHeap
coreclr!AllocLHeap
coreclr!JIT_NewArr1
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.Etlx.TraceMethods.get_Item(value class Microsoft.Diagnostics.Tracing.Etlx.MethodIndex)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.Etlx.TraceCodeAddress.get_Method()
gcrealtimemon!realmon.CallStackResolution.CallStackManager.ResolveSymbols(class Microsoft.Diagnostics.Tracing.Etlx.TraceCallStack)
gcrealtimemon!realmon.CallStackResolution.CallStackManager.PrintCallStack(class Microsoft.Diagnostics.Tracing.TraceEvent,class realmon.Configuration.Configuration)
gcrealtimemon!realmon.CallStackResolution.CallStackManager+<>c__DisplayClass1_0.<InitializeAndRegisterCallStacks>b__2(class Microsoft.Diagnostics.Tracing.Parsers.Clr.GCAllocationTickTraceData)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.TraceEventDispatcher.DoDispatch(class Microsoft.Diagnostics.Tracing.TraceEvent)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.Etlx.TraceLog.DispatchClonedEvent(class Microsoft.Diagnostics.Tracing.TraceEvent)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.Etlx.TraceLog.FlushRealTimeEvents(class System.Object)
system.private.corelib!System.Threading.TimerQueueTimer+<>c.<.cctor>b__23_0(System.Object)
system.private.corelib!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
system.private.corelib!System.Threading.TimerQueueTimer.CallCallback(Boolean)
system.private.corelib!System.Threading.TimerQueueTimer.Fire(Boolean)
system.private.corelib!System.Threading.TimerQueue.FireNextTimers()
system.private.corelib!System.Threading.TimerQueue.AppDomainTimerCallback(Int32)
coreclr!CallDescrWorkerInternal
coreclr!MethodDescCallSite::CallTargetWorker
coreclr!AppDomainTimerCallback_Worker
coreclr!ManagedThreadBase_DispatchMiddle
coreclr!ManagedThreadBase_DispatchOuter
coreclr!AppDomainTimerCallback
coreclr!ThreadpoolMgr::AsyncTimerCallbackCompletion
coreclr!UnManagedPerAppDomainTPCount::DispatchWorkItem
coreclr!ThreadpoolMgr::WorkerThreadStart
coreclr!Thread::intermediateThreadProc
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

Of all the many examples I have seen, using PerfView to get to an answer, we are still missing the resolution for these specific 3 frames:

An example of the missing call stack is below; I might have tried all the ClrTraceEventParser.Keywords but still can't get these 3 to resolve. What's odd is that all the other TraceEvent based symbols seemed to be resolved. Any ideas?

coreclr!ETW::SamplingLog::SendStackTrace
coreclr!EtwCallout
coreclr!McTemplateCoU0qqhxpzqp
coreclr!FireEtwGCAllocationTick_V3
coreclr!GCToCLREventSink::FireGCAllocationTick_V3
coreclr!??WKS::gc_heap::try_allocate_more_space
coreclr!WKS::gc_heap::allocate_large_object
coreclr!WKS::GCHeap::AllocLHeap
coreclr!AllocLHeap
coreclr!JIT_NewArr1
System.Private.CoreLib.il!System.Array.Resize(!!0[]&,int32)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.Etlx.TraceCallStacks.InternCallStackIndex(value class Microsoft.Diagnostics.Tracing.Etlx.CodeAddressIndex,value class Microsoft.Diagnostics.Tracing.Etlx.CallStackIndex)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.Etlx.TraceCallStacks.GetStackIndexForStackEvent64(unsigned int64*,int32,class Microsoft.Diagnostics.Tracing.Etlx.TraceProcess,value class Microsoft.Diagnostics.Tracing.Etlx.CallStackIndex)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.Etlx.TraceLog+<>c__DisplayClass84_0.<SetupCallbacks>b__28(class Microsoft.Diagnostics.Tracing.Parsers.Clr.ClrStackWalkTraceData)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.TraceEventDispatcher.DoDispatch(class Microsoft.Diagnostics.Tracing.TraceEvent)
Microsoft.Diagnostics.Tracing.TraceEvent!Microsoft.Diagnostics.Tracing.ETWTraceEventSource.RawDispatch(value class EVENT_RECORD*)
system.console.il!dynamicClass.IL_STUB_ReversePInvoke(int64)
coreclr!UMThunkStub
sechost!EtwpLoadEventTrigger
sechost!EtwpProcessRealTimeTraces
sechost!ProcessTrace
0x7ff7dd52dce8
0x7ff7dd52dba9
0x7ff7dd52da5d
gcrealtimemon!realmon.Program.RealTimeProcessing(int32,class Options,class realmon.Configuration.Configuration)
gcrealtimemon!realmon.Program+<>c__DisplayClass11_1+<<Main>b__0>d.MoveNext()
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,System.__Canon].ExecutionContextCallback(System.Object)
system.private.corelib!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,System.__Canon].MoveNext(System.Threading.Thread)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,System.__Canon].MoveNext()
system.private.corelib!System.Runtime.CompilerServices.TaskAwaiter+<>c.<OutputWaitEtwEvents>b__12_0(System.Action, System.Threading.Tasks.Task)
system.private.corelib!System.Runtime.CompilerServices.AsyncMethodBuilderCore+ContinuationWrapper.Invoke()
system.private.corelib!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean)
system.private.corelib!System.Threading.Tasks.Task.RunContinuations(System.Object)
system.private.corelib!System.Threading.Tasks.Task.FinishContinuations()
system.private.corelib!System.Threading.Tasks.Task`1[System.__Canon].TrySetResult(System.__Canon)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.__Canon].SetResult(System.__Canon)
gcrealtimemon!realmon.Program+<GetConfiguration>d__9.MoveNext()
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.__Canon,System.__Canon].ExecutionContextCallback(System.Object)
system.private.corelib!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.__Canon,System.__Canon].MoveNext(System.Threading.Thread)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.__Canon,System.__Canon].MoveNext()
system.private.corelib!System.Runtime.CompilerServices.TaskAwaiter+<>c.<OutputWaitEtwEvents>b__12_0(System.Action, System.Threading.Tasks.Task)
system.private.corelib!System.Runtime.CompilerServices.AsyncMethodBuilderCore+ContinuationWrapper.Invoke()
system.private.corelib!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean)
system.private.corelib!System.Threading.Tasks.Task.RunContinuations(System.Object)
system.private.corelib!System.Threading.Tasks.Task.FinishContinuations()
system.private.corelib!System.Threading.Tasks.Task`1[System.__Canon].TrySetResult(System.__Canon)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.__Canon].SetResult(System.__Canon)
gcrealtimemon!realmon.Configuration.ConfigurationReader+<ReadConfigurationAsync>d__2.MoveNext()
Maoni0 commented 2 years ago

I wonder if this has something to do with when the rundown events were fired for these methods? @brianrob would know but he's on vacation too 😀

MokoSan commented 2 years ago

Cool, thanks! The good news is that this is the only callstack that seems to be with unresolved addresses; been extensively testing this with as many processes as I can launch and they all have been successfully returning fully resolved call stacks.

MokoSan commented 2 years ago

Hi @brianrob! Any idea what might be going on here where certain symbols aren't getting resolved:

To enable printing the call stack for GCTriggered and GCAllocationTick events, I have the following providers + keywords enabled following this example. All but the following 3 frames are getting properly resolved:

Could this be an issue with the rundown? The full file with this impl can be found in the CallStackManager.cs file of this PR.

     // Setup the additional providers needed for the enabling of the callstacks.
            // Image Load and Process needed for Native Symbol Resolution.
            // Requires Admin privileges, else this will throw.
            traceEventSession.EnableKernelProvider(
                flags:
             KernelTraceEventParser.Keywords.ImageLoad |
             KernelTraceEventParser.Keywords.Process,
                stackCapture:
             KernelTraceEventParser.Keywords.None
             );

            // Needed for the GC events + call stacks.
            traceEventSession.EnableProvider(
                ClrTraceEventParser.ProviderGuid,
                TraceEventLevel.Verbose, // Verbose needed for call stacks.
                (ulong)(ClrTraceEventParser.Keywords.GC                |
                ClrTraceEventParser.Keywords.Loader                    |
                ClrTraceEventParser.Keywords.JittedMethodILToNativeMap |
                ClrTraceEventParser.Keywords.JITSymbols                |
                ClrTraceEventParser.Keywords.JitTracing                |
                ClrTraceEventParser.Keywords.Jit                       |
                ClrTraceEventParser.Keywords.Codesymbols               |
                ClrTraceEventParser.Keywords.Interop                   |
                ClrTraceEventParser.Keywords.NGen                      |
                ClrTraceEventParser.Keywords.MethodDiagnostic          |
                ClrTraceEventParser.Keywords.Stack));

            // Needed for JIT Compile code that was already compiled. 
            traceEventSession.EnableProvider(ClrRundownTraceEventParser.ProviderGuid, TraceEventLevel.Verbose,
                (ulong)(ClrTraceEventParser.Keywords.Jit               |
                ClrTraceEventParser.Keywords.Loader                    |
                ClrTraceEventParser.Keywords.JittedMethodILToNativeMap |
                ClrTraceEventParser.Keywords.JITSymbols                |
                ClrTraceEventParser.Keywords.Jit                       |
                ClrTraceEventParser.Keywords.JitTracing                |
                ClrTraceEventParser.Keywords.Codesymbols               |
                ClrTraceEventParser.Keywords.Interop                   |
                ClrTraceEventParser.Keywords.GC                        |
                ClrTraceEventParser.Keywords.NGen                      |
                ClrTraceEventParser.Keywords.MethodDiagnostic          |
                ClrTraceEventParser.Keywords.StartEnumeration));
brianrob commented 2 years ago

@MokoSan, what you have listed here seems like it should be fairly exhaustive if not more than you need. The fact that you see some TraceEvent symbols but not others is possible. Are you by chance using ready to run? If so, it's possible that some of these frames are jitted and others are not (e.g. the missing ones are likely to be pre-compiled). Though, I would have expected to see a module name if that's the case. If you're not using ready to run, then my recommendation would be to write some test code that listens for the method rundown events and tries to find the ones that are missing. Specifically, are these methods getting emitted via rundown at all? Also, to confirm, with your same test app, do these symbols resolve properly if you capture a trace instead of using a real-time session?

MokoSan commented 2 years ago

@brianrob, thanks so much for your insight and help! After 2 nights of trying to figure out what was going on I think I was able to get to the issue and now we are resolving all frames. :) :)

The mistake I made was to test out the resolved callstacks of the process itself i.e. realmon trying to get the GC callstacks for the same realmon process, which landed the symbol resolution code into weird territory. This test didn't make sense as in no circumstance would realmon want GC details about the same realmon process.

To test whether we resolve the aforementioned missing resolved frames i.e.

I analyzed a completely independent realmon process that emitted call stacks and the callstacks were successfully resolved:

coreclr!ETW::SamplingLog::SendStackTrace
coreclr!EtwCallout
coreclr!McTemplateCoU0qqhxpzqp
coreclr!FireEtwGCAllocationTick_V3
coreclr!GCToCLREventSink::FireGCAllocationTick_V3
coreclr!??WKS::gc_heap::try_allocate_more_space
coreclr!WKS::gc_heap::allocate_large_object
coreclr!WKS::GCHeap::AllocLHeap
coreclr!AllocLHeap
coreclr!JIT_NewArr1
System.Private.CoreLib.il!System.Collections.Generic.List`1[Microsoft.Diagnostics.Tracing.Analysis.JIT.InliningSuccessResult].set_Capacity(int32)
System.Private.CoreLib.il!System.Collections.Generic.List`1[Microsoft.Diagnostics.Tracing.Analysis.JIT.InliningSuccessResult].AddWithResize(!0)
microsoft.diagnostics.tracing.traceevent!Microsoft.Diagnostics.Tracing.Analysis.TraceLoadedDotNetRuntime+<>c__DisplayClass32_0.<SetupCallbacks>b__52(class Microsoft.Diagnostics.Tracing.Parsers.Clr.MethodJitInliningSucceededTraceData)
microsoft.diagnostics.tracing.traceevent!Microsoft.Diagnostics.Tracing.TraceEventDispatcher.DoDispatch(class Microsoft.Diagnostics.Tracing.TraceEvent)
microsoft.diagnostics.tracing.traceevent!Microsoft.Diagnostics.Tracing.ETWTraceEventSource.RawDispatch(value class EVENT_RECORD*)
system.console.il!dynamicClass.IL_STUB_ReversePInvoke(int64)
coreclr!UMThunkStub
sechost!EtwpLoadEventTrigger
sechost!EtwpProcessRealTimeTraces
sechost!ProcessTrace
**system.console.il!dynamicClass.IL_STUB_PInvoke(unsigned int64[],unsigned int32,int,int)**
**microsoft.diagnostics.tracing.traceevent!Microsoft.Diagnostics.Tracing.ETWTraceEventSource.ProcessOneFile()**
**microsoft.diagnostics.tracing.traceevent!Microsoft.Diagnostics.Tracing.ETWTraceEventSource.Process()**
gcrealtimemon!realmon.Program.RealTimeProcessing(int32,class Options,class realmon.Configuration.Configuration)
gcrealtimemon!realmon.Program+<>c__DisplayClass11_1+<<Main>b__0>d.MoveNext()
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,System.__Canon].ExecutionContextCallback(System.Object)
system.private.corelib!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,System.__Canon].MoveNext(System.Threading.Thread)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,System.__Canon].MoveNext()
system.private.corelib!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Runtime.CompilerServices.IAsyncStateMachineBox, Boolean)
system.private.corelib!System.Threading.Tasks.Task.RunContinuations(System.Object)
system.private.corelib!System.Threading.Tasks.Task.FinishContinuations()
system.private.corelib!System.Threading.Tasks.Task`1[System.__Canon].TrySetResult(System.__Canon)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.__Canon].SetResult(System.__Canon)
gcrealtimemon!realmon.Program+<GetConfiguration>d__9.MoveNext()
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.__Canon,System.__Canon].ExecutionContextCallback(System.Object)
system.private.corelib!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.__Canon,System.__Canon].MoveNext(System.Threading.Thread)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.__Canon,System.__Canon].MoveNext()
system.private.corelib!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Runtime.CompilerServices.IAsyncStateMachineBox, Boolean)
system.private.corelib!System.Threading.Tasks.Task.RunContinuations(System.Object)
system.private.corelib!System.Threading.Tasks.Task.FinishContinuations()
system.private.corelib!System.Threading.Tasks.Task`1[System.__Canon].TrySetResult(System.__Canon)
system.private.corelib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.__Canon].SetResult(System.__Canon)
gcrealtimemon!realmon.Configuration.ConfigurationReader+<ReadConfigurationAsync>d__2.MoveNext()

After I realized this, there were still a few unresolved frames that were fixed by including the ClrTraceEventParser.Keywords.GCAllObjectAllocation keyword.

For the sake of completeness for future issues, the answer to all your questions are the following:

Are you by chance using ready to run?

We are not. (No <PublishReadyToRun> in the csproj)

Are these methods getting emitted via rundown at all?

Wrote up a test app that confirmed that the Module + Methods were all emitted via rundown.

do these symbols resolve properly if you capture a trace instead of using a real-time session?

Yes, had no issue with resolving symbols while opening up the trace in Perf View from the trace itself.

brianrob commented 2 years ago

I'm not sure why monitoring the current process would present an issue when resolving JIT symbols, but I'm glad that you got things working nonetheless. Thanks for also answering my questions. GCAllObjectAllocation shouldn't have any impact on symbol resolution, so that's weird. It just logs each allocation in its own event, and then also triggers BulkType events to be logged as long as they are enabled, so that consumers can resolve the type of the allocation to a string type name.

MokoSan commented 2 years ago

@brianrob, thanks for the review! I tried a bunch of times to replicate the condition that led to me believe that GCAllObjectAllocation was required and unfortunately wasn't able to do so. While testing I tried as many combinations of the keywords as I could so see what stuck and it could have been that I missed a couple of essential ones. Either way, I gave this another shot and seems like all the frames (tested on 5 separate processes ranging from devenv to a console app that allocates on the LOH on a timer) seem to be resolved.

Apologies for the confusion.

brianrob commented 2 years ago

No worries at all!

MokoSan commented 2 years ago

@Maoni0, this PR might finally be ready for the merge 😄! I have tested this out with and without the use of plaintext on a variety of processes including a sample one that allocs only on the LOH and devenv.

Thanks, @ryandle! Incorporating my changes with yours were a breeze! Your feedback would be appreciated, if you have any.

Using Plain Text

CallStacksWithPlainText

Using Spectre

CallStacksWithSpectre

As a refresher, we are generating callstacks for:

  1. Induced GCs
  2. Large Allocs

These can be activated by the -s flag and are currently only available for Windows if the session is started with Admin privileges. Please let me know if there is any feedback!

Maoni0 commented 2 years ago

LGTM - please resolve conflicts and we can merge :)

MokoSan commented 2 years ago

Thanks for taking a look! All done.