Open valco1994 opened 4 years ago
@sywhang @josalem cc: @dotnet/dotnet-diag
It was declared by .NET team (details can be found in several places, e.g. here), that LTTng had been chosen as a major tool for performance analysis on Linux.
Yes, and this is still true :)
By the way there is a DotNETRuntime:CLRStackWalk event in CoreCLR, which provides managed callstacks and, as far as I found, unexpectedly wasn't being emitted on Linux.
The way managed callstacks get resolved in LTTng is different from the way they get resolved on EventPipe or Windows. To resolve callstacks in LTTng we use perf to get the stack (which includes both native and managed callstack). If I recall correctly, perf uses libunwind to get the callstack for each tracepoint. On top of this, since the OS doesn't know how to resolve jitted (managed) frames, the runtime emits a file that maps IPs to symbols for jitted code. This file gets zipped into the .trace.zip file that you see when you use perfcollect, and PerfView is able to decode it into managed callstacks.
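As a rough illustration of how a file that maps IPs to symbols can be used, here is a minimal sketch. It assumes the file follows the perf map text layout (one `START SIZE symbol-name` entry per line, with hexadecimal numbers); the comment above does not state the format, and the addresses and symbol names in `SAMPLE_MAP` are invented for the example.

```python
import bisect

def parse_perf_map(text):
    """Parse lines of the assumed form '<start-hex> <size-hex> <symbol name>'."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = int(parts[0], 16), int(parts[1], 16), parts[2]
        entries.append((start, size, name))
    entries.sort()
    return entries

def resolve(entries, ip):
    """Return the symbol whose [start, start + size) range contains ip, else None."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, ip) - 1
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if ip < start + size else None

# Invented sample data: two jitted methods.
SAMPLE_MAP = """\
7f3a10000000 40 void [app] Program::Main()
7f3a10000040 120 int32 [app] Worker::Compute(int32)
"""

entries = parse_perf_map(SAMPLE_MAP)
stack_ips = [0x7f3a10000050, 0x7f3a10000010, 0x7f3a20000000]
# Fall back to the raw address when no jitted symbol covers the IP.
symbols = [resolve(entries, ip) or hex(ip) for ip in stack_ips]
```

An IP that falls outside every range (the third one here) stays unresolved, which is the case where a native symbol source would be consulted instead.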
At the same time new .NET Core subsystem for performance analysis - EventPipe - successfully provides callstacks on Linux.
Yes, but only for managed code. In fact, for callstack resolution LTTng is ahead of EventPipe in the sense that it can provide both native and managed callstacks. EventPipe can only understand managed callstacks, so when users want native callstacks we point them to LTTng.
reuse existing cross-platform code for stackwalking to generate CLRStackWalk event with LTTng
As explained above, this event isn't necessary to get managed code.
unify and put in the one place code related to stackwalking
The runtime has many components that use stackwalking. In the diagnostics space alone, the profiler APIs and SampleProfiler, which EventPipe uses to get managed callstacks, both rely on stackwalking code. Both of them use the code you found (stackwalk.cpp).
I had read about perf and perfcollect before writing this issue and, as far as I understand, they do not satisfy my requirements. It is important for me to have a callstack corresponding to the event precisely. But perf can only sample callstacks with a specified frequency or provide them for its events. And it doesn't know about LTTng events emitted by CoreCLR at all.
Is it right? If there is a way to establish a correspondence between LTTng events and native callstacks collected by perf, it would be wonderful.
Now about DotNETRuntime:CLRStackWalk. The fact is that in ETW it is produced for every event that logically has an associated stack. I mean this event: https://docs.microsoft.com/en-us/dotnet/framework/performance/stack-etw-event. In that case, I can establish a correspondence between other events and their callstacks. Furthermore, the absence of this event on Linux breaks the promise of a one-to-one mapping between ETW events on Windows and LTTng events on Linux.
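To make the idea of a correspondence concrete, here is a minimal sketch of pairing events with stack events in a trace. It assumes that a stack event follows the event it describes on the same thread, which this thread does not spell out; the `Event` type, the event names `EventA`/`EventB`/`EventC`, and the IP values are all hypothetical.

```python
from dataclasses import dataclass, field

STACK_EVENT = "DotNETRuntime:CLRStackWalk"

@dataclass
class Event:
    thread_id: int
    name: str
    ips: list = field(default_factory=list)  # only populated for stack events

def attach_stacks(events):
    """Pair each event with the next stack event seen on the same thread.

    Assumption: a stack event always describes the most recent non-stack
    event emitted on its thread.
    """
    pending = {}  # thread_id -> index in result of the event awaiting a stack
    result = []   # [event, ips or None]
    for ev in events:
        if ev.name == STACK_EVENT:
            idx = pending.pop(ev.thread_id, None)
            if idx is not None:
                result[idx][1] = ev.ips
        else:
            pending[ev.thread_id] = len(result)
            result.append([ev, None])
    return [(ev, ips) for ev, ips in result]

# Invented trace: two threads interleaved, last event has no stack.
trace = [
    Event(1, "EventA"),
    Event(2, "EventB"),
    Event(1, STACK_EVENT, [0x10, 0x20]),
    Event(2, STACK_EVENT, [0x30]),
    Event(1, "EventC"),
]
paired = attach_stacks(trace)
```

Events without a following stack event on their thread keep `None`, which also matches the case where only a subset of events carries a stack.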
@sywhang @josalem, could you please comment on the situation taking into account the context clarified by me above?
Thanks for filing the issue @valco1994! Let me see if I can help move this along a bit...
[@sywhang] To resolve callstacks in LTTng we use perf to get the stack (which includes both native and managed callstack)
This appears to conflate perfcollect with LTTng. Perfcollect runs both perf and LTTng, and each of them produces a distinct set of events. @valco1994 is correct in noting that perf collects native callstacks for the events it generates, but nothing is producing a callstack for the events which come from LTTng.
So, I propose to: reuse existing cross-platform code for stackwalking to generate CLRStackWalk event with LTTng
The principle that we'd have stackwalks for these events seems fine to me, but there are some details to sort out:
EventPipe does not actually have callstacks for all events. Rather, it has callstacks for all events generated by managed EventSource calls and only a subset of the events that come directly from the runtime. If that is acceptable we can follow the same path; if you are looking for all runtime events to have a callstack, then we would need to solve this issue.
The existing stackwalker in stackwalk.cpp is also likely to have higher performance overhead than what ETW was doing. If your needs are at most a few thousand events/sec then it should be fine; above that you will likely see the stackwalker consuming a non-trivial portion of CPU time.
Is stack symbolication needed? The CLRStackWalk event emits a list of IPs, as I recall, and most scenarios want a set of IP->name mapping information to symbolicate it with. There are a few different ways symbolication can be done, such as using the JIT events (assuming the trace was enabled at that point in time), with rundown events (what ETW usually does), or with platform/tool specific techniques (what perf does).
We'd need to decide on the mechanism that turns stacks on/off.
unify and put in the one place code related to stackwalking
I am happy in principle with refactoring that lets LTTng and ETW share more of their stackwalking implementation, but I might have reservations on specifics. I trust we could find something good, and we probably don't need to dig into it until we've resolved the requirements-related questions above.
@valco1994 given all your research, are you interested in implementing this feature yourself, or are you requesting that Microsoft implement it? Either way is fine, though if it is a request for us to handle it then we'll have to prioritize it against other requests. Right now we haven't heard from many customers in need of this, so we'd probably prioritize other work while leaving the issue open so that others can register their interest.
Thanks for detailed answer @noahfalk!
As far as I understand, the limitations described by you are acceptable to me. Existing ways to do symbolication are also sufficient.
Probably, a new environment variable could be added to turn stacks on/off, just as COMPlus_EnableEventLog turns the production of LTTng events on/off.
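A minimal sketch of what launching an app with such a switch might look like from a script. `COMPlus_EnableEventLog` is the existing variable mentioned above; `COMPlus_EventLogStacks` is a purely hypothetical name for the proposed switch and does not exist in the runtime, and treating stacks as dependent on events being enabled is likewise an assumption.

```python
import os

EVENTS_VAR = "COMPlus_EnableEventLog"   # existing switch named in this thread
STACKS_VAR = "COMPlus_EventLogStacks"   # HYPOTHETICAL name, not a real variable

def build_env(enable_events, enable_stacks, base=None):
    """Return a copy of the environment with the tracing switches set.

    Stacks are only requested when events are enabled, since a stack
    without its event would have nothing to correspond to.
    """
    env = dict(os.environ if base is None else base)
    env[EVENTS_VAR] = "1" if enable_events else "0"
    env[STACKS_VAR] = "1" if (enable_events and enable_stacks) else "0"
    return env

env = build_env(enable_events=True, enable_stacks=True, base={})
# The environment could then be passed to a child process, for example:
#   subprocess.run(["dotnet", "MyApp.dll"], env=build_env(True, True))
# ("MyApp.dll" is a placeholder).
```

Keeping the stack switch separate from the event switch would let users who only need events avoid the stackwalking overhead discussed earlier.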
I am interested in implementing this feature but unfortunately have no time now (and don't know if I will have time in the foreseeable future). So, currently, I'm requesting Microsoft to implement it.
I am interested in implementing this feature but unfortunately have no time now (and don't know if I will have time in the foreseeable future). So, currently, I'm requesting Microsoft to implement it.
Sure thing. As mentioned we haven't heard of anyone else needing this right now so it wouldn't currently be a priority, but we'll leave the issue open and see if it gains more interest. And of course if you or anyone else wants to work on it I am happy to discuss next steps for putting together a PR. Thanks!
It was declared by .NET team (details can be found in several places, e.g. here), that LTTng had been chosen as a major tool for performance analysis on Linux. Even more, it was written that [...] and [...]
By the way there is a DotNETRuntime:CLRStackWalk event in CoreCLR, which provides managed callstacks and, as far as I found, unexpectedly wasn't being emitted on Linux. Even more, code related to callstack manipulation and sending in src/coreclr/src/vm/eventtrace.cpp is conditionally compiled with the predicate !HOST_UNIX.
At the same time new .NET Core subsystem for performance analysis - EventPipe - successfully provides callstacks on Linux. And code related to callstack manipulation in src/coreclr/src/vm/eventpipe.cpp is written in a cross-platform manner. And there are also files stackcontents.h, stackwalk.h, stackwalk.cpp in the same directory, which relate to stackwalking but are not used by both subsystems.
So, I propose to:
LTTng