dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Callstacks for LTTng events #34137

Open valco1994 opened 4 years ago

valco1994 commented 4 years ago

It was declared by the .NET team (details can be found in several places, e.g. here) that LTTng had been chosen as a major tool for performance analysis on Linux. Moreover, it was written that

For every ETW event in CoreCLR we construct an LTTng-UST tracepoint when running on Linux, which means there's a complete one-to-one mapping between the two - whenever an ETW event is emitted, there's a corresponding LTTng-UST tracepoint.

and

... the information that comes from the runtime is the same regardless of whether the code is executed on Windows or Linux.

However, there is a DotNETRuntime:CLRStackWalk event in CoreCLR, which provides managed callstacks and, as far as I can tell, unexpectedly is not emitted on Linux. Moreover, the code that builds and sends callstacks in src/coreclr/src/vm/eventtrace.cpp is conditionally compiled under the predicate !HOST_UNIX.

At the same time new .NET Core subsystem for performance analysis - EventPipe - successfully provides callstacks on Linux. The code related to callstack manipulation in src/coreclr/src/vm/eventpipe.cpp is written in a cross-platform manner. There are also the files stackcontents.h, stackwalk.h, and stackwalk.cpp in the same directory, which relate to stackwalking but are not used by both subsystems.

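So, I propose to:

  1. reuse existing cross-platform code for stackwalking to generate CLRStackWalk event with LTTng

  2. unify and put in the one place code related to stackwalking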

hoyosjs commented 4 years ago

@sywhang @josalem cc: @dotnet/dotnet-diag

sywhang commented 4 years ago

It was declared by the .NET team (details can be found in several places, e.g. here) that LTTng had been chosen as a major tool for performance analysis on Linux.

Yes, and this is still true :)

However, there is a DotNETRuntime:CLRStackWalk event in CoreCLR, which provides managed callstacks and, as far as I can tell, unexpectedly is not emitted on Linux.

The way managed callstacks get resolved in LTTng is different from the way they get resolved on EventPipe or Windows. To resolve callstacks in LTTng we use perf to get the stack (which includes both native and managed callstack). If I recall correctly, perf uses libunwind to get the callstack for each tracepoint. On top of this, since the OS doesn't know how to resolve jitted (managed) frames, the runtime emits a file that maps IPs to symbols for jitted code. This file gets zipped into the .trace.zip archive that you see when you use perfcollect, and PerfView is able to decode it into managed callstacks.
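To illustrate the IP-to-symbol step described above, here is a minimal sketch. It assumes the map file uses the plain-text layout that perf expects for JIT maps (one line per method: hex start address, hex size, symbol name). The sample addresses and method names are made up, and `parse_perf_map` and `symbolize` are hypothetical helper names, not part of any runtime or PerfView API.

```python
import bisect

def parse_perf_map(text):
    """Parse lines of the form '<hex start> <hex size> <symbol name>'."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = int(parts[0], 16), int(parts[1], 16), parts[2]
        entries.append((start, size, name))
    entries.sort()  # sort by start address so we can binary-search
    return entries

def symbolize(entries, ip):
    """Return the symbol whose [start, start + size) range contains ip, else None."""
    starts = [entry[0] for entry in entries]
    i = bisect.bisect_right(starts, ip) - 1
    if i < 0:
        return None  # ip lies below every mapped method
    start, size, name = entries[i]
    return name if ip < start + size else None

# Made-up sample data, for illustration only.
SAMPLE_MAP = """\
7f3a10000000 40 void [App] App.Program::Main(string[])
7f3a10000040 120 int32 [App] App.Worker::Compute(int32)
"""

if __name__ == "__main__":
    table = parse_perf_map(SAMPLE_MAP)
    for ip in (0x7F3A10000010, 0x7F3A10000100, 0x7F3A20000000):
        print(hex(ip), "->", symbolize(table, ip))
```

An IP that falls outside every mapped range resolves to `None`, which is roughly what a tool would report as an unknown frame.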

At the same time new .NET Core subsystem for performance analysis - EventPipe - successfully provides callstacks on Linux.

Yes, but only for managed. In fact, for callstack resolutions LTTng is ahead of EventPipe in the sense that it can provide both native and managed callstack. EventPipe can only understand managed callstack so when users want native callstack we point them to LTTng.

reuse existing cross-platform code for stackwalking to generate CLRStackWalk event with LTTng

As explained above, this event isn't necessary to get managed code.

unify and put in the one place code related to stackwalking

The runtime has many components that use stackwalking. In the diagnostics space alone, the profiler APIs and SampleProfiler (which EventPipe uses to get managed callstacks) both rely on stackwalking code. Both of them use the code you found (stackwalk.cpp).

valco1994 commented 4 years ago

I had read about perf and perfcollect before writing this issue and, as far as I understand, they do not satisfy my requirements. It is important for me to have the callstack that corresponds precisely to each event. But perf can only sample callstacks at a specified frequency or provide them for its own events. And it doesn't know about the LTTng events emitted by CoreCLR at all.

Is that right? If there is a way to establish a correspondence between LTTng events and native callstacks collected by perf, it would be wonderful.

Now about DotNETRuntime:CLRStackWalk. The fact is that in ETW it is produced for every event that logically has an associated stack. I mean this event: https://docs.microsoft.com/en-us/dotnet/framework/performance/stack-etw-event. In that case, I can establish a correspondence between other events and their callstacks. Furthermore, the absence of this event on Linux breaks the promise of a one-to-one mapping between ETW events on Windows and LTTng events on Linux.
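The kind of correspondence I mean can be sketched as follows. This is only an illustration, under the assumption that a stack event is emitted on the same thread right after the event it belongs to. The tuple layout, the sample event names, and the `attach_stacks` helper are all hypothetical.

```python
def attach_stacks(events):
    """Pair each stack event with the latest preceding event on the same thread.

    events: iterable of (thread_id, event_name, payload) in emission order.
    Returns a list of (owner_event_name, stack_ips) pairs.
    """
    last_by_thread = {}
    paired = []
    for thread_id, name, payload in events:
        if name == "CLRStackWalk":
            # Consume the pending event so one stack is never attached twice.
            owner = last_by_thread.pop(thread_id, None)
            if owner is not None:
                paired.append((owner, payload))
            # A stack with no preceding event on its thread is dropped.
        else:
            last_by_thread[thread_id] = name
    return paired

# Made-up trace, for illustration only.
SAMPLE_TRACE = [
    (1, "GCStart", None),
    (2, "ExceptionThrown", None),
    (1, "CLRStackWalk", [0x10, 0x20]),
    (2, "CLRStackWalk", [0x30]),
    (3, "CLRStackWalk", [0x40]),  # orphan: nothing precedes it on thread 3
]

if __name__ == "__main__":
    for owner, stack in attach_stacks(SAMPLE_TRACE):
        print(owner, [hex(ip) for ip in stack])
```

Without a stack event in the LTTng stream there is nothing to pair with, and perf samples are taken on their own schedule, so they cannot be matched to events in this way.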

valco1994 commented 4 years ago

@sywhang @josalem, could you please comment on the situation taking into account the context clarified by me above?

noahfalk commented 4 years ago

Thanks for filing the issue @valco1994! Let me see if I can help move this along a bit...

[@sywhang] To resolve callstacks in LTTng we use perf to get the stack (which includes both native and managed callstack)

This appears to conflate perfcollect with LTTng. Perfcollect runs both perf and LTTng, and each of them produces a distinct set of events. @valco1994 is correct in noting that perf collects native callstacks for the events it generates, but nothing produces a callstack for the events that come from LTTng.

So, I propose to: reuse existing cross-platform code for stackwalking to generate CLRStackWalk event with LTTng

The principle that we'd have stackwalks for these events seems fine to me, but there are some details to sort out:

  1. EventPipe does not actually have callstacks for all events; rather, it has callstacks for all events generated by managed EventSource calls and only a subset of the events that come directly from the runtime. If that is acceptable we can follow the same path; if you are looking for all runtime events to have a callstack then we would need to solve this issue.

  2. The existing stackwalker in stackwalk.cpp is also likely to have higher performance overhead than what ETW was doing. If your needs are <= a few thousand events/sec then it should be fine; above that you will likely see the stackwalker consuming a non-trivial portion of CPU time.

  3. Is stack symbolication needed? The CLRStackWalk event emits a list of IPs as I recall, and most scenarios want a set of IP->name mapping information to symbolicate it with. There are a few different ways symbolication can be done, such as using the JIT events (assuming the trace was enabled at that point in time), with rundown events (what ETW usually does), or with platform/tool-specific techniques (what perf does).

  4. We'd need to decide on the mechanism that turns stacks on/off.
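As a rough way to reason about the overhead in point 2, the fraction of one core spent stackwalking is approximately the event rate times the cost of a single walk. The sketch below makes that arithmetic explicit. The 20 microsecond per-walk cost is a purely hypothetical placeholder, not a measurement of stackwalk.cpp, and `stackwalk_cpu_fraction` is an illustrative name.

```python
def stackwalk_cpu_fraction(events_per_sec, seconds_per_walk):
    """Approximate fraction of one core spent walking stacks."""
    return events_per_sec * seconds_per_walk

# HYPOTHETICAL cost of a single stackwalk; replace with a measured value.
ASSUMED_WALK_COST_SEC = 20e-6

if __name__ == "__main__":
    for rate in (1_000, 5_000, 50_000):
        fraction = stackwalk_cpu_fraction(rate, ASSUMED_WALK_COST_SEC)
        print(f"{rate} events/sec -> {fraction:.0%} of one core")
```

Whatever the real per-walk cost turns out to be, the relationship is linear in the event rate, which is why the rate matters more than the absolute number of events in a trace.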

unify and put in the one place code related to stackwalking

I am happy in principle with refactoring that lets LTTng and ETW share more of their stackwalking implementation, but I might have reservations on specifics. I trust we could find something good, and we probably don't need to dig into it until we've resolved the requirements-related questions above.

@valco1994 given all your research, were you interested in also implementing this feature, or are you requesting that Microsoft implement it? Either way is fine, though if it is a request for us to handle it then we'll have to prioritize it against other requests. Right now we haven't heard from many customers in need of this, so we'd probably prioritize other work while leaving the issue open so that others can register their interest.

valco1994 commented 4 years ago

Thanks for the detailed answer @noahfalk!

As far as I understand, the limitations you described are acceptable to me. The existing ways to do symbolication are also sufficient.

Probably, a new environment variable could be added to turn stacks on/off, in the same way that COMPlus_EnableEventLog turns the production of LTTng events on/off.
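The on/off switch could behave roughly like the sketch below. The variable name `COMPlus_LTTngStackWalk` is invented purely for this illustration and is not an existing runtime setting. The "1 means on, anything else means off" convention is likewise an assumption.

```python
import os

# HYPOTHETICAL variable name, invented for this sketch only.
STACKS_ENV_VAR = "COMPlus_LTTngStackWalk"

def stacks_enabled(environ=None):
    """Return True only when the (hypothetical) variable is set to '1'."""
    env = os.environ if environ is None else environ
    return env.get(STACKS_ENV_VAR, "0").strip() == "1"

if __name__ == "__main__":
    print("stack events enabled:", stacks_enabled())
```

Defaulting to off would keep existing traces unchanged and let users opt in only when they can afford the extra stackwalking cost.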

I am interested in implementing this feature but unfortunately have no time now (and don't know if I will have time in the foreseeable future). So, currently, I'm requesting Microsoft to implement it.

noahfalk commented 4 years ago

I am interested in implementing this feature but unfortunately have no time now (and don't know if I will have time in the foreseeable future). So, currently, I'm requesting Microsoft to implement it.

Sure thing. As mentioned we haven't heard of anyone else needing this right now so it wouldn't currently be a priority, but we'll leave the issue open and see if it gains more interest. And of course if you or anyone else wants to work on it I am happy to discuss next steps for putting together a PR. Thanks!