Support for fast mixed mode call stack determination

tapika commented 5 years ago

I originally wrote this as an e-mail, but Jan Kotas (jkotas@microsoft.com) suggested raising it as a GitHub issue. I am not including the original e-mail recipients, as I don't know whether they want to be on this thread.

Hi !

Approximately 3 years ago I asked you about mixed mode call stack determination, and briefly documented your answer and my findings on Stack Overflow: https://stackoverflow.com/questions/34733155/resolve-managed-and-native-stack-trace-which-api-to-use

Today I started looking into which API is best for determining a mixed mode call stack, and found out that my solution was copied into the DRace application: https://github.com/siemens/drace/tree/9bdccd142ae8ab42f6f68eeeca86539abb16c1fe

Also I saw various bugfixes made to it (E.g. https://github.com/siemens/drace/commit/9abf73363269fb5ae7ffc3dd5023af26af86c8af )

But I also noticed that https://github.com/siemens/drace/blob/9bdccd142ae8ab42f6f68eeeca86539abb16c1fe/ManagedResolver/src/ManagedResolver.cpp#L153

mentions that this call stack determination does not work for Core CLR.

What would be the best API to use for mixed mode call stack determination from your perspective, one that would also be portable to next-generation code (Core CLR)?

...

I could even participate in creating and designing this component and defining its API, as long as there is a commitment from Microsoft to support the API on forthcoming platforms.

fmoessbauer commented 5 years ago

I adapted the solution proposed by @tapika in the siemens/drace application. There, we just need to symbolize JITted addresses to a function name + file and line information.

However, @tapika's solution does not fully work on CoreCLR applications, as we were not able to get beyond function names. That said, DRace has a special situation which might break this and does not apply to the general case. In short: DRace changes the value of the EIP register of the application under test (as seen by the OS or an external debugger).

Until now we did not find a better solution to the following problems:

symbolize JITted addresses
reconstruct mixed callstacks including inlining (we trace all call instructions, but without support from the CLR side we cannot see inlined functions)

Due to the lack of better solutions we are thinking about using the SOS.dll WinDbg extension, as it provides a uniform interface across all CLR implementations. Some early tests in WinDbg (+SOS.dll) showed that the line information is available, even in our special situation. However, this is not a clean solution, as to my knowledge the extension only provides a textual interface (ask a query -> parse the answer).

We would be very happy to know if there are other approaches which are more portable. Additionally some hints to documentation on how to use the IXCLRData interface would be great.

noahfalk commented 5 years ago

Here is some lay of the land and if you've got more questions afterwards fire away : ) We've got three algorithms at least to go through:

Stack unwinding - Given a thread generate a list of IPs representing the code in each frame
Method name resolution - Given an IP, convert it to a module and name (and probably metadata token for managed code)
Source line resolution - for managed code, convert from module+metadata token+IL offset -> source file and line
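The three steps above can be sketched as a small pipeline in which each stage is pluggable. This is only an illustration of how the stages feed into each other: every type and function name below (MethodInfo, SourceLocation, StackPipeline, runPipeline) is hypothetical and is not part of CLRMD, ICorDebug or ICorProfiler.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not taken from any real API.
struct MethodInfo {
    std::string module;          // module that owns the code at the IP
    std::string name;            // method name
    std::uint32_t token = 0;     // metadata token (managed code only)
    std::uint32_t ilOffset = 0;  // IL offset inside the method
};

struct SourceLocation {
    std::string file;
    int line = 0;
};

struct ResolvedFrame {
    std::uintptr_t ip = 0;
    std::optional<MethodInfo> method;      // empty if stage 2 failed
    std::optional<SourceLocation> source;  // empty if stage 3 failed
};

// Each stage is a callback, so a different backend can sit behind each one.
struct StackPipeline {
    std::function<std::vector<std::uintptr_t>()> unwind;                            // stage 1
    std::function<std::optional<MethodInfo>(std::uintptr_t)> resolveMethod;         // stage 2
    std::function<std::optional<SourceLocation>(const MethodInfo&)> resolveSource;  // stage 3
};

// Runs the stages in order; a frame keeps whatever could be resolved.
inline std::vector<ResolvedFrame> runPipeline(const StackPipeline& p) {
    std::vector<ResolvedFrame> frames;
    for (std::uintptr_t ip : p.unwind()) {
        ResolvedFrame f;
        f.ip = ip;
        f.method = p.resolveMethod(ip);
        if (f.method) f.source = p.resolveSource(*f.method);
        frames.push_back(f);
    }
    return frames;
}
```

Keeping the stages separate means that raw IPs can be captured cheaply first, and the more expensive name and line resolution can run later or not at all.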

The main contenders that show up a bunch are CLRMD/ICorDebug which are debugging APIs and ICorProfiler which is a profiling API. The primary difference is that profiling APIs are available inside a process whereas the debugging APIs are designed for out-of-process use against live processes, or any form of process memory snapshot (like a dump file). It should be possible for a process to take a snapshot of itself and then debug that snapshot if you want self-debugging, but it's not something I have tried myself.

Stack unwinding

In order to behave well with native stackwalking algorithms (such as the one employed by the Windows kernel for ETW or in a native debugger), the raw IPs in the jitted portion of a stack are usually still recoverable with the native stack unwinders that you can find in DIA, RtlVirtualUnwind and StackWalk64.

Other options:

CLRMD offers a stackwalking API, but it is geared towards managed code only. You could probably identify gaps in the stack between managed frames and use a native stack unwinder to fill them in if you wanted to.
ICorDebug offers an API that has similar capabilities, but it is often less well documented and harder to get set up than CLRMD.
ICorProfiler has the DoStackSnapshot API, which has had some reliability issues over the years, and we deprecated it on the Linux port because it was making too many platform specific assumptions for us to convince ourselves we could deliver a satisfactory implementation.
The IXCLRData* interfaces in mscordacwks.dll (mscordaccore.dll on .Net Core), which you've noticed, also have stackwalking APIs, but these are mostly private implementation details and I don't recommend you code against them if you want a stable or supported interface. The implementation in that dll is ultimately what SOS, ICorDebug, and CLRMD are all calling into, so you've got some officially supported ways to access the same functionality.
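The gap-filling idea mentioned for CLRMD could look roughly like the sketch below. It assumes that each frame carries a stack pointer, that the managed frames come from a managed stackwalker and the native frames from a native unwinder, and that the native list has already been filtered so it only holds frames not covered by the managed list. Frame, FrameKind and stitchFrames are hypothetical names, not part of any of the APIs above.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

enum class FrameKind { Native, Managed };

// Hypothetical frame record; real unwinders expose richer contexts.
struct Frame {
    std::uintptr_t sp;  // stack pointer of the frame
    std::uintptr_t ip;  // instruction pointer of the frame
    FrameKind kind;
};

// Both inputs must be ordered from innermost to outermost frame, which means
// ascending stack pointer on platforms where the stack grows downwards.
inline std::vector<Frame> stitchFrames(const std::vector<Frame>& managed,
                                       const std::vector<Frame>& native) {
    std::vector<Frame> merged;
    merged.reserve(managed.size() + native.size());
    // Interleave the two sorted lists by stack pointer.
    std::merge(managed.begin(), managed.end(), native.begin(), native.end(),
               std::back_inserter(merged),
               [](const Frame& a, const Frame& b) { return a.sp < b.sp; });
    return merged;
}
```

Ordering by stack pointer is one possible way to interleave the two views; how reliable it is in the presence of runtime helper frames and transitions is not something this sketch addresses.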

Resolving IPs to method names / metadata tokens

CLRMD offers GetMethodByAddress which is probably the most straightforward option. Alternatively if you used CLRMD's stackwalker then the ClrStackFrame object has a Method property.
ICorDebug offers ICorDebugFrame::GetFunction(), but that requires you to use ICorDebug's unwinder. There isn't a way to do it from a raw IP. You would need to get the metadata token and module from the function, then get metadata from the module, then lookup the token in the metadata to get type and method names.
ICorProfiler offers GetFunctionFromIP followed by various methods to inspect the FunctionID. Similar to ICorDebug you need to extract module + MethodDef token, then use metadata to convert tokens to names.

Resolving method tokens + IL offset to source lines

This mapping is stored in PDB files, so we need a different set of libraries to parse them and extract the info. SOS has some code that shows how this works using System.Reflection.Metadata.dll's FromPortablePdbStream API to parse the portable PDB format. In addition to the newer portable PDB format, apps may also use the classic format, which was supported by some other APIs, and here is some example code that uses it.

So we've glossed over a lot but I hope these pointers give a good starting point. If it feels like the options are too much I suggest starting the exploration with CLRMD and System.Reflection.Metadata. HTH, -Noah

tapika commented 5 years ago

There is a huge number of APIs and a lot of background information behind this. Studying all the APIs, then testing and verifying that everything works when combined, is rather painful.

I originally made a memory leak detection tool, which still resides on SourceForge; I mention it just to illustrate the idea.

I'm interested in an API like this:

https://sourceforge.net/p/diagnostic/svn/HEAD/tree/src/ResolveStack.h

CaptureStackBackTracePro captures the IPs (1), GetMethodName resolves method names (2), GetFileLineInfo resolves the source code location (3), and GetModuleName resolves the DLL (a new item, 4?).

Background machinery can vary whether it's native or managed call stacks, and API interfaces behind it (E.g. https://sourceforge.net/p/diagnostic/svn/HEAD/tree/src/ResolveStackM.h ), but API should resemble what is defined in that header file.

Basically you initialize the API, then query the call stack in a similar manner to CaptureStackBackTracePro (the hash may be optional), and after that you have multiple void* IPs showing where execution went.

And you can resolve symbols separately and independently from IP resolving.
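As a rough illustration of why an optional hash over the captured IPs is useful, the sketch below deduplicates raw stacks so that symbol resolution can later run once per distinct stack, for example in a leak detector. RawStack, hashStack and StackTable are hypothetical names and are not taken from ResolveStack.h. The hash is plain 64-bit FNV-1a, and hash collisions are deliberately ignored here.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using RawStack = std::vector<const void*>;  // captured IPs, innermost first

// 64-bit FNV-1a over the bytes of every instruction pointer value.
inline std::uint64_t hashStack(const RawStack& stack) {
    std::uint64_t h = 0xcbf29ce484222325ull;  // FNV offset basis
    for (const void* ip : stack) {
        const auto v = reinterpret_cast<std::uintptr_t>(ip);
        for (std::size_t i = 0; i < sizeof(v); ++i) {
            h ^= static_cast<std::uint8_t>(v >> (8 * i));
            h *= 0x100000001b3ull;  // FNV prime
        }
    }
    return h;
}

// Stores each distinct stack once and counts how often it was seen.
class StackTable {
public:
    std::uint64_t record(const RawStack& stack) {
        const std::uint64_t h = hashStack(stack);
        Entry& e = entries_[h];
        if (e.hits == 0) e.stack = stack;  // first sighting: keep the IPs
        ++e.hits;
        return h;
    }
    std::size_t distinct() const { return entries_.size(); }
    std::size_t hits(std::uint64_t h) const {
        const auto it = entries_.find(h);
        return it == entries_.end() ? 0 : it->second.hits;
    }

private:
    struct Entry {
        RawStack stack;
        std::size_t hits = 0;
    };
    std::unordered_map<std::uint64_t, Entry> entries_;
};
```

A production version would compare the stored IPs on a hash match instead of trusting the hash alone.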

It would also be good to have some safety mechanism: e.g. if FreeLibrary is called, or a C# assembly is unloaded (.NET Core 3), then the owner of the IPs would be notified of the dll or assembly shutdown, so the end user has a chance to resolve the required symbols.

(2) Method name resolution is probably applicable only to .NET at the moment, not so interested until C++ gets C++ module support, then it makes sense to standardize this as well.

When I originally implemented call stack resolving, I noticed that it worked in 98% of cases and failed in the remaining 2%, but this is probably due to my incomplete stack resolving mechanism (I'm using a fast approach; if you take different C++ optimization mechanisms into account, stack resolving requires disassembly, and then it becomes really slow).

Felix - expect similar kind of problem when switching stack resolving using other methods. (E.g. sos.dll or similar)

I think 98% resolve rate is acceptable for first prototype and maybe even to accept it as a final solution.

Currently I don't need fast stack resolving; I need it only when an exception happens, so that the .NET StackTrace would also include the exception call stack from the native side. But I don't see any reason why the system could not be extended to support other cases as well (e.g. memory leak detection).

Does it make any sense for me, Felix and people from Microsoft to put our heads together in the same git repository and make a good API for call stack resolving?

tapika commented 4 years ago

Any progress on this ticket ?

noahfalk commented 4 years ago

Does it make any sense for me, Felix and people from Microsoft to put our heads together in the same git repository and make a good API for call stack resolving?

Sorry, it's probably not what you are hoping to hear, but it isn't currently a priority for the folks working on .Net at Microsoft to contribute or maintain a multi-language stackwalking API. I am happy to help answer some questions about .Net similar to the discussion above but that would be the limit of my involvement at this point.

tapika commented 4 years ago

Clearly mixed call stack determination is a non-trivial thing. There already exist two or more copies of my own call stack determination code, which probably does not work in all cases, such as the .NET Core case mentioned above.

Would it be possible to create like official git for call stack determination - I could be the first one to commit initial version, but I would prefer that maintenance / bugfixing of that git would be returned back to you / Microsoft.

We could audio chat if you prefer to discuss this bit deeper ?!

noahfalk commented 4 years ago

I would prefer that maintenance / bugfixing of that git would be returned back to you / Microsoft

I'm sorry @tapika, taking on this responsibility doesn't align with our current priorities.

tapika commented 4 years ago

Ok, can we start from problematic point ?

Basically I would like to be able to resolve call stack which can contain both - C++ and C# within the same call stack.

With C++ situation is pretty much ok, as dbghelp.dll has more or less stable documented API, which can be used.

My concern is C# call stack resolving, which was apparently broken between .net framework and .net core (https://github.com/siemens/drace/blob/9bdccd142ae8ab42f6f68eeeca86539abb16c1fe/ManagedResolver/src/ManagedResolver.cpp#L153)

Originally I took this code from a Google search for "sos StartEnumMethodInstancesByAddress", which points to https://github.com/dotnet/diagnostics/blob/master/src/SOS/Strike/util.cpp

which is a part of Son of strike - brief review here: http://etutorials.org/Programming/programming+microsoft+visual+c+sharp+2005/Part+IV+Debugging/Chapter+13+Advanced+Debugging/Son+of+Strike+SOS/

Is it possible to use "son of strike" to determine C# call stack from C++ ?

noahfalk commented 4 years ago

Is it possible to use "son of strike" to determine C# call stack from C++ ?

Yes, you can run the "CLRStack" command (https://docs.microsoft.com/en-us/dotnet/framework/tools/sos-dll-sos-debugging-extension) and it will print the managed call stack. However if you are looking for a programmatic interface rather than a textual user interface then I'd refer back to the APIs we discussed earlier

which was apparently broken between .net framework and .net core

The API that code is calling isn't supported for 3rd party use on .Net Core so it isn't too surprising that it didn't work smoothly. (Until fairly recently it wasn't supported on .Net Framework either)

tapika commented 4 years ago

However if you are looking for a programmatic interface rather than a textual user interface then I'd refer back to the APIs we discussed earlier

With C# it takes about 1 minute to code a sample application with a full stack trace, but you refer to quite a few different APIs, which in turn will probably require some integration effort.

Maybe you have some example git repository, where all API's usage could be demoed ?

For example, clang has a sample repository: https://github.com/johnthagen/clang-blueprint where you can try out some clang features. I am wondering if you have something similar, or whether something similar could be made as well?

I think I want programmatic access to stack walking api's, .net core is important, but .net framework compatibility is also important to me at the moment.

noahfalk commented 4 years ago

I found a sample @leculver made for walking the stack with CLRMD here: https://github.com/microsoft/dotnet-samples/tree/master/Microsoft.Diagnostics.Runtime/CLRMD/ClrStack

tapika commented 4 years ago

Ok, this looks like good example - I have checked that you indeed support .NET Framework, also suspect we will get .NET Core out of box.

But in final solution this call stack resolving must reside on C++ side, so we have alternatives to use either:

C++/CLI (the /clr option), supported only by the Microsoft compiler.
Using PInvoke / UnmanagedExports.Repack - then can call C# directly from C++.

Ok, this settles managed side call stack resolving.

Let's come back to native call stack resolving. We have alternatives here - either self-made, or try to pick to something more portable, for example:

Windows only, my own old example: https://sourceforge.net/p/diagnostic/svn/HEAD/tree/src/ResolveStack.cpp

Portable, part of Oculus rift: https://github.com/focusright/ovr_sdk_win/blob/master/ovr_sdk_win_1.43.0/LibOVRKernel/Src/Kernel/OVR_DebugHelp.cpp

These are probably not the only examples available on the internet.

One question came to my mind: can native call stack determination support cmake's unity builds?

https://onqtam.com/programming/2019-12-20-pch-unity-cmake-3-16/ https://www.qt.io/blog/2019/08/01/precompiled-headers-and-unity-jumbo-builds-in-upcoming-cmake https://cmake.org/cmake/help/v3.16/prop_tgt/UNITY_BUILD.html

From what I've tried, unity builds can indeed speed up builds. But the main problem with unity builds is that there is a bunch of autogenerated files like unity_0.cpp, unity_1.cpp, which in turn include other .cpp files.

My guess is that C++ call stack resolving would then resolve to the unity file instead of the original source file.

Maybe you can also recommend a good sample application for native call stack resolving?

tapika commented 4 years ago

I found a sample @leculver made for walking the stack with CLRMD here: https://github.com/microsoft/dotnet-samples/tree/master/Microsoft.Diagnostics.Runtime/CLRMD/ClrStack

I slightly adjusted the given example to work with a live process, but it cannot resolve the source code path / line number? The code sample says "This is not yet implemented."

leculver commented 4 years ago

That's an old sample, and the comment has nothing to do with source file and line numbers. ClrMD now supports getting the type and module of the function on the stack, which is what that was referring to. Keep in mind that ClrMD is a runtime inspection API, not a general purpose debugger. It does not generally deal with source or line information as part of its operation.

In order to resolve source and line numbers, you have to take the metadata tokens of the function/type in question (which ClrMD provides) then use a PDB reading library to read the source file and line numbers out of the PDB.

Unfortunately there are now two pdb formats, the regular PDB format and the Portable PDB format. You'll have to find a library that's capable of reading both (or using two libraries depending on which type of PDB you are reading).

I do not have a sample of doing this.

tapika commented 4 years ago

Found one more library, I guess the simplest one from an API usage perspective:

https://www.boost.org/doc/libs/1_65_1/doc/html/stacktrace.html

Need to try it out and see how it does not work with .net framework (sorry for being pessimistic) :-)

leculver commented 4 years ago

That library is for regular PDBs, not Portable PDBs...and C++ code is resolved from module offset into ranges that correspond to symbols in PDBs. CLR code runs off of metadata tokens. It's a different table in the PDB entirely. The code there isn't relevant for symbolizing .Net Code.

The msdia*.dll libraries can be used for regular pdbs to resolve method tokens into file and line numbers. I'm not sure if it was ever updated to support portable pdbs.

tapika commented 4 years ago

That library is for regular PDBs, not Portable PDBs.

What are those ? Windows standard .pdb, and .net core portable .pdb ?

tapika commented 4 years ago

The msdia*.dll libraries can be used for regular pdbs to resolve method tokens into file and line numbers. I'm not sure if it was ever updated to support portable pdbs.

Missing any kind of example. At the moment I'm harvesting github, by trying to identify where stack resolving api's are being used, and trying to identify best suitable library for my need. It's bit difficult to understand your answers, as I don't have same in-depth knowledge on call stack walking apis.

leculver commented 4 years ago

As mentioned in https://github.com/dotnet/runtime/issues/12405#issuecomment-607984846, there are two PDB formats now. One is the "regular" PDB format (also known as Windows PDBs) which has been around for ages.

.Net Core has defined a new PDB format called "Portable PDB", which is a completely different format despite also using the ".pdb" extension. Some libraries can only parse one format or the other, or they may parse both. I'm not sure whether there's a good library or API out there that can parse both at the same time. I'm a bit out of date on that information.
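Since both formats share the ".pdb" extension, a tool that has to pick a parser can look at the first bytes of the file. The sketch below relies on two facts about the file headers: classic Windows PDBs in the MSF 7.0 container start with the text "Microsoft C/C++ MSF 7.00", and standalone Portable PDBs start with the ECMA-335 metadata signature "BSJB". PdbFormat and detectPdbFormat are hypothetical names, and Portable PDBs embedded inside an assembly are not covered.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

enum class PdbFormat { Unknown, Windows, Portable };

// Classifies a PDB from the first bytes of the file (pass at least 32 bytes).
inline PdbFormat detectPdbFormat(const std::string& header) {
    static const char kMsf7[] = "Microsoft C/C++ MSF 7.00";  // classic PDB container
    static const char kPortable[] = "BSJB";                  // ECMA-335 metadata root
    const std::size_t msfLen = sizeof(kMsf7) - 1;
    const std::size_t portableLen = sizeof(kPortable) - 1;

    if (header.size() >= msfLen &&
        std::memcmp(header.data(), kMsf7, msfLen) == 0) {
        return PdbFormat::Windows;
    }
    if (header.size() >= portableLen &&
        std::memcmp(header.data(), kPortable, portableLen) == 0) {
        return PdbFormat::Portable;
    }
    return PdbFormat::Unknown;
}
```

The caller would read the header with std::ifstream in binary mode and then dispatch to a classic PDB reader (for example DIA) or a Portable PDB reader accordingly.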

leculver commented 4 years ago

It's bit difficult to understand your answers, as I don't have same in-depth knowledge on call stack walking apis.

If I could step back a second and summarize this thread, what you are looking to do is create a universal stack walking API that can understand C++ code and .Net code (both desktop CLR and .Net Core). Essentially a universal stackwalker, right? And you want to implement it in C++, I assume from your post?

Unfortunately, building such a thing requires in-depth knowledge of "call stack walking apis" and/or deep knowledge of how debuggers do stack unwinding and resolving symbols that it can find. What you are asking to do isn't simply gluing a few APIs together, it'll require really solving some messy problems when unifying those two worlds. This includes: Communicating with symbol servers, stitching together callstacks, dealing with multiple PDB formats, possibly writing a C++ parser for portable pdbs, and so on. All of that isn't a simple task even for someone like me who has worked in this space for years.

To set expectations: I've tried to simplify this as much as possible, but I'm not sure I can put it in any easier terms. What you are asking for doesn't exist in a simple, easy to consume library. You are solidly in the territory of "I have to become an expert in all of these areas in order to build what I'm asking for and go build a complicated library myself." I can give you pointers to some libraries, APIs, and concepts that you'd have to use to put such a thing together, but further than that you are on your own in terms of building that project... As Noah mentioned before this isn't something that the CLR team has the resources (or the interest) to go spend 4-6 weeks building. For someone who doesn't have the context on how all of these things work, it would likely take longer.

Sorry we couldn't be of more help here! Good luck!

tapika commented 4 years ago

I suspect that we need to create some sort of open source library for call stack determination. It does not need to be as complex as boost / stacktrace.hpp, but the front end API could resemble boost.

If we start working from bare bones, we could add more flesh to it later on.

Also maybe I'm putting too many requirements to it, maybe start from few requirements and adjust on the way.

But based on my analysis, the stack walking library must be separate, independent and isolated from other libraries, because it is a bit heavy overall.

My brief analysis of the APIs:

CLRMD - does not offer file/line resolution.
ICorDebug - no github project I found has any call stack reference to that API, so I suspect it cannot be used easily.
ICorProfiler - not worth using, as it's deprecated.
IXCLRData - the only working API at the moment; process hacker and my own diagnostic tool use it, but .net core support needs to be analyzed in depth.

I've also noticed that boost stacktrace.hpp supports exception handling - I will definitely try it out. It could be a direct replacement for our own solution (https://github.com/dotnet/diagnostics/issues/152).

tapika commented 4 years ago

Btw, there exists an attempt to standardize call stack determination: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0881r5.html

Apparently initiative is based upon boost / stacktrace.

Interesting to see if they have initial sketch of that library somewhere on github.

I have tested the boost exception handling mechanism: https://www.boost.org/doc/libs/1_65_1/doc/html/stacktrace/getting_started.html#stacktrace.getting_started.handle_terminates_aborts_and_seg

That one indeed works for simple C++ console applications, but unfortunately does not work with C# based applications.

Suspect code from here: https://github.com/dotnet/diagnostics/issues/152 might help eventually catching exceptions / crashes.

std::stacktrace is definitely something we need in future c++ standards, but .net framework / .net core interoperability needs to be checked separately, I guess.

tapika commented 4 years ago

I've started new stacktrace fork, located in here:

https://github.com/tapika/stacktrace

At the moment added managed call stack determination, but not yet exception handling or managed symbol resolving, plan to do it later on.

tapika commented 4 years ago

After briefly analyzing boost / stacktrace, I think one of the use cases I would eventually like to achieve is being able to resolve symbol information after the crash.

The use case is probably this: the application crashes and the stack trace is collected, but debug symbols (.pdb's) are not available at the time of the crash. The stack trace is sent in raw format (a vector of IPs), or the UI offers an option to resolve the IPs to symbols, but this would require downloading debug symbols.

Of course if you resolve call stack on live process, then you can download also symbols before resolving call stack.

But this could lead also to further problems, like symbol server not available, pc is offline, and so on.

I think call stack could be collected automatically, independently whether we have chance to resolve symbol information or not.

But if we design the API to be able to resolve the call stack afterwards, I think we need to change the symbol resolving logic to use native C++ function calls instead of COM interfaces. Even though managed call stack resolving does require COM, I think we should first get native call stack resolving working well, and then go after the managed call stack.

I've found initial description of what I want in here: https://github.com/JochenKalmbach/StackWalker#walking-the-callstack-of-other-threads-in-other-processes

Basically we will need to enumerate loaded modules and their addresses and collect it next to stack trace class.
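A minimal sketch of that idea, under the assumption that the loaded module list is captured together with the raw IPs: each IP is converted to a module name plus an offset from the module's load address, which stays meaningful after the process has exited. ModuleRange, ModuleOffset and toModuleOffset are hypothetical names; how the module list is enumerated is platform specific and left out.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One loaded image as seen at capture time (hypothetical record).
struct ModuleRange {
    std::string name;     // file name of the DLL or executable
    std::uintptr_t base;  // load address at capture time
    std::uintptr_t size;  // size of the mapped image in bytes
};

struct ModuleOffset {
    std::string module;
    std::uintptr_t offset;  // ip - base, independent of the load address
};

// Maps a raw IP to module + offset, or returns nothing if no image owns it.
inline std::optional<ModuleOffset> toModuleOffset(
    const std::vector<ModuleRange>& modules, std::uintptr_t ip) {
    for (const ModuleRange& m : modules) {
        if (ip >= m.base && ip - m.base < m.size) {
            return ModuleOffset{m.name, ip - m.base};
        }
    }
    return std::nullopt;  // e.g. JIT-compiled code, which belongs to no image
}
```

For offline resolution the record would probably also need something that identifies the exact build of each module, so that matching symbols can be located later; that part is not shown here.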

I'm now not sure how this would map to linux and other platforms, which boost supports, but I hope someone will help me with that one.

tapika commented 4 years ago

A bit more on the issues I face at the moment. They do not relate directly to this issue, but if the Microsoft guys eventually want to help or take the whole library under maintenance (as stacktrace will eventually become part of the C++ standard), it would be good to go through how the overall system will work and how to test it.

FYI:

Boost build vs cmake build system.

Initially my idea was to build boost stacktrace with cmake instead of boost build, because cmake was capable of generating Visual studio projects for boost, while boost build was not supporting that option.

Besides that, I wanted to generate C++/CLI (/clr) and C# projects. Boost build has no such feature, but that is a matter of which compiler and which command line arguments the build system uses.

So supporting the C++/CLI dialect, which is specific to the Microsoft compiler, might even be possible via the boost build system, but C# is not supported and will not be supported any time soon by boost build. The same goes for Visual Studio project generation.

Boost stack trace library however shows good example of how testing should be handled - besides test projects itself boost stacktrace generates a lot of different project permutations with debug symbols on / debug symbols off, and so on - this is useful and mandatory for testing stack trace library.

Original boost stacktrace (without my modification) reached code coverage 90%+, I would eventually want to reach that target, but might take some time, as my main focus is windows at the moment (not linux or others).

This also means generating the same kind of project permutations as boost build / stack trace generates, but extended. To my best understanding the current stack trace focuses on 32-bit builds; I need 64-bit builds.

Also, what I have noticed: if we switch the symbol resolving logic, e.g. from COM based to dbghelp.dll/SymInitialize/... based (https://github.com/JochenKalmbach/StackWalker), this can have an impact on symbol resolving:

https://github.com/JochenKalmbach/StackWalker#known-issues "Doesn't work when debugging with the /DEBUG:fastlink option"

I would prefer to use that option, as it speeds up linking dramatically.

But this also reminds me that even current symbols resolving logic might not work with that linking option.

So project generation needs to be expanded to cover that linker option as well.

Also, as previously mentioned, mixed call stack determination did work on .net framework but not on .net core. It is the same problem: project generation needs to be expanded to handle C# / .net core and .net framework variations.

And this returns me back to boost build / cmake build tool selection.

Need to decide whether we use first, second or both.

I've quickly checked boost build, and it's not capable of generating visual studio projects, and that feature cannot be easily added.

Boost build source code is indeed bit cleaner than cmake, but both are driving build system towards custom script engine system, not from C++ project modelling perspective. ninja, chromium gn also drives same kind thinking - custom script engine, dialect varies everywhere (cmake dialect, boost build jam dialect, ninja dialect, chromium gn dialect).

To achieve the fastest project generation, one needs to use the fastest language / script, with performance optimizations suitable for that programming language. I would guess the selected language will be C++, but the solution / project modelling needs to be constructed so that it allows the best pre-caching and the least build effort (don't build projects which do not need building, in a similar manner to the cmake cache).

OK, back to where we were - boost build / cmake / both / custom build engine.

For boost stack trace we could switch build tool to be cmake completely, but suspect it would require changes in boost build system to make an exception on boost build and build cmake based project instead of boost build based.

Need to chat about this maybe with Antony Polukhin / boost build system maintainers. Btw, tried to send a mail to boost build and boost cmake mail lists - but without any success. Maybe all boost mail lists are pretty much dead by now ?!

At the moment if no other alternative - then maintain two build systems (boost build and cmake) for stack trace, if someone answers or makes different proposal - could discuss alternatives.

Meanwhile I'll try to generate more project permutations for testing purposes using cmake. My intention is to cover variations for no debug symbols / debug symbols, 32/64 bit builds, maybe also .net core/.net framework, maybe also with / without the /DEBUG:FASTLINK option.

tapika commented 4 years ago

Bit more updates:

Opened ticket for cmake / .net core:

https://gitlab.kitware.com/cmake/cmake/-/issues/20741

At the moment I can generate C# projects for .net framework, but not for .net core unfortunately.

Tested 32-bit platform support - it does not work out of the box. I've left the code commented out, but my goal is to support 64-bit Windows, so I don't care so much about 32-bit support. If I manage to make it work, good; if not, that is OK for me too.

See https://github.com/tapika/stacktrace/blob/develop/src/exception_handler.cpp#L206

// What I have tested - native exception works, but not managed. Need to filter out somehow whether exception is native or managed.
//MH_CreateHookApi(dll2Hook, "__CxxFrameHandler3", &__CxxFrameHandler3_Detour, (LPVOID*)&detourOriginalFunc) == MH_OK &&
                    false &&
