dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.06k stars 4.69k forks source link

Can't collect the mixed mode call stack due to ambiguity of Profiler API #103443

Open yegor-pelykh opened 3 months ago

yegor-pelykh commented 3 months ago

I'm trying to get the mixed mode call stack inside my C++ DLL using the Profiler API.

I looked at various examples of obtaining a call stack in open-source applications, but I didn't come across a single approach, and it seemed to me that none of them creates a complete single stack of managed and unmanaged calls, as I need.

After reading this article, I realized that using DoStaskSnapshot I can get all the managed frames as well as the unmanaged stack blocks among them. These blocks of unmanaged calls need to be expanded into multiple frames from one. But then I had big problems.

1) Firstly, the process of expanding a block into frames is unclear. 2) I pass my thread context to the DoStackSnapshot function and set the COR_PRF_SNAPSHOT_REGISTER_CONTEXT flag. The article states that if I pass my context to the function, the stack collection is better, the topmost unmanaged block (newest) should come on top of the managed ones. But it doesn't come. 3) Ok, since it doesn’t come, I decided to do some kind of walk through the unmanaged stack myself. I called StackWalk64 in a loop and collected the entire stack of unmanaged IPs. Then, as I understand it, I need to go through all the unmanaged IPs from above in a row until I encounter the first managed IP (if I understand correctly). But here there is a problem here. How to find this first managed IP? I tried to call GetFunctionFromIP for each unmanaged IP, but not all stacks had at least one IP for which GetFunctionFromIP would return a managed function. Sometimes such an IP is found, sometimes not, it is very changeable. Although, if we look at the call stack of the Visual Studio, there are managed frames in it. And I’m also absolutely sure that there is. I think this looks like a bug. 4) For some reason, in every call of callback DoStackSnapshotCallback (let's say this is the name of the callback for DoStackSnapshot) I receive the same context. Is this correct at all? I thought that in each call there should be different registers, at least for EIP/RIP, EBP/RBP, ESP/RSP. If this is how it should be, what important information then do I need to store for each frame from this callback for further investigation? If for each frame I have the same information... Yes, there is also an ip parameter, but the articles seem to recommend not relying on it, in addition they say that it may even be 0, which also doesn't give me a thorough understanding of what needs to be done with all this.

It is also important to mention some features of my case: I need to get the call stack asynchronously, from hooked functions.

I saw application code example that should build a mixed stack. But I haven't been able to get it to work properly, and looking at the code, it seems to me to be outdated, in which various pieces of code look like rough drafts put together. And I doubt the efficiency and correctness of the approaches in this code. I also looked at all the uses of DoStackSnapshot on GitHub, but I still couldn’t find a complete and unified approach on how to build such a stack.

As a result, all that I have succeeded in so far is to obtain a separate set of unmanaged IPs from StackWalk64 (only a small part, the top one, of which resolves into symbolic information, and for the rest it is impossible to obtain even a module), and also to separately collect a set of managed frames and part of the unmanaged blocks (the topmost block doesn't come) from the DoStackSnapshot.

Could you please help with this? There is almost no information about this on the internet. Also, perhaps we will find some bugs in this.

dotnet-policy-service[bot] commented 3 months ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

tommcdon commented 3 months ago

@noahfalk @davmason

davmason commented 3 months ago

Hi @yegor-pelykh,

I will try my best to help but stack walking is a very complex subject and I do not necessarily have authoritative answers. I suspect you were unable to find a working example because not many people have authoritative answers either.

The first thing that comes to mind is that I suspect you will have better luck with RtlUnwind rather than using StackWalk64 for unwinding native frames. This is what we have used in the runtime historically to unwind frames.

Secondly, when you say that you receive the same context for each callback do you mean the values of all registers are the same or that the same pointer is passed? I would expect registers to have different values for different frames. If you can post examples it might help

yegor-pelykh commented 3 months ago

Hi @davmason Thank you for your response!

I will try to separate an example project from my full project that would reproduce these problems and ambiguities. But this will take some time...

Regarding the fact that this is a very complex subject, what do you mean by "complex"? There are 2 cases: 1) There is a clear understanding of how information from the profiler can be used to obtain a mixed stack, how the stack is structured in dotnet. In this case, the difficulty is only in possibly using a large set of code, several well-defined steps, to process all the information and put it together. That is, it is simply a question of the amount of code and time. 2) There is no clear understanding of how the managed stack works, how it interacts with the unmanaged one. In this case, the subject is rly more complicated, because there are not even specific planned algorithms for how this can be done at all. But in this case, the question arises: if function DoStackSnapshot was made, then there should have been a clear idea of ​​how exactly to use it and how to supplement it in order to get a call stack. How could it be created without a clear understanding of its task and role in obtaining the stack and how it could be supplemented with other functions? Why couldn't "profiler's team" collaborate with other teams that are focused on the dotnet "memory, functions" area and how it is represented in the call stack to get the full picture? After all, it’s easier for you to do this than for ordinary developers unfamiliar with the depths of dotnet to guess.

It’s just that the easiest way is to find once and for all one way, one example, so that later you don’t have to answer the same questions from many others...

yegor-pelykh commented 3 months ago

To show my attempts and approach in practice, and to be on the same page, I created a repository with a separate solution: https://github.com/yegor-pelykh/MixedCallStackSample Feel free to pull and build it (Visual Studio 2022).

Please take a look at it and also pay special attention to places marked TODO (you can just search for TODO). This is how problems are flagged.

yegor-pelykh commented 3 months ago

Please, @davmason, could you take a look at the code?

davmason commented 3 months ago

Hi @yegor-pelykh - I took a brief look at your code, it will be a lot easier for me to help you if you ask specific questions I can answer for you. As much as I would like to, I won't be able to make time to build and debug a full project and get back to you quickly.

Nothing immediately obvious jumps out as a problem from a brief look. I did notice that you are trying to walk the native frames after DoStackSnapshot returns - it would be safer to process the frames as they are reported to you. If you end up walking a stack of a different thread then the stack could be mutating after DoStackSnapshot returns. For your scenario you are walking the same thread as you are executing on so it won't be a problem, but if you try a different thread it would be.

What results do you get from StackWalk64 on the native blocks?

yegor-pelykh commented 3 months ago

For your scenario you are walking the same thread as you are executing

You are right. I need to get the stack only from the currently running thread.

When you have a chance to debug the code a little, to speed it up and not have to study the entire project completely, I'd advise you to first look inside such functions:

Just see what happens inside these functions, how the stack is created, and you won't have to learn all the code.

What results do you get from StackWalk64 on the native blocks?

I will try to remake the code as you advise, so that the native frames are collected inside the DoStackSnapshotCallback if a native block is encountered. But! The first, and one of the main problems I have: why every time method DoStackSnapshotCallback is called (in one call of DoStackSnapshot) I get the same context? This is wrong, am I right? The context pointer is different, but the important registers in them are the same (and sometimes clearly incorrect, such as the value Rip = 128). You can see this problem for yourself. In this case, a traversing the stack using a StackWalk64 with this context will not be valid.

yegor-pelykh commented 3 months ago

So, to make it clear what problems there are: 1) Each time the DoStackSnapshotCallback is called, a context comes with the same registers. And often they are OBVIOUSLY wrong. 2) When first passing through native frames BEFORE calling DoStackSnapshot, how to correctly find all the IPs and stop at the first managed frame in order to then call DoStackSnapshot? How to accurately determine the first managed frame?

If I've provided too much work and information, could you please include anyone else in this discussion?

davmason commented 3 months ago

Hi @yegor-pelykh,

I was able to take another look today. I think the problems you are seeing with CONTEXT are because you have a typo in the cast here: https://github.com/yegor-pelykh/MixedCallStackSample/blob/40dba17576175ac3a8ac6eba45a582115dc86366/MixedCallStackSample.Client/CorProfiler.cpp#L875C22-L875C57

It should be reinterpret_cast<CONTEXT*> instead. Right now you are treating it as a reference and reading some bogus data due to the cast.

For the question of how to find the first managed frame, it should be sufficient to walk the native stack and call GetFunctionFromIP to see if the IP is managed or not.

yegor-pelykh commented 3 months ago

It should be reinterpret_cast<CONTEXT*> instead. Right now you are treating it as a reference and reading some bogus data due to the cast.

Thank you very much for the tip! It really was my fault. Now this has been corrected and the question about the "same context" is discarded.

For the question of how to find the first managed frame, it should be sufficient to walk the native stack and call GetFunctionFromIP to see if the IP is managed or not.

In my example it definitely doesn't work like that. To show this, I've now updated my repository.

There is a new method GetNativeFrames that is called before DoStackSnapshot. It simply iterates through the StackWalk64 calls, checking each frame to see if it is managed (using GetFunctionFromIP). If it is not, the frame is added to the returned list of native frames, and if it is, then we break the iteration. As a result, a list of native frames located above the first managed frame should be returned.

But for some reason, in almost all cases the nativeFrames collection is wrong. The managed frame is often not encountered at all, allowing all iterations of StackWalk64 to go through. This is definitely wrong. And this is especially noticeable when the Main function of the profiled app has already been called, and we have a call stack with obvious managed frames (which can be seen in the CallStack window in Visual Studio).

Could you please check the updated example?

davmason commented 3 months ago

I don't see anything obviously wrong with the call to StackWalk64, could you post some examples of threads with managed frames and what StackWalk64 returns for them?

It should be easy to see what is going wrong by setting a breakpoint in you GetMixedCallstack function and comparing the stack trace in the debugger to what StackWalk64 returns

yegor-pelykh commented 3 months ago

I don't think I'll make things much clearer if I just post the stack traces here, but I'll try:

Call stack from Visual Studio ```txt MixedCallStackSample.Client.64.dll!MixedCallStackSampleClient::CorProfiler::GetMixedCallStack C++ MixedCallStackSample.Client.64.dll!MixedCallStackSampleClient::Interceptor::ProcessLoadLibrary C++ MixedCallStackSample.Client.64.dll!MixedCallStackSampleClient::Interceptor::OnLoadLibraryA C++ [Managed to Native Transition] MixedCallStackSample.ClientTest.CLR.dll!MixedCallStackSample.ClientTest.CLR.Program.LoadTestLibrary C# MixedCallStackSample.ClientTest.CLR.dll!MixedCallStackSample.ClientTest.CLR.Program.TestInnerProc3 C# MixedCallStackSample.ClientTest.CLR.dll!MixedCallStackSample.ClientTest.CLR.Program.TestInnerProc2 C# MixedCallStackSample.ClientTest.CLR.dll!MixedCallStackSample.ClientTest.CLR.Program.TestInnerProc1 C# MixedCallStackSample.ClientTest.CLR.dll!MixedCallStackSample.ClientTest.CLR.Program.TestProcedureCallback C# [Native to Managed Transition] MixedCallStackSample.Client.64.dll!ManagedToNativeTestFunc C++ [Managed to Native Transition] MixedCallStackSample.ClientTest.CLR.dll!MixedCallStackSample.ClientTest.CLR.Program.TestProcedure C# MixedCallStackSample.ClientTest.CLR.dll!MixedCallStackSample.ClientTest.CLR.Program.Main C# [Native to Managed Transition] [Inline Frame] hostpolicy.dll!coreclr_t::execute_assembly C++ hostpolicy.dll!run_app_for_context C++ hostpolicy.dll!run_app C++ hostpolicy.dll!corehost_main C++ hostfxr.dll!execute_app C++ hostfxr.dll!`anonymous namespace'::read_config_and_execute C++ hostfxr.dll!fx_muxer_t::handle_exec_host_command C++ hostfxr.dll!fx_muxer_t::execute C++ hostfxr.dll!hostfxr_main_startupinfo C++ MixedCallStackSample.ClientTest.CLR.exe!exe_start C++ MixedCallStackSample.ClientTest.CLR.exe!wmain C++ [Inline Frame] MixedCallStackSample.ClientTest.CLR.exe!invoke_main C++ MixedCallStackSample.ClientTest.CLR.exe!__scrt_common_main_seh C++ kernel32.dll!BaseThreadInitThunk Unknown ntdll.dll!RtlUserThreadStart() Unknown ```
Call stack from StackWalk64 before `DoStackSnapshot` call ```txt [0] 0x00007ffbccde6d8d {MixedCallStackSample.Client.64.dll!MixedCallStackSampleClient::Interceptor::ProcessLoadLibrary(const ATL::CStringT>> & libFileName), Line 48} void * [1] 0x00007ffbccde675b {MixedCallStackSample.Client.64.dll!MixedCallStackSampleClient::Interceptor::OnLoadLibraryA(const char * libFileName), Line 72} void * [2] 0x00007ffa8b1d77f8 void * [3] 0x00000067eb3fe4f0 void * [4] 0x00007ffa8b1d772d void * [5] 0x00007ffa8b280238 void * [6] 0x00007ffa8b272c28 void * [7] 0x642e323372657375 void * [8] 0x00007ffa00006c6c void * [9] 0x00000067eb3fe4e0 void * [10] 0x000028d9c4aa03b9 void * [11] 0x00007ffaeafca648 {coreclr.dll!void *vtable_InlinedCallFrame} void * [12] 0x00000067eb3fe8d0 void * [13] 0x00007ffa8b2800c0 void * [14] 0x00007ffa8b2800c0 void * [15] 0x00000067eb3fe4e0 void * [16] 0x00007ffa8b1d77f8 void * [17] 0x00000067eb3fe500 void * ```

Here you can see that the stack from Visual Studio is correct. But the stack of collected unmanaged frames from StackWalk64 is clearly incorrect. Looks just like garbage in memory. Most of the functions, except the first two from my DLL, are not recognized as normal native functions, but at the same time, none of these addresses are recognized as managed by GetFunctionFromIP call. The fact is that even when the stack clearly contains managed frames, NO ONE address returned in the StackWalk64 call loop is recognized as managed, after which the loop ends because StackWalk64 returns FALSE. This is clearly wrong, isn't it?

I think it would be better for you to debug for yourself. This would give you more information. It's easy to debug. All launch parameters are configured correctly. You just need to set the breakpoint to GetMixedCallstack function and start debugging with project MixedCallStackSample.ClientTest.CLR as the starting one.

yegor-pelykh commented 3 months ago

I came up with a way to demonstrate incorrect operation.

1) Pull the last repository 2) Set breakpoint to the CorProfiler.cpp line 630 (if (SUCCEEDED(result)) ...) 3) Start debugging and reach the set breakpoint for the first time 4) Add to Visual Studio Watch window these two items:

Something is definitely wrong.

I really hope that I managed to explain this clearly now.

davmason commented 3 months ago

I think a good next step here is to use the attached debugger and figure out what StackWalk64 is returning so we can answer those questions better.

Using SOS and Windbg together you would be able to verify what each frame is, the ln command lists any symbols near the specified address, and the !ip2md SOS command will tell you if it is a managed IP or not.

Once we have the details on what StackWalk64 is doing then I can give you better advice

tommcdon commented 3 months ago

Hello @yegor-pelykh! My guess is that StackWalk64 is not correctly walking the stack as it is starting from a point on the stack that which isn't easily traversed. Since the intent of DoStackSnapshot is to not only return a managed stack frames, but will also notify the profiler where unmanaged frames start in between groups of managed stack frames. The unmanaged portions starting at the thread context StackSnapshotCallback provides when funcId is set to 0 (as suggested in this article). It will provide the thread context which can be used as a "seed" to walk the unmanaged frame chain. Note there may be multiple unmanaged frames "sandwiched" in between groups of managed frames, so the unmanaged stack chain can be terminated as soon as a managed frame is reached (verified using ICorProfilerInfo::GetFunctionFromIP). It seems @davmason already identified the source of the problem with DoStackSnapshot, and so now we will leave it to you to walk the unmanaged frames. If further assistance is needed on this, please feel free to contact us or consider paid support options through Microsoft Support - https://support.microsoft.com.

yegor-pelykh commented 3 months ago

@tommcdon Thanks for the clarification, but still, I think closing this thread is a bit hasty because I didn't just ask a question about how it should work, I said it doesn't work in an obvious and correct way, I'm pretty sure there's a bug as well. I was discussing more the problem that bypassing unmanaged frames did not fit in with what DoStackSnapshot returns, and not that I don’t know basic info about how to do that from articles. And I expected that you, on your side, would look and investigate this problem. If you say that the problem is in StackWalk64, then the same problem exists when using CaptureStackBackTrace. What then is not a problem, what would work normally in combination with DoStackSnapshot and GetFunctionFromIP, could you please tell me? Moreover, after @davmason last suggestion to look at this in WinDbg, I did it and was surprised that not only this problem remained, but also the results returned by the DoStackSnapshot itself didn't match the stack from WinDbg. But now I don’t have the ability to discuss this with you, although there is clearly something wrong there. I'm sorry that you don't want to support the functionality that people use. Honestly, I was hoping for other support level from Microsoft, whose products are used by the whole world.

tommcdon commented 3 months ago

Hello @yegor-pelykh!

@tommcdon Thanks for the clarification, but still, I think closing this thread is a bit hasty because I didn't just ask a question about how it should work, I said it doesn't work in an obvious and correct way, I'm pretty sure there's a bug as well. But now I don’t have the ability to discuss this with you, although there is clearly something wrong there.

Thanks for the quick response - I'll re-activate the issue. I suggest providing technical details on what DoStackSnapshot issues you are uncovering.

yegor-pelykh commented 3 months ago

Thanks @tommcdon for understanding.

I changed the approach to obtain a native stack, and instead of StackWalk64 I use walking by EBP (32 bit process) and RtlVirtualUnwind (64 bit process). Those problems are solved. Now there is always at least one managed frame on the stack (when it should be). But!

The problem now is not with the DoStackSnapshot itself, but with something else, definitely related to the CLR. I’ll say right away that in order to fully understand this, you need to see it yourself by running the above-mentioned project for debugging. I’ll also clarify that the problem is relevant ONLY if we run the project in x64 / AnyCPU mode (that is, a 64-bit dll is loaded and we look at the 64-bit stack). Problem workflow: 1) Set the breakpoint on CorProfiler.cpp, line 612 (if (SUCCEEDED(result)) ...) 2) Add nativeFrames variable to Watch window 3) Press F5 at the above breakpoint to stop at it again and again until the application's Main procedure is loaded and variable nativeFrames contains a list of 3 frames, the last of which looks in the debugger as a pointer with no symbolic information.

This third frame is a problem itself. As far as I understand, this frame refers to ILStub, but I’m not sure.

My questions regarding this issue are: 1) Why is it impossible to obtain not only symbolic information at this address, but even the module to which the pointer belongs? And there is NO module in the list of modules loaded into the process within which this pointer is located. How can I get the module of this address and the symbolic information on it? By the way, WinDbg and the Visual Studio also don't see the symbols for this frame. 2) If this address is passed to GetFunctionFromIP, then it says that this is NOT a managed function. But if I try to check this third frame in WinDbg with command !ip2md, then the command says that this is a managed function and refers to the ILStub. Is this a bug with GetFunctionFromIP?

Could you please help with these questions?

I understand that my questions are not very simple, but they cannot just be told and shown, you have to see it for yourself.

scrennshot