Closed yurydelendik closed 4 years ago
I'll help take a look at this when I'm back from vacation on Thursday.
It appears that RtlCaptureStackBackTrace may only support a naive walk and it stops at the first frame that isn't from a mapped PE.
I think we'll need to use RtlCaptureContext and then RtlVirtualUnwind in combination with RtlLookupFunctionEntry to properly walk the stack.
A simplistic implementation I can link to is from Edge's JavaScript runtime: https://github.com/microsoft/ChakraCore/blob/master/lib/Common/Core/FaultInjection.cpp#L113
It might be worth a comment at the call to RtlVirtualUnwind that it may disassemble function epilogues, so in the future if we ever support breakpoints in Wasm code we'd need to ensure that we restore the original epilogue instructions prior to walking the stack.
In case it's not clear what the code is doing, RtlLookupFunctionEntry returning NULL can be treated as if RIP is in a leaf function, hence RSP points at the return address and it can just be "popped", sparing a call to RtlVirtualUnwind to unwind the frame. As we currently don't support leaf functions in Wasmtime for unwind info, this shouldn't happen yet, but it's something that the stack walker should support.
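The leaf-function rule described above can be illustrated with a small, portable Rust model. This is only a sketch and makes no Windows calls: `FunctionEntry`, `lookup`, and `walk` are hypothetical names, the stack is simulated as a slice of 8-byte slots, and a `frame_slots` count stands in for the unwind codes that RtlVirtualUnwind would interpret.

```rust
/// Hypothetical, simplified stand-in for a function-table entry. A real
/// RUNTIME_FUNCTION points at unwind codes; here `frame_slots` directly
/// records how many 8-byte stack slots the function's frame occupies.
#[derive(Clone, Copy)]
struct FunctionEntry {
    begin: u64,
    end: u64,
    frame_slots: usize,
}

/// Stand-in for RtlLookupFunctionEntry: `None` plays the role of NULL.
fn lookup(table: &[FunctionEntry], pc: u64) -> Option<FunctionEntry> {
    table.iter().copied().find(|f| f.begin <= pc && pc < f.end)
}

/// Walks a simulated stack and returns the instruction pointer of every
/// frame visited. `rsp` is an index into `stack`; lower indices are nearer
/// the top of the stack. A return address of 0 ends the walk.
fn walk(mut rip: u64, mut rsp: usize, stack: &[u64], table: &[FunctionEntry]) -> Vec<u64> {
    let mut frames = Vec::new();
    while rip != 0 {
        frames.push(rip);
        if let Some(entry) = lookup(table, rip) {
            // Stand-in for RtlVirtualUnwind: release the frame so that
            // RSP points at the return address.
            rsp += entry.frame_slots;
        }
        // Leaf case (no entry): RSP already points at the return address.
        // Either way, "pop" the return address to reach the caller.
        match stack.get(rsp) {
            Some(&ret) => {
                rip = ret;
                rsp += 1;
            }
            None => break,
        }
    }
    frames
}
```

In this model a PC with no table entry costs no unwind step at all, which is the saving mentioned above; a real walker would still need RtlVirtualUnwind for every non-leaf frame.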
I got some positive results with https://github.com/yurydelendik/wasmtime/commit/21ff2b9f62f03751060e7489665f2d943c1d0a3f
Do we need to extend https://github.com/rust-lang/backtrace-rs with such functionality, or do we just limit it to wasmtime only?
I don't know if supporting JIT frames is useful in the general case as it usually requires runtime support for any sort of diagnostic utility.
But it can't hurt to propose it upstream maybe?
FWIW I manage the backtrace crate and would be happy to update the implementation we have there. I don't really understand enough about Windows though to know why what we currently do doesn't work for JIT frames and why these alternative APIs would. @peterhuene do you know of docs and/or do you have an overview of the differences?
The current Windows implementation generates a backtrace with StackWalkEx, falling back to StackWalk64 if that symbol isn't available. Turning addresses into names is done later with either SymFromInlineContextW or SymFromAddrW, depending on which function was used to generate the stack trace.
According to the StackWalk[Ex|64] documentation regarding the FunctionTableAccessRoutine parameter:
This parameter is required because the StackWalk[Ex|64] function does not have access to the process's run-time function table.
I suspect that's for architectural reasons to support out-of-proc stack walks. So I'm hazarding a guess that the default SymFunctionTableAccess64 function works by looking up the module base address and then reading the module's pdata section. That won't work for IPs that don't map to a loaded module known by the OS.
For backtrace to support walking runtime-generated functions in a generic fashion, it should accept user callbacks for getting the module base address (the GetModuleBaseRoutine parameter) and function table access (the FunctionTableAccessRoutine parameter). It would be pretty easy for Wasmtime to support such callbacks given what we store for the code memories.
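As a rough illustration of what such callbacks would have to answer, here is a portable Rust sketch. The `Registry` and `CodeRegion` types and their method names are hypothetical and are not Wasmtime's or backtrace's actual data structures; the only Windows detail assumed is that an x86-64 RUNTIME_FUNCTION holds three 32-bit fields (begin, end, unwind data) expressed as offsets from the image base, in a table sorted by begin address.

```rust
/// Mirrors the x86-64 RUNTIME_FUNCTION layout: three 32-bit offsets
/// relative to the base address of the code region.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RuntimeFunction {
    begin_address: u32,
    end_address: u32,
    unwind_data: u32,
}

/// Hypothetical record of one block of JIT code and its function table,
/// sorted by `begin_address`. Regions are assumed to be smaller than 4 GiB.
struct CodeRegion {
    base: u64,
    len: u64,
    functions: Vec<RuntimeFunction>,
}

/// Hypothetical registry a runtime could keep for its code memories.
struct Registry {
    regions: Vec<CodeRegion>,
}

impl Registry {
    fn region_for(&self, pc: u64) -> Option<&CodeRegion> {
        self.regions.iter().find(|r| pc >= r.base && pc - r.base < r.len)
    }

    /// What a module-base callback would report; 0 means "unknown".
    fn module_base(&self, pc: u64) -> u64 {
        self.region_for(pc).map_or(0, |r| r.base)
    }

    /// What a function-table-access callback would report for `pc`.
    fn function_entry(&self, pc: u64) -> Option<&RuntimeFunction> {
        let region = self.region_for(pc)?;
        let rva = (pc - region.base) as u32;
        // Index of the first entry that begins after `rva`; the entry
        // just before it is the only one that can contain `rva`.
        let idx = region.functions.partition_point(|f| f.begin_address <= rva);
        let candidate = region.functions.get(idx.checked_sub(1)?)?;
        (rva < candidate.end_address).then_some(candidate)
    }
}
```

A real callback would additionally have to match the C signatures that StackWalk[Ex|64] expects and return raw pointers rather than Rust references.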
Hm so what you say all makes sense, but now I may be a bit confused as well. We implemented something in wasmtime to support some sort of backtraces so longjmp/faults work, right? It sounds like the StackWalk* routines don't use that same infrastructure for generating backtraces? Although given that they can give access to inline frame information that sort of makes sense because it's different sets of information.
Is there a "standard" way to sort of tell backtrace requesters about runtime functions generated? Sort of how we have to tell the runtime how to backtrace everything already? Or is this how RtlLookupFunctionEntry plus RtlVirtualUnwind would work better than StackWalk*? (sorry I'm pretty naive here, although I wrote most of the Windows backtrace stuff I was largely just copying it from places rather than getting a deep understanding of everything going on)
What we implemented in Cranelift was to generate the runtime function tables for all JIT'd code and in Wasmtime to register the runtime function tables with Windows. With the runtime function tables registered, functions like RtlUnwindEx (called by VC++'s implementation of longjmp as well as in SEH implementations) work without raising an invalid stack exception; this was the cause of the Wasm trap crashes on Windows prior to these changes.
Given the documentation above, I believe StackWalk[Ex|64] does not consult the current process' runtime function table (i.e. RtlLookupFunctionEntry) by design and therefore ignores what Wasmtime is registering with the OS entirely. Thus, it is incapable, by default, of walking stacks with IPs from generated functions and this is why it accepts those parameters to customize the walk. The walk being implemented by @yurydelendik only supports in-proc, and that is fine for our use case.
I originally believed that RtlCaptureStackBackTrace would consult the registered function tables, but apparently, based on this comment from the Windows research kernel, the OS uses the infamous Windows loader lock to guard the registered function tables. It's therefore not safe for RtlCaptureStackBackTrace to take ownership of that lock during a walk, so it just stops the walk whenever it can't map an IP to a loaded module.
So to sum up, there's apparently no "standard" way to walk a stack that contains runtime-generated functions on Windows. For backtrace to do so, it should either do its own in-proc implementation to consult the registered function tables (à la what Yury is doing) or allow the crate's users to customize the callbacks being passed to StackWalk[Ex|64]. The latter has the general-purpose benefit of also looking up symbolic information for the non-JIT frames, which is not something our in-proc walk is doing since we don't necessarily need that for Wasmtime users.
That's a bummer :(.
Would it be possible though to have this entire implementation self-contained in the backtrace crate? Ideally wasmtime wouldn't have to do anything to tell backtrace what to do, backtrace would just read the standard list of tables (that we register for RtlUnwindEx) and "do its thing".
If we need to do a bunch of Windows or wasmtime-specific manipulation it's probably best to avoid changing the backtrace crate for now (or forking it temporarily for wasmtime's purposes), but if we can perhaps put everything into backtrace (likely feature gated at first) that'd be awesome.
Since backtrace is only doing an in-process walk, I think rather than implementing a walk using RtlVirtualUnwind as Yury has it, we could pass to StackWalk[Ex|64] a wrapper around SymFunctionTableAccess64 that would first look up the entry using RtlLookupFunctionEntry and then fall back to SymFunctionTableAccess64. I believe that's all it should take to make it work.
Actually, falling back is probably unnecessary as RtlLookupFunctionEntry should do the same work as SymFunctionTableAccess64 for non-runtime-generated function IPs (but probably faster than dbghelp can do it, since RtlLookupFunctionEntry can simply traverse the current process' PEB whereas I think dbghelp maintains an internal list of the process' loaded modules to support out-of-proc walks).
Oh ok that's close to what I was hoping we could do, do you have an example of how to do that though? SymFunctionTableAccess64 looks like it has the signature where we'd just assert the process handle was our own and we'd then look something up based on the pc provided. How would we translate that request to a call to RtlLookupFunctionEntry? Or otherwise where does PUNWIND_HISTORY_TABLE come from and how do we go from PRUNTIME_FUNCTION to IMAGEAPI?
The return type of the PFUNCTION_TABLE_ACCESS_ROUTINE64 callback used by StackWalk[Ex|64] is PVOID (i.e. void*) and expected to be a pointer to RUNTIME_FUNCTION for x86-64, same as what RtlLookupFunctionEntry returns. For the purposes of backtrace's walk, we can ignore the process argument entirely and just pass the PC to RtlLookupFunctionEntry.
The unwind history information is an optimization to save repeated lookups while walking/unwinding a stack. Ideally we'd pass the same structure through the entire walk, but there doesn't seem to be a way to offer context to the callback function from StackWalk[Ex|64]. I think it would still work (albeit slightly less optimally) if we always just passed in a zero-initialized history structure.
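To make the "optimization only" point concrete, here is a small Rust model of a lookup that takes a history structure. It is a sketch under loud assumptions: `TableEntry`, `History`, and `lookup_with_history` are hypothetical, and the real UNWIND_HISTORY_TABLE has a different layout. The only thing it shows is that a fresh, empty history changes how much work a lookup does, not which entry it returns.

```rust
/// Hypothetical, simplified function-table entry (absolute addresses).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TableEntry {
    begin: u64,
    end: u64,
}

/// Hypothetical stand-in for an unwind history table: a list of recently
/// returned entries plus a counter of how often it spared a full search.
#[derive(Default)]
struct History {
    recent: Vec<TableEntry>,
    hits: usize,
}

/// Looks up `pc`, consulting the history first and the full table second.
fn lookup_with_history(table: &[TableEntry], pc: u64, history: &mut History) -> Option<TableEntry> {
    let cached = history.recent.iter().copied().find(|e| e.begin <= pc && pc < e.end);
    if let Some(entry) = cached {
        history.hits += 1;
        return Some(entry);
    }
    // Cache miss: search the full table and remember the result.
    let found = table.iter().copied().find(|e| e.begin <= pc && pc < e.end)?;
    history.recent.push(found);
    Some(found)
}
```

Passing `&mut History::default()` on every call corresponds to the zero-initialized structure suggested above: every lookup misses the cache and searches the full table, but the returned entry is the same.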
oh oops, I misread the documentation of SymFunctionTableAccess64 and thought the return value was IMAGEAPI, when in fact that's just the calling convention and indeed the return value is PVOID.
I'll work on poking around with this tomorrow with the backtrace crate and see if I can't get something working, although it's definitely still some blind stabbing in the dark for me heh.
If you have other things on your plate, I'd be happy to implement what I proposed in backtrace and do the requisite testing on Windows.
Oh sure that works too! My strategy was gonna be to get the test suite working again and then point Yury at the fork to test out with wasmtime and go from there. If you need any help navigating the backtrace crate just let me know!
I think this issue is now resolved with #823.
@yurydelendik if I'm wrong and there's still work to do here for stack walking on Windows, please reopen. Thanks!
Trying to find a way to utilize Cranelift's existing mechanism for providing UNWIND_INFO on Windows (see crates/jit/src/function_table.rs). The platform StackWalkEx/StackWalk64 or RtlCaptureStackBackTrace fail to utilize the information registered with RtlAddFunctionTable.
I created a test case at https://github.com/yurydelendik/wasmtime/tree/win-stacktrace : run cargo run --example hello from the "crates/api" directory. The following output is observed:
The expected output is something provided by VS. Notice that VS has three entries in the form of/starting with 000001xxxxxx002f, but wasmtime produces only the first one.
These 3 entries are: call-to-rust trampoline, wasmtime jit function, and rust-to-jit trampoline.
cc @peterhuene