Unable to backtrace JIT frames on Windows

yurydelendik commented 4 years ago

I'm trying to find a way to use Cranelift's existing mechanism for providing UNWIND_INFO on Windows (see crates/jit/src/function_table.rs). The platform's StackWalkEx/StackWalk64 and RtlCaptureStackBackTrace fail to use the information registered with RtlAddFunctionTable.

I created a test case at https://github.com/yurydelendik/wasmtime/tree/win-stacktrace. To reproduce, run cargo run --example hello from the "crates/api" directory. The following output is observed:

RtlCaptureStackBackTrace: [0x7ff762a35fde, 0x7ff762a36247, 0x7ff76309b52f, 0x7ff76309b64a, 0x7ffdb8ca6b26, 0x7ffdb8c44849, 0x7ffdb8ce34ee,
0x164298c002f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0] 8
trace/StackWalk: 0x7ff762a5a0f9 0x7ff762a5a6df 0x7ff762a36124 0x7ff762a36247 0x7ff76309b52f 0x7ff76309b64a 0x7ffdb8ca6b26 0x7ffdb8c44849 0x7ffdb8ce34ee
0x164298c002f 0x287d36ed40 0x7ff762a593f6 0x7ff762a36344 0x7ff76309b35d 0x7ff762a3685d 0x7ff7626a7824

The expected output is what Visual Studio shows. Notice that VS has three entries of the form 000001xxxxxx002f (or starting that way), but wasmtime produces only the first one.

[Image: dump_stack (call stack as shown in VS)]

These three entries are the call-to-Rust trampoline, the wasmtime JIT function, and the Rust-to-JIT trampoline.

cc @peterhuene

peterhuene commented 4 years ago

I'll help take a look at this when I'm back from vacation on Thursday.

peterhuene commented 4 years ago

It appears that RtlCaptureStackBackTrace may only support a naive walk and it stops at the first frame that isn't from a mapped PE.

I think we'll need to use RtlCaptureContext and then RtlVirtualUnwind in combination with RtlLookupFunctionEntry to properly walk the stack.

A simplistic implementation I can link to is from Edge's JavaScript runtime: https://github.com/microsoft/ChakraCore/blob/master/lib/Common/Core/FaultInjection.cpp#L113

It might be worth a comment at the call to RtlVirtualUnwind that it may disassemble function epilogues, so in the future if we ever support breakpoints in Wasm code we'd need to ensure that we restore the original epilogue instructions prior to walking the stack.

In case it's not clear what the code is doing, RtlLookupFunctionEntry returning NULL can be treated as if RIP is in a leaf function, hence RSP points at the return address and it can just be "popped", sparing a call to RtlVirtualUnwind to unwind the frame. As we currently don't support leaf functions in Wasmtime for unwind info, this shouldn't happen yet, but it's something that the stack walker should support.
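To make the walk described above concrete, here is a minimal, platform-independent model of the loop in Rust. It is a sketch under stated assumptions, not the Windows API: `FunctionEntry`, `lookup_function_entry`, and `walk_stack` are hypothetical stand-ins for RUNTIME_FUNCTION, RtlLookupFunctionEntry, and the RtlVirtualUnwind loop, and a frame is reduced to "some number of stack slots followed by a return address".

```rust
/// Hypothetical stand-in for RUNTIME_FUNCTION: a code range plus the number
/// of stack slots the function allocates (real unwind codes encode much more).
#[derive(Clone, Copy)]
struct FunctionEntry {
    begin: u64,
    end: u64,
    frame_slots: usize,
}

/// Models RtlLookupFunctionEntry: find the entry covering `pc`, if any.
fn lookup_function_entry(table: &[FunctionEntry], pc: u64) -> Option<FunctionEntry> {
    table.iter().copied().find(|e| e.begin <= pc && pc < e.end)
}

/// Walks a simulated stack, where `stack[sp]` is the slot RSP points at.
/// Returns the program counters visited, innermost frame first.
fn walk_stack(table: &[FunctionEntry], stack: &[u64], mut pc: u64, mut sp: usize) -> Vec<u64> {
    let mut frames = Vec::new();
    while pc != 0 {
        frames.push(pc);
        // With an entry, skip the frame's own slots (the part RtlVirtualUnwind
        // would do by interpreting unwind codes). Without one, treat the
        // function as a leaf: RSP already points at the return address.
        if let Some(entry) = lookup_function_entry(table, pc) {
            sp += entry.frame_slots;
        }
        // "Pop" the return address to get the caller's PC.
        match stack.get(sp) {
            Some(&return_address) => {
                pc = return_address;
                sp += 1;
            }
            None => break, // ran off the simulated stack
        }
    }
    frames
}
```

In this model a leaf function costs no table interpretation at all, which is the shortcut described above; a real walker would also have to restore nonvolatile registers from the unwind codes.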

yurydelendik commented 4 years ago

I got some positive results with https://github.com/yurydelendik/wasmtime/commit/21ff2b9f62f03751060e7489665f2d943c1d0a3f

Do we need to extend https://github.com/rust-lang/backtrace-rs with such functionality, or do we just limit it to wasmtime?

peterhuene commented 4 years ago

I don't know if supporting JIT frames is useful in the general case as it usually requires runtime support for any sort of diagnostic utility.

But it can't hurt to propose it upstream maybe?

alexcrichton commented 4 years ago

FWIW I manage the backtrace crate and would be happy to update the implementation we have there. I don't really understand enough about Windows though to know why what we currently do doesn't work for JIT frames and why these alternative APIs would. @peterhuene do you know of docs and/or do you have an overview of the differences?

The current Windows implementation generates a backtrace with StackWalkEx, falling back to StackWalk64 if that symbol isn't available. Turning addresses into names is done later with either SymFromInlineContextW or SymFromAddrW, depending on which function was used to generate the stack trace.

peterhuene commented 4 years ago

According to StackWalk[Ex|64] documentation regarding the FunctionTableAccessRoutine parameter:

This parameter is required because the StackWalk[Ex|64] function does not have access to the process's run-time function table.

I suspect that's for architectural reasons to support out-of-proc stack walks. So I'm hazarding a guess that the default SymFunctionTableAccess64 function works by looking up the module base address and then reading the module's pdata section. That won't work for IPs that don't map to a loaded module known by the OS.

For backtrace to support walking runtime-generated functions in a generic fashion, it should accept user callbacks for getting the module base address (the GetModuleBaseRoutine parameter) and function table access (the FunctionTableAccessRoutine parameter). It would be pretty easy for Wasmtime to support such callbacks given what we store for the code memories.

alexcrichton commented 4 years ago

Hm so what you say all makes sense, but now I may be a bit confused as well. We implemented something in wasmtime to support some sort of backtraces so longjmp/faults work, right? It sounds like the StackWalk* routines don't use that same infrastructure for generating backtraces? Although given that they can give access to inline frame information that sort of makes sense because it's different sets of information.

Is there a "standard" way to sort of tell backtrace requesters about runtime functions generated? Sort of how we have to tell the runtime how to backtrace everything already? Or is this how RtlLookupFunctionEntry plus RtlVirtualUnwind would work better than StackWalk*? (sorry I'm pretty naive here, although I wrote most of the Windows backtrace stuff I was largely just copying it from places rather than getting a deep understanding of everything going on)

peterhuene commented 4 years ago

What we implemented in Cranelift was to generate the runtime function tables for all JIT'd code and in Wasmtime to register the runtime function tables with Windows. With the runtime function tables registered, functions like RtlUnwindEx (called by VC++'s implementation of longjmp as well as in SEH implementations) work without raising an invalid stack exception; this was the cause of the Wasm trap crashes on Windows prior to these changes.

Given the documentation above, I believe StackWalk[Ex|64] does not consult the current process' runtime function table (i.e. RtlLookupFunctionEntry) by design and therefore ignores what Wasmtime is registering with the OS entirely. Thus, it is incapable, by default, of walking stacks with IPs from generated functions and this is why it accepts those parameters to customize the walk. The walk being implemented by @yurydelendik only supports in-proc, and that is fine for our use case.

I originally believed that RtlCaptureStackBackTrace would consult the registered function tables, but apparently, based on a comment in the Windows research kernel, the OS uses the infamous Windows loader lock to guard the registered function tables. It's therefore not safe for RtlCaptureStackBackTrace to take ownership of that lock during a walk, so it just stops the walk whenever it can't map an IP to a loaded module.

So to sum up, there's apparently no "standard" way to walk a stack that contains runtime-generated functions on Windows. For backtrace to do so, it should either do its own in-proc implementation to consult the registered function tables (à la what Yury is doing) or allow the crate's users to customize the callbacks being passed to StackWalk[Ex|64]. The latter has the general-purpose benefit of also looking up symbolic information for the non-JIT frames, which is not something our in-proc walk is doing since we don't necessarily need that for Wasmtime users.

alexcrichton commented 4 years ago

That's a bummer :(.

Would it be possible though to have this entire implementation self-contained in the backtrace crate? Ideally wasmtime wouldn't have to do anything to tell backtrace what to do; backtrace would just read the standard list of tables (that we register for RtlUnwindEx) and "do its thing".

If we need to do a bunch of Windows or wasmtime-specific manipulation, it's probably best to avoid changing the backtrace crate for now (or to fork it temporarily for wasmtime's purposes), but if we can perhaps put everything into backtrace (likely feature gated at first) that'd be awesome.

peterhuene commented 4 years ago

Since backtrace is only doing an in-process walk, I think rather than implementing a walk using RtlVirtualUnwind as Yury has it, we could pass to StackWalk[Ex|64] a wrapper around SymFunctionTableAccess64 that would first look up the entry using RtlLookupFunctionEntry and then fall back to SymFunctionTableAccess64. I believe that's all it should take to make it work.

peterhuene commented 4 years ago

Actually, falling back is probably unnecessary, as RtlLookupFunctionEntry should do the same work as SymFunctionTableAccess64 for non-runtime-generated function IPs (and probably faster than dbghelp can, since RtlLookupFunctionEntry can simply traverse the current process' PEB, whereas I think dbghelp maintains an internal list of the process' loaded modules to support out-of-proc walks).
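The difference between the lookups discussed above can be sketched with a small Rust model. Everything here is an assumption made for illustration: `Process`, `module_only_lookup`, `process_lookup`, and `wrapper_with_fallback` are hypothetical names, and the behavior attributed to SymFunctionTableAccess64 (modules only) and RtlLookupFunctionEntry (modules plus registered tables) follows the guesses made in this thread rather than documented internals.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct FunctionEntry {
    begin: u64,
    end: u64,
}

/// Hypothetical model of the two places unwind entries can live.
struct Process {
    /// Entries read from the .pdata of loaded PE modules.
    module_tables: Vec<FunctionEntry>,
    /// Entries registered at run time (what RtlAddFunctionTable provides).
    dynamic_tables: Vec<FunctionEntry>,
}

fn find(table: &[FunctionEntry], pc: u64) -> Option<FunctionEntry> {
    table.iter().copied().find(|e| e.begin <= pc && pc < e.end)
}

impl Process {
    /// Assumed behavior of the default SymFunctionTableAccess64: modules only.
    fn module_only_lookup(&self, pc: u64) -> Option<FunctionEntry> {
        find(&self.module_tables, pc)
    }

    /// Assumed behavior of RtlLookupFunctionEntry: modules, then registered tables.
    fn process_lookup(&self, pc: u64) -> Option<FunctionEntry> {
        find(&self.module_tables, pc).or_else(|| find(&self.dynamic_tables, pc))
    }

    /// The wrapper as first proposed: process lookup, then the default as fallback.
    fn wrapper_with_fallback(&self, pc: u64) -> Option<FunctionEntry> {
        self.process_lookup(pc).or_else(|| self.module_only_lookup(pc))
    }
}
```

Under these assumptions the fallback never finds anything the process-wide lookup missed, which is why it can be dropped; the module-only lookup is the one that returns nothing for a JIT address.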

alexcrichton commented 4 years ago

Oh ok that's close to what I was hoping we could do, do you have an example of how to do that though? SymFunctionTableAccess64 looks like it has the signature where we'd just assert the process handle was our own and we'd then look something up based on the pc provided. How would we translate that request to a call to RtlLookupFunctionEntry? Or otherwise where does PUNWIND_HISTORY_TABLE come from and how do we go from PRUNTIME_FUNCTION to IMAGEAPI?

peterhuene commented 4 years ago

The return type of the PFUNCTION_TABLE_ACCESS_ROUTINE64 callback used by StackWalk[Ex|64] is PVOID (i.e. void*) and expected to be a pointer to RUNTIME_FUNCTION for x86-64, same as what RtlLookupFunctionEntry returns. For the purposes of backtrace's walk, we can ignore the process argument entirely and just pass the PC to RtlLookupFunctionEntry.

The unwind history information is an optimization to save repeated lookups while walking/unwinding a stack. Ideally we'd pass the same structure through the entire walk, but StackWalk[Ex|64] doesn't seem to offer a way to pass context to the callback function. I think it would still work (albeit slightly less optimally) if we always just passed in a zero-initialized history structure.
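The trade-off described above, same results but more searching when the history is reset on every call, can be illustrated with a toy cache in Rust. This is only a model: `History` and `FunctionTables` are hypothetical stand-ins for UNWIND_HISTORY_TABLE and the process-wide function tables, and the caching policy is an assumption, not how Windows implements it.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct FunctionEntry {
    begin: u64,
    end: u64,
}

fn covers(entry: &FunctionEntry, pc: u64) -> bool {
    entry.begin <= pc && pc < entry.end
}

/// Hypothetical stand-in for UNWIND_HISTORY_TABLE: entries found earlier in a walk.
#[derive(Default)]
struct History {
    recent: Vec<FunctionEntry>,
}

/// Hypothetical stand-in for the process-wide tables; counts full searches.
struct FunctionTables {
    entries: Vec<FunctionEntry>,
    full_searches: usize,
}

impl FunctionTables {
    fn new(entries: Vec<FunctionEntry>) -> Self {
        FunctionTables { entries, full_searches: 0 }
    }

    /// Consult the history first; search the full tables only on a miss,
    /// recording any hit so later lookups in the same walk can reuse it.
    fn lookup(&mut self, pc: u64, history: &mut History) -> Option<FunctionEntry> {
        if let Some(hit) = history.recent.iter().copied().find(|e| covers(e, pc)) {
            return Some(hit);
        }
        self.full_searches += 1;
        let found = self.entries.iter().copied().find(|e| covers(e, pc));
        if let Some(entry) = found {
            history.recent.push(entry);
        }
        found
    }
}
```

Passing a fresh `History::default()` on every call corresponds to the zero-initialized structure suggested above: each lookup returns the same entry as it would with a shared history, but every call pays for a full search.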

alexcrichton commented 4 years ago

oh oops, I misread the documentation of SymFunctionTableAccess64 and thought the return value was IMAGEAPI, when in fact that's just the calling convention and indeed the return value is PVOID.

I'll work on poking around with this tomorrow with the backtrace crate and see if I can't get something working, although it's definitely still some blind stabbing in the dark for me heh.

peterhuene commented 4 years ago

If you have other things on your plate, I'd be happy to implement what I proposed in backtrace and do the requisite testing on Windows.

alexcrichton commented 4 years ago

Oh sure that works too! My strategy was gonna be to get the test suite working again and then point Yury at the fork to test out with wasmtime and go from there. If you need any help navigating the backtrace crate just let me know!

peterhuene commented 4 years ago

I think this issue is now resolved with #823.

@yurydelendik if I'm wrong and there's still work to do here for stack walking on Windows, please reopen. Thanks!

bytecodealliance / wasmtime

Unable to backtrace JIT frames on Windows #751