espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Programmatic way to capture backtraces for RiscV (IDFGH-6189) #7866

Closed ivmarkov closed 1 year ago

ivmarkov commented 2 years ago

(This request is primarily driven by Rust, but it might be useful in the C world as well.)

The ability to capture backtraces programmatically - outside of panic situations - has benefits in the Rust world. For one, it allows Rust's error handling to produce errors enriched with backtrace information captured at the time the Rust error is created.

The status quo regarding programmatic backtrace capturing in the ESP-IDF is asymmetric: there is an API for it, but it is only available for Xtensa and is not (yet?) ported to RiscV.

So the first request would be:

(1) A single, unified API that allows capturing backtrace information on both Xtensa and RiscV.

There are complications with RiscV of course: in the absence of a frame pointer (FP), capturing backtraces on that platform is only possible when the eh_frame functionality is enabled, which increases the size of the final executable. Yet we believe that programmatic backtrace capturing would be useful even with that restriction in mind. (When eh_frame information is not compiled into the firmware, however, calling the backtrace API should return an error or an empty backtrace, and not crash the program. See also below the treatment of raw stack memory, which offers another - always available - option for programmatic backtraces on RiscV.)
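To make the "return an error or an empty backtrace, don't crash" contract concrete, here is a minimal sketch in C. Everything in it (`bt_err_t`, `bt_walker_t`, `bt_capture`) is a hypothetical name made up for illustration, not an existing ESP-IDF API. The sketch assumes the architecture-specific unwinder is passed in as a function pointer, and that the pointer is NULL when no unwinding information was compiled into the firmware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes, for illustration only. */
typedef enum {
    BT_OK = 0,
    BT_ERR_INVALID_ARG,
    BT_ERR_NOT_SUPPORTED, /* no unwinding info compiled into the firmware */
} bt_err_t;

/* Hypothetical per-architecture walker: fills pcs[] with up to cap return
 * addresses and returns how many it wrote. */
typedef size_t (*bt_walker_t)(uintptr_t *pcs, size_t cap);

/* Capture a backtrace through `walker`. A NULL walker models a build
 * without eh_frame (RiscV): the call fails cleanly instead of crashing. */
bt_err_t bt_capture(bt_walker_t walker, uintptr_t *pcs, size_t cap, size_t *len)
{
    if (pcs == NULL || len == NULL) {
        return BT_ERR_INVALID_ARG;
    }
    *len = 0; /* always leave a valid, empty backtrace behind */
    if (walker == NULL) {
        return BT_ERR_NOT_SUPPORTED;
    }
    size_t n = walker(pcs, cap);
    *len = n < cap ? n : cap; /* never report more frames than fit */
    return BT_OK;
}
```

With this shape, the same call site works on both architectures: the caller checks the return code and treats `BT_ERR_NOT_SUPPORTED` as "no backtrace available".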

RiscV panicking behavior in the ESP-IDF is interesting, in that - in the absence of eh_frame information - it outputs raw stack memory instead (1KB or more), which is then decoded into meaningful function names by the monitor, using GDB.

Now, I would argue that - in the absence of eh_frame information - outputting raw stack memory is actually acceptable behavior even for the programmatic backtracing API, even if it sounds weird at first glance. Why? Because even the "real" eh_frame (or FP-based, in the case of Xtensa) backtracing info requires decoding by specialized code in the monitor, and requires the monitor to have access to the ELF executable, as the "real" backtracing info is just a sequence of raw IPs without any symbolic information (for obvious reasons). What I'm getting at is that - from the POV of the programmer - the "real" raw IP-based backtracing info is just as unergonomic as the "stack memory dump" backtracing information, in that it requires a specialized monitor (as opposed to Linux's screen tty) to re-symbolize the call chain into something human readable.

Which brings the next request:

(2) Provide an option for the backtracing API to fallback to returning raw stack memory. Perhaps for Xtensa too (why not?)

(When I say "fallback" I don't really mean output on the console. I mean the API should return a reference to the raw stack memory to the calling code - somehow. It might even be sneaked into the current API contract somehow. For our "ideal" API contract (libunwind), see below.)
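One possible shape for such a return value is sketched below in C: a tagged result that holds either a list of return addresses or a window onto the raw stack memory. All names (`bt_kind_t`, `bt_result_t`, `bt_raw_stack_window`) are hypothetical, and the sketch assumes a downward-growing stack, so the live region runs from the current stack pointer up to the top of the task stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result kinds, for illustration only. */
typedef enum {
    BT_KIND_EMPTY,     /* nothing could be captured */
    BT_KIND_PCS,       /* a list of return addresses */
    BT_KIND_RAW_STACK, /* a reference to raw stack memory */
} bt_kind_t;

typedef struct {
    bt_kind_t kind;
    union {
        struct { const uintptr_t *pcs; size_t len; } frames;
        struct { const uint8_t *base; size_t size; } raw;
    } u;
} bt_result_t;

/* Build the raw-stack fallback: a window of at most max_bytes starting at
 * the stack pointer and never reaching past stack_top. Nothing is copied
 * or printed; the caller only gets a reference. */
bt_result_t bt_raw_stack_window(const uint8_t *sp, const uint8_t *stack_top,
                                size_t max_bytes)
{
    bt_result_t r = { .kind = BT_KIND_EMPTY };
    if (sp == NULL || stack_top == NULL || sp >= stack_top || max_bytes == 0) {
        return r;
    }
    size_t live = (size_t)(stack_top - sp);
    r.kind = BT_KIND_RAW_STACK;
    r.u.raw.base = sp;
    r.u.raw.size = live < max_bytes ? live : max_bytes;
    return r;
}
```

The caller can then ship the referenced bytes wherever it ships IP-based backtraces, and leave the decoding to the monitor in both cases.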

This option can either be controlled at runtime, or at compile time, possibly with configuration (CONFIG_*) settings. E.g.:

The final topic is that calling this new/extended ESP-IDF backtracing API has to be upstreamed in Rust's backtrace-rs crate, which is also used in Rust's STD library.

Our experience so far is that if your platform is not one of the 4 major ones (win/lin/mac/wasm), upstreaming is easier if the changes are minimal.

The current code in backtrace-rs which is used for unix-like platforms (where ESP-IDF actually belongs!) that captures backtraces relies on API calls to the libunwind functionality (_Unwind_Backtrace, _Unwind_GetIP, _Unwind_FindEnclosingFunction and _Unwind_GetCFA) which is part of the GCC toolchain (and I think also part of the LLVM toolchain).
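For readers who have not used this interface, the sketch below shows the general pattern in C: `_Unwind_Backtrace` invokes a callback once per frame, and the callback reads the frame's instruction pointer with `_Unwind_GetIP`. These two calls and the `<unwind.h>` header come from the toolchain; the helper names `trace_buf`, `on_frame` and `capture_ips` are made up for this example. This is not the backtrace-rs code itself, just a minimal illustration of the same calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

/* Hypothetical helper state: where the callback stores what it sees. */
struct trace_buf {
    uintptr_t *ips;
    size_t cap;
    size_t len;
};

/* Called by the unwinder once per stack frame. */
static _Unwind_Reason_Code on_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct trace_buf *buf = arg;
    if (buf->len >= buf->cap) {
        return _URC_END_OF_STACK; /* a non-zero code stops the walk */
    }
    buf->ips[buf->len++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON; /* keep walking */
}

/* Capture up to cap instruction pointers of the current call chain and
 * return how many were stored. */
size_t capture_ips(uintptr_t *ips, size_t cap)
{
    struct trace_buf buf = { ips, cap, 0 };
    _Unwind_Backtrace(on_frame, &buf);
    return buf.len;
}
```

The result is just a sequence of raw IPs, so it still needs the ELF file to be turned into function names.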

Now, to my shock and entertainment, these functions are available in ESP-IDF, and - up to version 4.3.1 inclusive - these used to work just fine on Xtensa and used to produce backtraces, even when ESP-IDF was NOT compiled with C++ exceptions enabled (the default)! (But these functions still did crash for RiscV. Always. Even with eh_frame enabled and even with C++ exception support enabled. ?!)

So in a way, for Xtensa and ESP-IDF <= 4.3.1, we did not have to "upstream" anything. It just works. :)

The situation is not so rosy since ESP-IDF 4.4+. Due to some issue related to final binary code size that I can't find right now, these functions are in later releases stubbed out in the ESP-IDF with custom implementations when C++ exceptions are not enabled (the default situation) and calling those stubs leads to a panic. Not even to returning an empty backtrace (which would've been a bit more tolerable).
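To illustrate what "returning an empty backtrace" could look like, here is a sketch in C of a benign stub. It is a hypothetical stand-in with a made-up name (`stub_unwind_backtrace`), not the actual ESP-IDF stub code: it simply never calls the per-frame callback and reports the end of the stack. The `frames_seen` helper is only there to show what a caller would observe.

```c
#include <stddef.h>
#include <unwind.h>

/* Hypothetical benign stub with the same signature as _Unwind_Backtrace:
 * no frames are reported and nothing panics. */
_Unwind_Reason_Code stub_unwind_backtrace(_Unwind_Trace_Fn trace, void *arg)
{
    (void)trace; /* the callback is never invoked */
    (void)arg;
    return _URC_END_OF_STACK;
}

/* Signature shared by the stub and the real function. */
typedef _Unwind_Reason_Code (*backtrace_fn)(_Unwind_Trace_Fn, void *);

/* Per-frame callback that only counts how often it is called. */
static _Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg)
{
    (void)ctx;
    ++*(size_t *)arg;
    return _URC_NO_REASON;
}

/* Number of frames a caller sees through the given implementation. */
size_t frames_seen(backtrace_fn backtrace)
{
    size_t n = 0;
    backtrace(count_frame, &n);
    return n;
}
```

A caller going through such a stub ends up with zero frames, which the Rust side can treat as "backtrace not available".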

Which brings the 3rd topic:

(3) To make upstreaming in Rust easier, the programmatic backtrace generation should (also) work via the above 4 _Unwind_* API calls.

Ideally, the 4 _Unwind_* API calls:

Well, that's it. Sorry, this ended up like a "mini RFC" of sorts.

I'm available for questions, comments and experiments.

ivmarkov commented 2 years ago

@igrr ^^^

ivmarkov commented 2 years ago

@MabezDev ^^^. Sorry for pinging. In case you spot an incorrect statement reg. the Rust side of things, pls let me know.

ivmarkov commented 1 year ago

@o-marshmallow Unlike xtensa, where we always have frame pointers, for riscv we need to enable CONFIG_ESP_SYSTEM_USE_EH_FRAME to get backtraces, right?

o-marshmallow commented 1 year ago

Hello @ivmarkov ,

Indeed, for RISC-V targets you will need to enable CONFIG_ESP_SYSTEM_USE_EH_FRAME, as it makes the compiler include the DWARF unwinding information (the eh_frame data) in the final binary, which is required to unwind backtraces.