Open sastel opened 6 months ago
Here is what I've been using to debug this:
These scripts go in nuttxspace/nuttx
.
$ cat gdb.sc
target remote :3333
break cxa_exception.cpp:279
break exit.c:55
$ cat polarfire-nuttx.resc
:name: PolarFire NuttX
:description: This is a sample script running NuttX on Icicle Kit with PolarFire SoC
emulation SetGlobalSerialExecution true
emulation SetGlobalQuantum "0.0008"
using sysbus
mach create
machine LoadPlatformDescription @platforms/cpus/polarfire-soc.repl
showAnalyzer mmuart1
macro reset
"""
sysbus LoadELF @./nuttx
"""
runMacro $reset
e51 IsHalted true
#u54_1 IsHalted true
u54_2 IsHalted true
u54_3 IsHalted true
u54_4 IsHalted true
machine StartGdbServer 3333
@jrosberg did you face similar issue with using C++ on PolarFire?
@acassis @jrosberg the eh_frame
and eh_frame_hdr
sections are missing from the linker script boards/risc-v/mpfs/icicle/scripts/ld.script
. That's part of the problem, though there seems to be other things missing because I was not able to fix it by adding those sections.
@sastel maybe you can look some other board with C++ support to trace the differences, probably there some other details. Also if you could try in some ARM (like STM32F4Discovery board) it could help to see if everything still working as expected.
Nice @jrosberg is not around, maybe @pussuw could help us! Ville, do you have some idea?
I don't use exceptions so no idea what's going on
Thanks @pussuw
@acassis I tried STM32F4 Discovery NSH build from 12.4 in Renode today, but it crashed on startup. I don't have the physical board.
I cloned gcc and compared the libgcc source with the asm from GDB as I stepped through the exception handling. It is clear that __cxa_throw
calls _Unwind_RaiseException
from gcc/libgcc/unwind.inc
. That calls uw_init_context_1
from unwind-dw2.c
which is calling abort()
because it can't find an FDE (the call to uw_frame_state_for
returns _URC_END_OF_STACK
after _Unwind_Find_FDE
returns NULL. I suspect that the FDE's are not being initialized correctly. I have to do some reading to understand how this is supposed to work.
Here's a great resource: https://maskray.me/blog/2020-12-12-c++-exception-handling-abi
Hi @sastel that is very interesting investigation. If you can, please write down about it in some blog, probably it will help more people in the future.
I made some progress on this today. _Unwind_Find_FDE
was returning NULL because the unseen_objects
linked list wasn't initialized. This linked list needs to be initialized at startup, and then during exception handling it has nodes added and moved into seen_objects
. The list is used for finding an FDE by program counter.
Looks like users of libgcc can either link with crtbegin.o
or write their own calls to libgcc __register_frame_info*
function. The crtstuff.c
file provides a frame_dummy()
function which is supposed to be placed in the .init_array
section by the linker, so that it gets invoked by the initialization logic pre-main. This frame_dummy
function does the registration using __EH_FRAME_BEGIN__
. The function was getting called by NuttX lib_cxx_initialize()
but, __EH_FRAME_BEGIN__
was NULL
so initialization didn't happen.
I hacked up the Toolchain.defs
file to link crtbegin.o
and crtend.o
so that I get frame_dummy
. With a new .eh_frame section in the linker script, __EH_FRAME_BEGIN__
points to the beginning of .eh_frame
and initialization works.
The next problem is that during libgcc FDE searching, I get a RISC-V load address misaligned crash. I'm not sure what to do about this yet.
@pussuw do you have some suggestion how it could be integrated back into mainline? Maybe we can use CONFIG_CXX_EXCEPTION to decide if we want to include crtbegin.o and crtend.o right?
@acassis to be clear, exceptions still aren't being caught - the load address misalignment crash is happening during the _Unwind_Find_FDE
call from the throw
.
I think we should use CONFIG_CXX_EXCEPTION
, but whether we want to link with crtbegin.o
or crtend.o
versus extern void __register_frame(void* begin)
and call it in our own init function is another question. Including .eh_frame
also uses a bunch of extra memory, so we won't want to include it unless it is used for CXX exceptions.
@acassis one of my colleagues built the gcc toolchain (including libgcc.a) for RISC-V with -mstrict-align. That fixed the crash. Now there's another problem! 🤣
libc++abi: terminating due to uncaught exception of type std::runtime_error: runtime error
I'll chase this one down next.
Hi @sastel nice to know you guys found a solution! And too sad you are facing a new issue. This is why I think it could be a good idea to document all the process to help other people facing similar issue, maybe on other arch in the future.
This is why I think it could be a good idea to document all the process to help other people facing similar issue, maybe on other arch in the future.
It's a good idea. I'll have to solve all of the issues before I'm remotely qualified to write the blog post though. 😄
libc++abi: terminating due to uncaught exception of type std::runtime_error: runtime error
I have a solution to this. Exceptions are being thrown and caught now in cxxtest. The GCC toolchain rebuild is not necessary. The LSDA (.gcc_except_table
) was not being initialized for some reason. An edit to the linker script fixed it. Will open a pull request including an Icicle board cxxtest defconfig if you're interested.
The exception handling stack usage is quite high (I needed to set the stack size for the task which throws the exception to 9 kB), but at least exceptions work. I'm investigating what is using all of that stack. It appears to be mostly consumed by the libgcc exception stack unwinding functions.
A big thank you to my colleague @sholford for his support on this!
Congratulations @sastel, it was a quick journey and it succeed! Kudos!!!
Now you are qualified to write a blog exampling to others how to fix this issue!
I like to document all the steps when I'm fixing an issue, because I know it will help others.
For instance, a post I created almost two years ago: https://acassis.wordpress.com/2022/10/31/trying-to-fix-a-mqtt-communication-issue-with-nuttx-and-tagoio/ helped a friend of mine today. Yes was trying to use Wireshark to work as MITM with his embedded board.
I hope you remember all the steps you took! ;-)
Hi @sastel.
I have a solution to this. Exceptions are being thrown and caught now in cxxtest. The GCC toolchain rebuild is not necessary. The LSDA (.gcc_except_table) was not being initialized for some reason. An edit to the linker script fixed it. Will open a pull request including an Icicle board cxxtest defconfig if you're interested.
Could you kindly share your pull request on fixing this issue? I am facing a similar exception issue on ESP32s3. I hope your solution on icicle board could be a good reference of fixing on ESP board.
Thanks a lot.
@royfengsss this patch summarizes it.
diff --git a/arch/risc-v/src/common/Toolchain.defs b/arch/risc-v/src/common/Toolchain.defs
index 3d0e5296a9..e37598e2c4 100644
--- a/arch/risc-v/src/common/Toolchain.defs
+++ b/arch/risc-v/src/common/Toolchain.defs
@@ -285,6 +285,10 @@ endif
# Add the builtin library
EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-libgcc-file-name))
+ifeq ($(CONFIG_CXX_EXCEPTION),y)
+ EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtbegin.o))
+ EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtend.o))
+endif
ifeq ($(CONFIG_LIBM_TOOLCHAIN),y)
EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=libm.a))
diff --git a/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig b/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig
new file mode 100644
index 0000000000..f536046464
--- /dev/null
+++ b/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig
@@ -0,0 +1,90 @@
+#
+# This file is autogenerated: PLEASE DO NOT EDIT IT.
+#
+# You can use "make menuconfig" to make any modifications to the installed .config file.
+# You can then do "make savedefconfig" to generate a new defconfig file that includes your
+# modifications.
+#
+# CONFIG_DISABLE_OS_API is not set
+# CONFIG_NSH_DISABLE_LOSMART is not set
+# CONFIG_NSH_DISABLE_MB is not set
+# CONFIG_NSH_DISABLE_MH is not set
+# CONFIG_NSH_DISABLE_MW is not set
+CONFIG_ARCH="risc-v"
+CONFIG_ARCH_BOARD="icicle"
+CONFIG_ARCH_BOARD_COMMON=y
+CONFIG_ARCH_BOARD_ICICLE_MPFS=y
+CONFIG_ARCH_CHIP="mpfs"
+CONFIG_ARCH_CHIP_MPFS250T_FCVG484=y
+CONFIG_ARCH_CHIP_MPFS=y
+CONFIG_ARCH_INTERRUPTSTACK=2048
+CONFIG_ARCH_RISCV=y
+CONFIG_ARCH_STACKDUMP=y
+CONFIG_BOARD_LOOPSPERMSEC=54000
+CONFIG_BUILTIN=y
+CONFIG_CXX_EXCEPTION=y
+CONFIG_CXX_RTTI=y
+CONFIG_DEBUG_ASSERTIONS=y
+CONFIG_DEBUG_FEATURES=y
+CONFIG_DEBUG_FULLOPT=y
+CONFIG_DEBUG_SYMBOLS=y
+CONFIG_DEFAULT_TASK_STACKSIZE=16384
+CONFIG_DEV_ZERO=y
+CONFIG_EXPERIMENTAL=y
+CONFIG_FS_PROCFS=y
+CONFIG_FS_ROMFS=y
+CONFIG_FS_SHMFS=y
+CONFIG_HAVE_CXX=y
+CONFIG_IDLETHREAD_STACKSIZE=2048
+CONFIG_INIT_ENTRYPOINT="nsh_main"
+CONFIG_INIT_STACKSIZE=3072
+CONFIG_INTELHEX_BINARY=y
+CONFIG_LIBCXX=y
+CONFIG_LIBCXXABI=y
+CONFIG_LIBC_HOSTNAME="icicle"
+CONFIG_LIBC_PERROR_STDOUT=y
+CONFIG_LIBC_STRERROR=y
+CONFIG_LIBM=y
+CONFIG_MEMSET_64BIT=y
+CONFIG_MEMSET_OPTSPEED=y
+CONFIG_MPFS_ENABLE_DPFPU=y
+CONFIG_MPFS_UART1=y
+CONFIG_NSH_ARCHINIT=y
+CONFIG_NSH_BUILTIN_APPS=y
+CONFIG_NSH_DISABLE_IFUPDOWN=y
+CONFIG_NSH_DISABLE_MKDIR=y
+CONFIG_NSH_DISABLE_RM=y
+CONFIG_NSH_DISABLE_RMDIR=y
+CONFIG_NSH_DISABLE_UMOUNT=y
+CONFIG_NSH_LINELEN=160
+CONFIG_NSH_STRERROR=y
+CONFIG_POSIX_SPAWN_DEFAULT_STACKSIZE=2048
+CONFIG_PREALLOC_TIMERS=4
+CONFIG_PTHREAD_STACK_DEFAULT=2048
+CONFIG_RAM_SIZE=1048576
+CONFIG_RAM_START=0x80200000
+CONFIG_RAW_BINARY=y
+CONFIG_READLINE_CMD_HISTORY=y
+CONFIG_READLINE_TABCOMPLETION=y
+CONFIG_RR_INTERVAL=200
+CONFIG_SCHED_HPWORK=y
+CONFIG_SCHED_HPWORKSTACKSIZE=2048
+CONFIG_SCHED_LPWORK=y
+CONFIG_SCHED_LPWORKSTACKSIZE=2048
+CONFIG_SCHED_WAITPID=y
+CONFIG_SERIAL_NPOLLWAITERS=2
+CONFIG_STACK_COLORATION=y
+CONFIG_START_MONTH=4
+CONFIG_START_YEAR=2021
+CONFIG_SYSTEM_CLE_CMD_HISTORY=y
+CONFIG_SYSTEM_NSH=y
+CONFIG_SYSTEM_NSH_STACKSIZE=2048
+CONFIG_SYSTEM_TIME64=y
+CONFIG_TASK_NAME_SIZE=20
+CONFIG_TESTING_CXXTEST=y
+CONFIG_TESTING_GETPRIME=y
+CONFIG_TESTING_GETPRIME_STACKSIZE=2048
+CONFIG_TESTING_OSTEST=y
+CONFIG_TESTING_OSTEST_FPUSTACKSIZE=2048
+CONFIG_TLS_NELEM=4
+CONFIG_UART1_SERIAL_CONSOLE=y
diff --git a/boards/risc-v/mpfs/icicle/scripts/ld.script b/boards/risc-v/mpfs/icicle/scripts/ld.script
index 85dc8503f8..503371e3a7 100644
--- a/boards/risc-v/mpfs/icicle/scripts/ld.script
+++ b/boards/risc-v/mpfs/icicle/scripts/ld.script
@@ -18,6 +18,8 @@
*
****************************************************************************/
+#include <nuttx/config.h>
+
MEMORY
{
progmem (rx) : ORIGIN = 0x80000000, LENGTH = 2M /* w/ cache */
@@ -45,11 +47,26 @@ SECTIONS
*(.glue_7)
*(.glue_7t)
*(.got)
- *(.gcc_except_table)
*(.gnu.linkonce.r.*)
_etext = ABSOLUTE(.);
} > progmem
+#if defined(CONFIG_HAVE_CXX)
+ .gcc_except_table : ALIGN(4) {
+ *(.gcc_except_table)
+ *(.gcc_except_table.*)
+ } > progmem
+
+#if defined(CONFIG_CXX_EXCEPTION)
+ .eh_frame_hdr : ALIGN(4) {
+ *(.eh_frame_hdr)
+ } > progmem
+ .eh_frame : ALIGN(4) {
+ KEEP(*(.eh_frame))
+ } > progmem
+#endif
+#endif
+
.init_section : ALIGN(4) {
_sinit = ABSOLUTE(.);
KEEP(*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
@sastel Many thanks your patch. I will try based on your changes.
Summary
On the PolarFire Icicle board, when building from
master
with a config based onicicle:nsh
(plus config and flag changes detailed below), the cxxtest built-in app exception test fails. It throws the exception and then exits without catching the exception.I've tried this on both the Icicle board and a PolarFire Renode simulation. I'm building with
riscv-none-elf-*
from xPack:gcc version 13.2.0 (xPack GNU RISC-V Embedded GCC x86_64)
.I'm hoping someone knows what the issue might be as I'm not familiar with the behind the scenes of exception handling. Meanwhile I'll keep working on this.
Config
Changed optimization to
-Og
for debugging and set-mstrict-align
to prevent a crash due to misaligned instructions.Ran
make savedefconfig
and the following. The task size configs are set because I had to increaseCONFIG_DEFAULT_TASK_STACKSIZE
to prevent a stack overflow during exception unwinding.