apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0
2.74k stars 1.14k forks source link

PolarFire SoC - C++ Exception Exits Before Catch #11912

Open sastel opened 6 months ago

sastel commented 6 months ago

Summary

On the PolarFire Icicle board, when building from master with a config based on icicle:nsh (plus config and flag changes detailed below), the cxxtest built-in app exception test fails. It throws the exception and then exits without catching the exception.

I've tried this on both the Icicle board and a PolarFire Renode simulation. I'm building with riscv-none-elf-* from xPack: gcc version 13.2.0 (xPack GNU RISC-V Embedded GCC x86_64).

I'm hoping someone knows what the issue might be as I'm not familiar with the behind the scenes of exception handling. Meanwhile I'll keep working on this.

Config

Changed optimization to -Og for debugging and set -mstrict-align to prevent a crash due to misaligned instructions.

Ran make savedefconfig and the following. The task size configs are set because I had to increase CONFIG_DEFAULT_TASK_STACKSIZE to prevent a stack overflow during exception unwinding.

$ diff boards/risc-v/mpfs/icicle/configs/nsh/defconfig  defconfig
9a10,12
> # CONFIG_NSH_DISABLE_MB is not set
> # CONFIG_NSH_DISABLE_MH is not set
> # CONFIG_NSH_DISABLE_MW is not set
21a25,26
> CONFIG_CXX_EXCEPTION=y
> CONFIG_CXX_RTTI=y
25a31
> CONFIG_DEFAULT_TASK_STACKSIZE=32768
30a37
> CONFIG_HAVE_CXX=y
34a42,43
> CONFIG_LIBCXX=y
> CONFIG_LIBCXXABI=y
37a47
> CONFIG_LIBM=y
50a61
> CONFIG_POSIX_SPAWN_DEFAULT_STACKSIZE=2048
51a63
> CONFIG_PTHREAD_STACK_DEFAULT=2048
58a71
> CONFIG_SCHED_HPWORKSTACKSIZE=2048
59a73
> CONFIG_SCHED_LPWORKSTACKSIZE=2048
66a81
> CONFIG_SYSTEM_NSH_STACKSIZE=2048
68a84
> CONFIG_TESTING_CXXTEST=y
69a86
> CONFIG_TESTING_GETPRIME_STACKSIZE=2048
70a88,89
> CONFIG_TESTING_OSTEST_FPUSTACKSIZE=2048
> CONFIG_TLS_NELEM=4
sastel commented 6 months ago

Scripts for Debugging with GDB and Renode

Here is what I've been using to debug this:

These scripts go in nuttxspace/nuttx.

$ cat gdb.sc
target remote :3333

break cxa_exception.cpp:279
break exit.c:55
$ cat polarfire-nuttx.resc
:name: PolarFire NuttX
:description: This is a sample script running NuttX on Icicle Kit with PolarFire SoC

emulation SetGlobalSerialExecution true
emulation SetGlobalQuantum "0.0008"

using sysbus
mach create

machine LoadPlatformDescription @platforms/cpus/polarfire-soc.repl
showAnalyzer mmuart1

macro reset
"""
    sysbus LoadELF @./nuttx
"""
runMacro $reset

e51 IsHalted true
#u54_1 IsHalted true
u54_2 IsHalted true
u54_3 IsHalted true
u54_4 IsHalted true

machine StartGdbServer 3333

Backtraces

image

acassis commented 6 months ago

@jrosberg did you face similar issue with using C++ on PolarFire?

sastel commented 6 months ago

@acassis @jrosberg the eh_frame and eh_frame_hdr sections are missing from the linker script boards/risc-v/mpfs/icicle/scripts/ld.script. That's part of the problem, though there seems to be other things missing because I was not able to fix it by adding those sections.

acassis commented 6 months ago

@sastel maybe you can look some other board with C++ support to trace the differences, probably there some other details. Also if you could try in some ARM (like STM32F4Discovery board) it could help to see if everything still working as expected.

acassis commented 6 months ago

Nice @jrosberg is not around, maybe @pussuw could help us! Ville, do you have some idea?

pussuw commented 6 months ago

I don't use exceptions so no idea what's going on

acassis commented 6 months ago

Thanks @pussuw

sastel commented 6 months ago

@acassis I tried STM32F4 Discovery NSH build from 12.4 in Renode today, but it crashed on startup. I don't have the physical board.

I cloned gcc and compared the libgcc source with the asm from GDB as I stepped through the exception handling. It is clear that __cxa_throw calls _Unwind_RaiseException from gcc/libgcc/unwind.inc. That calls uw_init_context_1 from unwind-dw2.c which is calling abort() because it can't find an FDE (the call to uw_frame_state_for returns _URC_END_OF_STACK after _Unwind_Find_FDE returns NULL. I suspect that the FDE's are not being initialized correctly. I have to do some reading to understand how this is supposed to work.

Here's a great resource: https://maskray.me/blog/2020-12-12-c++-exception-handling-abi

acassis commented 6 months ago

Hi @sastel that is very interesting investigation. If you can, please write down about it in some blog, probably it will help more people in the future.

sastel commented 6 months ago

I made some progress on this today. _Unwind_Find_FDE was returning NULL because the unseen_objects linked list wasn't initialized. This linked list needs to be initialized at startup, and then during exception handling it has nodes added and moved into seen_objects. The list is used for finding an FDE by program counter.

Looks like users of libgcc can either link with crtbegin.o or write their own calls to libgcc __register_frame_info* function. The crtstuff.c file provides a frame_dummy() function which is supposed to be placed in the .init_array section by the linker, so that it gets invoked by the initialization logic pre-main. This frame_dummy function does the registration using __EH_FRAME_BEGIN__. The function was getting called by NuttX lib_cxx_initialize() but, __EH_FRAME_BEGIN__ was NULL so initialization didn't happen.

I hacked up the Toolchain.defs file to link crtbegin.o and crtend.o so that I get frame_dummy. With a new .eh_frame section in the linker script, __EH_FRAME_BEGIN__ points to the beginning of .eh_frame and initialization works.

image

image

The next problem is that during libgcc FDE searching, I get a RISC-V load address misaligned crash. I'm not sure what to do about this yet.

acassis commented 6 months ago

@pussuw do you have some suggestion how it could be integrated back into mainline? Maybe we can use CONFIG_CXX_EXCEPTION to decide if we want to include crtbegin.o and crtend.o right?

sastel commented 6 months ago

@acassis to be clear, exceptions still aren't being caught - the load address misalignment crash is happening during the _Unwind_Find_FDE call from the throw.

I think we should use CONFIG_CXX_EXCEPTION, but whether we want to link with crtbegin.o or crtend.o versus extern void __register_frame(void* begin) and call it in our own init function is another question. Including .eh_frame also uses a bunch of extra memory, so we won't want to include it unless it is used for CXX exceptions.

sastel commented 6 months ago

@acassis one of my colleagues built the gcc toolchain (including libgcc.a) for RISC-V with -mstrict-align. That fixed the crash. Now there's another problem! 🤣

libc++abi: terminating due to uncaught exception of type std::runtime_error: runtime error

I'll chase this one down next.

acassis commented 6 months ago

Hi @sastel nice to know you guys found a solution! And too sad you are facing a new issue. This is why I think it could be a good idea to document all the process to help other people facing similar issue, maybe on other arch in the future.

sastel commented 6 months ago

This is why I think it could be a good idea to document all the process to help other people facing similar issue, maybe on other arch in the future.

It's a good idea. I'll have to solve all of the issues before I'm remotely qualified to write the blog post though. 😄

sastel commented 6 months ago

libc++abi: terminating due to uncaught exception of type std::runtime_error: runtime error

I have a solution to this. Exceptions are being thrown and caught now in cxxtest. The GCC toolchain rebuild is not necessary. The LSDA (.gcc_except_table) was not being initialized for some reason. An edit to the linker script fixed it. Will open a pull request including an Icicle board cxxtest defconfig if you're interested.

The exception handling stack usage is quite high (I needed to set the stack size for the task which throws the exception to 9 kB), but at least exceptions work. I'm investigating what is using all of that stack. It appears to be mostly consumed by the libgcc exception stack unwinding functions.

A big thank you to my colleague @sholford for his support on this!

acassis commented 6 months ago

Congratulations @sastel, it was a quick journey and it succeed! Kudos!!!

Now you are qualified to write a blog exampling to others how to fix this issue!

I like to document all the steps when I'm fixing an issue, because I know it will help others.

For instance, a post I created almost two years ago: https://acassis.wordpress.com/2022/10/31/trying-to-fix-a-mqtt-communication-issue-with-nuttx-and-tagoio/ helped a friend of mine today. Yes was trying to use Wireshark to work as MITM with his embedded board.

I hope you remember all the steps you took! ;-)

royfengsss commented 3 months ago

Hi @sastel.

I have a solution to this. Exceptions are being thrown and caught now in cxxtest. The GCC toolchain rebuild is not necessary. The LSDA (.gcc_except_table) was not being initialized for some reason. An edit to the linker script fixed it. Will open a pull request including an Icicle board cxxtest defconfig if you're interested.

Could you kindly share your pull request on fixing this issue? I am facing a similar exception issue on ESP32s3. I hope your solution on icicle board could be a good reference of fixing on ESP board.

Thanks a lot.

sastel commented 3 months ago

@royfengsss this patch summarizes it.

diff --git a/arch/risc-v/src/common/Toolchain.defs b/arch/risc-v/src/common/Toolchain.defs
index 3d0e5296a9..e37598e2c4 100644
--- a/arch/risc-v/src/common/Toolchain.defs
+++ b/arch/risc-v/src/common/Toolchain.defs
@@ -285,6 +285,10 @@ endif
 # Add the builtin library

 EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-libgcc-file-name))
+ifeq ($(CONFIG_CXX_EXCEPTION),y)
+  EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtbegin.o))
+  EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtend.o))
+endif

 ifeq ($(CONFIG_LIBM_TOOLCHAIN),y)
   EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=libm.a))
diff --git a/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig b/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig
new file mode 100644
index 0000000000..f536046464
--- /dev/null
+++ b/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig
@@ -0,0 +1,90 @@
+#
+# This file is autogenerated: PLEASE DO NOT EDIT IT.
+#
+# You can use "make menuconfig" to make any modifications to the installed .config file.
+# You can then do "make savedefconfig" to generate a new defconfig file that includes your
+# modifications.
+#
+# CONFIG_DISABLE_OS_API is not set
+# CONFIG_NSH_DISABLE_LOSMART is not set
+# CONFIG_NSH_DISABLE_MB is not set
+# CONFIG_NSH_DISABLE_MH is not set
+# CONFIG_NSH_DISABLE_MW is not set
+CONFIG_ARCH="risc-v"
+CONFIG_ARCH_BOARD="icicle"
+CONFIG_ARCH_BOARD_COMMON=y
+CONFIG_ARCH_BOARD_ICICLE_MPFS=y
+CONFIG_ARCH_CHIP="mpfs"
+CONFIG_ARCH_CHIP_MPFS250T_FCVG484=y
+CONFIG_ARCH_CHIP_MPFS=y
+CONFIG_ARCH_INTERRUPTSTACK=2048
+CONFIG_ARCH_RISCV=y
+CONFIG_ARCH_STACKDUMP=y
+CONFIG_BOARD_LOOPSPERMSEC=54000
+CONFIG_BUILTIN=y
+CONFIG_CXX_EXCEPTION=y
+CONFIG_CXX_RTTI=y
+CONFIG_DEBUG_ASSERTIONS=y
+CONFIG_DEBUG_FEATURES=y
+CONFIG_DEBUG_FULLOPT=y
+CONFIG_DEBUG_SYMBOLS=y
+CONFIG_DEFAULT_TASK_STACKSIZE=16384
+CONFIG_DEV_ZERO=y
+CONFIG_EXPERIMENTAL=y
+CONFIG_FS_PROCFS=y
+CONFIG_FS_ROMFS=y
+CONFIG_FS_SHMFS=y
+CONFIG_HAVE_CXX=y
+CONFIG_IDLETHREAD_STACKSIZE=2048
+CONFIG_INIT_ENTRYPOINT="nsh_main"
+CONFIG_INIT_STACKSIZE=3072
+CONFIG_INTELHEX_BINARY=y
+CONFIG_LIBCXX=y
+CONFIG_LIBCXXABI=y
+CONFIG_LIBC_HOSTNAME="icicle"
+CONFIG_LIBC_PERROR_STDOUT=y
+CONFIG_LIBC_STRERROR=y
+CONFIG_LIBM=y
+CONFIG_MEMSET_64BIT=y
+CONFIG_MEMSET_OPTSPEED=y
+CONFIG_MPFS_ENABLE_DPFPU=y
+CONFIG_MPFS_UART1=y
+CONFIG_NSH_ARCHINIT=y
+CONFIG_NSH_BUILTIN_APPS=y
+CONFIG_NSH_DISABLE_IFUPDOWN=y
+CONFIG_NSH_DISABLE_MKDIR=y
+CONFIG_NSH_DISABLE_RM=y
+CONFIG_NSH_DISABLE_RMDIR=y
+CONFIG_NSH_DISABLE_UMOUNT=y
+CONFIG_NSH_LINELEN=160
+CONFIG_NSH_STRERROR=y
+CONFIG_POSIX_SPAWN_DEFAULT_STACKSIZE=2048
+CONFIG_PREALLOC_TIMERS=4
+CONFIG_PTHREAD_STACK_DEFAULT=2048
+CONFIG_RAM_SIZE=1048576
+CONFIG_RAM_START=0x80200000
+CONFIG_RAW_BINARY=y
+CONFIG_READLINE_CMD_HISTORY=y
+CONFIG_READLINE_TABCOMPLETION=y
+CONFIG_RR_INTERVAL=200
+CONFIG_SCHED_HPWORK=y
+CONFIG_SCHED_HPWORKSTACKSIZE=2048
+CONFIG_SCHED_LPWORK=y
+CONFIG_SCHED_LPWORKSTACKSIZE=2048
+CONFIG_SCHED_WAITPID=y
+CONFIG_SERIAL_NPOLLWAITERS=2
+CONFIG_STACK_COLORATION=y
+CONFIG_START_MONTH=4
+CONFIG_START_YEAR=2021
+CONFIG_SYSTEM_CLE_CMD_HISTORY=y
+CONFIG_SYSTEM_NSH=y
+CONFIG_SYSTEM_NSH_STACKSIZE=2048
+CONFIG_SYSTEM_TIME64=y
+CONFIG_TASK_NAME_SIZE=20
+CONFIG_TESTING_CXXTEST=y
+CONFIG_TESTING_GETPRIME=y
+CONFIG_TESTING_GETPRIME_STACKSIZE=2048
+CONFIG_TESTING_OSTEST=y
+CONFIG_TESTING_OSTEST_FPUSTACKSIZE=2048
+CONFIG_TLS_NELEM=4
+CONFIG_UART1_SERIAL_CONSOLE=y
diff --git a/boards/risc-v/mpfs/icicle/scripts/ld.script b/boards/risc-v/mpfs/icicle/scripts/ld.script
index 85dc8503f8..503371e3a7 100644
--- a/boards/risc-v/mpfs/icicle/scripts/ld.script
+++ b/boards/risc-v/mpfs/icicle/scripts/ld.script
@@ -18,6 +18,8 @@
  *
  ****************************************************************************/

+#include <nuttx/config.h>
+
 MEMORY
 {
     progmem (rx) : ORIGIN = 0x80000000, LENGTH = 2M    /* w/ cache */
@@ -45,11 +47,26 @@ SECTIONS
         *(.glue_7)
         *(.glue_7t)
         *(.got)
-        *(.gcc_except_table)
         *(.gnu.linkonce.r.*)
         _etext = ABSOLUTE(.);
     } > progmem

+#if defined(CONFIG_HAVE_CXX)
+    .gcc_except_table : ALIGN(4) {
+        *(.gcc_except_table)
+        *(.gcc_except_table.*)
+    } > progmem
+
+#if defined(CONFIG_CXX_EXCEPTION)
+    .eh_frame_hdr : ALIGN(4) {
+        *(.eh_frame_hdr)
+    } > progmem
+    .eh_frame : ALIGN(4) {
+        KEEP(*(.eh_frame))
+    } > progmem
+#endif
+#endif
+
     .init_section : ALIGN(4) {
         _sinit = ABSOLUTE(.);
         KEEP(*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
royfengsss commented 3 months ago

@sastel Many thanks your patch. I will try based on your changes.