[BUG] Running lm3s6965-ek:qemu-protected with gdb-multiarch is crashing

acassis commented 1 month ago

Description / Steps to reproduce the issue

$ ./tools/configure.sh lm3s6965-ek:qemu-protected
$ make -j
$ qemu-system-arm -net nic,model=stellaris -net user,hostfwd=tcp:127.0.0.1:10023-10.0.2.15:23,hostfwd=tcp:127.0.0.1:10021-10.0.2.15:21 -M lm3s6965evb -kernel nuttx -nographic -s -S
Timer with period zero, disabling

Open a new terminal and run:

$ gdb-multiarch -i=mi nuttx
=thread-group-added,id="i1"
~"GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1\n"
~"Copyright (C) 2022 Free Software Foundation, Inc.\n"
~"License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law."
~"\nType \"show copying\" and \"show warranty\" for details.\n"
~"This GDB was configured as \"x86_64-linux-gnu\".\n"
~"Type \"show configuration\" for configuration details.\n"
~"For bug reporting instructions, please see:\n"
~"https://www.gnu.org/software/gdb/bugs/.\n"
~"Find the GDB manual and other documentation resources online at:\n http://www.gnu.org/software/gdb/documentation/."
~"\n\n"
~"For help, type \"help\".\n"
~"Type \"apropos word\" to search for commands related to \"word\"...\n"
~"Reading symbols from nuttx...\n"
(gdb) target extended-remote:1234
&"target extended-remote:1234\n"
~"Remote debugging using :1234\n"
=thread-group-started,id="i1",pid="1"
=thread-created,id="1",group-id="i1"
~"start () at chip/common/lmxx_tm4c_start.c:112\n"
~"112\t tiva_clock_configure();\n"
*stopped,frame={addr="0x0000011c",func="start",args=[],file="chip/common/lmxx_tm4c_start.c",fullname="/home/alan/nuttxspace/nuttx/arch/arm/src/tiva/common/lmxx_tm4c_start.c",line="112",arch="armv7"},thread-id="1",stopped-threads="all"
^done
(gdb) c
&"c\n"
~"Continuing.\n"
^running
*running,thread-id="all"
(gdb)
=thread-exited,id="1",group-id="i1"
=thread-group-exited,id="i1"
&"Remote connection closed\n"
(gdb)

At this point QEMU crashes with this error message:

ABCDEF
qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)

R00=0000011d R01=00000003 R02=0001c6c4 R03=002045f8
R04=0000011d R05=000003e8 R06=00000000 R07=15f9a8f0
R08=200016ec R09=200016fc R10=00000000 R11=00000001
R12=00000000 R13=20003a30 R14=00004cbd R15=00004cf0
XPSR=41000003 -Z-- T handler
FPSCR: 00000000
Aborted (core dumped)

On which OS does this issue occur?

[Linux]

What is the version of your OS?

Ubuntu 22.04.4 LTS

NuttX Version

master, latest commit: 0be6dfb552de

Issue Architecture

[arm]

Issue Area

[Debugging]

Verification

[X] I have verified before submitting the report.

acassis commented 1 month ago

@masayuki2009 @xiaoxiang781216 @yf13 please take a look

acassis commented 1 month ago

This is the first time I have tried to use it with gdb-multiarch. In the past I tested using arm-none-eabi-gdb; all steps are here: https://acassis.wordpress.com/2021/03/04/using-qemu-with-gdb-to-debug-nuttx/

yf13 commented 1 month ago

@acassis this is my first taste of qemu-system-arm with a NuttX arm port. Here I followed your QEMU launching method and got that error directly:

$ qemu-system-arm -M lm3s6965evb -net user,hostfwd=tcp:127.0.0.1:10023-10.0.2.15:23 -net nic,model=stellaris -nographic -kernel nuttx
Timer with period zero, disabling
ABCDEF
qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)

I am wondering how nuttx_user.elf gets loaded into QEMU. A protected build has two ELFs, and for riscv we use "-device loader,file=xxx" to load the other ELF. But when I added that option I got:

$ qemu-system-arm -M lm3s6965evb -net user,hostfwd=tcp:127.0.0.1:10023-10.0.2.15:23 -net nic,model=stellaris -nographic -kernel nuttx -device loader,file=nuttx_user.elf
qemu-system-arm: Some ROM regions are overlapping
...
The following two regions overlap (in the cpu-memory-0 address space):
  nuttx ELF program header segment 2 (addresses 0x000000000001e048 - 0x0000000000021984)
  nuttx_user.elf ELF program header segment 1 (addresses 0x0000000000020000 - 0x00000000000400e8)

Maybe QEMU arm needs a different way to load multiple programs?
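
The overlap reported by QEMU can be checked with a little arithmetic. Below is a minimal Python sketch that uses only the two address ranges printed in the error message. The helper name `overlap` is hypothetical, and treating both ranges as inclusive is an assumption about how QEMU prints them.

```python
def overlap(a, b):
    """Return the inclusive (start, end) overlap of two inclusive ranges, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


# Address ranges copied from the QEMU error message.
kernel_seg = (0x1E048, 0x21984)  # nuttx ELF program header segment 2
user_seg = (0x20000, 0x400E8)    # nuttx_user.elf ELF program header segment 1

shared = overlap(kernel_seg, user_seg)
if shared is not None:
    print(f"overlap: {shared[0]:#x} - {shared[1]:#x}")
```

With these numbers the kernel segment runs past 0x20000, which is exactly where the user ELF starts, so the two images claim the same addresses.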

yf13 commented 1 month ago

@acassis, here I have some updates:

Use a memory.ld as in pull/12876, as the original fix overlooked the uflash origin adjustment.
Use a combined binary file with QEMU arm; there might be better approaches. The combined binary = padded(nuttx.bin) + nuttx_user.bin, where nuttx.bin is padded to 124K as per memory.ld.
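
One way to build such a combined image is sketched below in Python. This is not necessarily how the file in this comment was produced: the function names `combine` and `write_combined` are hypothetical, and the 0xFF fill byte is an assumption (it mimics erased flash). The thread only specifies the 124K kernel size.

```python
def combine(kernel: bytes, user: bytes,
            kernel_size: int = 124 * 1024, fill: int = 0xFF) -> bytes:
    """Pad the kernel image to kernel_size bytes, then append the user image."""
    if len(kernel) > kernel_size:
        raise ValueError(
            f"kernel image is {len(kernel)} bytes, larger than {kernel_size}")
    # Fill byte is an assumption; the thread does not say what padding was used.
    padding = bytes([fill]) * (kernel_size - len(kernel))
    return kernel + padding + user


def write_combined(kernel_path: str, user_path: str, out_path: str) -> None:
    """Read both raw binaries and write the combined image to out_path."""
    with open(kernel_path, "rb") as k, open(user_path, "rb") as u:
        image = combine(k.read(), u.read())
    with open(out_path, "wb") as out:
        out.write(image)
```

Calling `write_combined("nuttx.bin", "nuttx_user.bin", "combined")` would produce a file in which the user image starts at offset 124K (0x1f000), assuming that matches the uflash origin in memory.ld.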

Then we can start the target like:

$ qemu-system-arm -M lm3s6965evb -nographic -no-reboot -device loader,file=combined -s -S
Timer with period zero, disabling   # below is NuttX output
ABCD

From another terminal we can start gdb-multiarch like this:

$ gdb-multiarch -ex "target remote :1234" -ex "add-symbol-file nuttx" -ex "add-symbol-file nuttx_user.elf"
...
(gdb) conti
Continuing.
^C
Program received signal SIGINT, Interrupt.
memset (s=0x68044798, c=c@entry=0, n=<optimized out>) at string/lib_memset.c:171
171   while (n-- > 0) *p++ = c;
(gdb) bt
#0  memset (s=0x68044798, c=c@entry=0, n=<optimized out>) at string/lib_memset.c:171
#1  0x000002b2 in tiva_userspace () at chip/common/tiva_userspace.c:80
#2  0x0000017a in __start () at chip/common/lmxx_tm4c_start.c:158
#3  0xfffffffe in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

So from the GDB session, it seems that tiva_userspace() is having trouble?

yf13 commented 1 month ago

@acassis, it seems that the lm3s6965-ek implementation requires 128K kflash. Since patch/12872 is in place, we can restore the 128K kflash and run the qemu-protected config like below:

$ qemu-system-arm -M lm3s6965evb -nographic -device loader,file=nuttx.bin,addr=0 -device loader,file=nuttx_user.bin,addr=0x20000
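
The second load address follows from the kflash size: 128K is 0x20000. The Python sketch below derives the same command line from that layout. The helper name `qemu_args` is hypothetical, and it assumes the user flash region starts immediately after kflash, which matches the addresses used in the command above.

```python
KFLASH_ORIGIN = 0x0
KFLASH_SIZE = 128 * 1024  # 128K kflash, as stated in this comment


def qemu_args(kernel_bin, user_bin,
              kflash_origin=KFLASH_ORIGIN, kflash_size=KFLASH_SIZE):
    """Build the QEMU argument list with one generic loader per raw binary."""
    # Assumption: the user image is placed directly after the kernel flash region.
    user_addr = kflash_origin + kflash_size
    return [
        "qemu-system-arm", "-M", "lm3s6965evb", "-nographic",
        "-device", f"loader,file={kernel_bin},addr={kflash_origin:#x}",
        "-device", f"loader,file={user_bin},addr={user_addr:#x}",
    ]


args = qemu_args("nuttx.bin", "nuttx_user.bin")
print(" ".join(args))
```

The only difference from the hand-written command is that the kernel address is printed as 0x0 instead of 0.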
