blackmagic-debug / blackmagic

In application debugger for ARM Cortex microcontrollers.
GNU General Public License v3.0

stepping on empty loop #1883

Open lzace817 opened 3 months ago

lzace817 commented 3 months ago

Whenever I step onto an empty loop, GDB hangs and Ctrl-C stops working until I reset the board.

Steps to reproduce

I have these logs from a Rust test, but the effective setup is much the same.

$ ./blackmagic -v 1
Black Magic Debug App (for BMP only) v1.10.2
Using:
 _v1.10.2 BlackPill-F401CC 306D35993235
Listening on TCP port: 2000
Got connection
Speed set to 7.000MHz for SWD
Switching out of dormant state into SWD
DP DPIDR 0x1ba01477 (v1 rev0) designer 0x43b partno 0xba
AP   0: IDR=14770011 CFG=00000000 BASE=e00ff003 CSW=a3000040 (AHB3-AP var1 rev1)
Halt via DHCSR(01030003): success after 14ms
ROM: Table BASE=0xe00ff000 SYSMEM=0x00000001, Manufacturer 020 Partno 410
0 0xe000e000: Generic IP component - Cortex-M3 SCS (System Control Space) (PIDR = 0x00000004001bb000 DEVTYPE = 0x00 ARCHID = 0x0000)
-> cortexm_probe
CPUID 0x411fc231 (M3 var 1 rev 1)
 1 0xe0001000: 0x00000000 <- does not match preamble (0xb105000d)
2 0xe0002000: Generic IP component - Cortex-M3 FBP (Flash Patch and Breakpoint) (PIDR = 0x00000004000bb003 DEVTYPE = 0x00 ARCHID = 0x0000)
 3 0xe0000000: 0x00000000 <- does not match preamble (0xb105000d)
 4 0xe0040000: 0x00000000 <- does not match preamble (0xb105000d)
5 Entry 0xfff42002 -> Not present
ROM: Table END
syscall     SYS_OPEN (800172c 4 3 800172c)
syscall    SYS_WRITE (3 8001670 e 3)
Hello, world!
$ arm-none-eabi-gdb -q hello.elf
...
(gdb) n
19      loop {}
(gdb) n
^C^C^C^C^C^C^C^C
cortex_m_rt::Reset () at src/lib.rs:497
497 pub unsafe extern "C" fn Reset() -> ! {
(gdb) 
518     __pre_init();
(gdb) 

Expected behavior

The second next should block while the loop runs, but Ctrl-C should still be able to interrupt it.

Breakpoint 1, main () at dbg-loop.c:4
4   {
(gdb) n
5       printf("Hello, World!\n");
(gdb) n
Hello, World!
6       while(1){
(gdb) n
^C
Program received signal SIGINT, Interrupt.
main () at dbg-loop.c:6
6       while(1){
(gdb) 

Notes

dragonmux commented 3 months ago

Note that you're using v1.10.2, which has some known issues with semihosting. Please could you give the latest main a try and let us know if things are still unhappy? You can download a nightly build to hit the ground running, as testing this does not require a probe firmware change.

lzace817 commented 3 months ago

I wasn't able to compile both working firmware and BMDA on my setup.

dragonmux commented 3 months ago

As said, you do not need to change firmware to test this. You only need BMDA. Working firmware, however, can be built with ARM GCC 12.2.Rel1, and a build of the firmware for every supported probe is available from the nightly build downloads too.

Edit: Please also let us know which part you're trying to debug, as a part-specific issue may be in play, so knowing what you're trying to poke could be quite important.

lzace817 commented 3 months ago

udev is required? Can't find the probe, trying with -d.

lzace817 commented 3 months ago

Tests using the nightly build of BMDA; I didn't change the firmware.

target: stm32f103c8b6 blue pill (probably fake)

$ ./blackmagic-bmda -d /dev/ttyACM0 -v 1
Black Magic Debug App 763e057
 for Black Magic Probe, ST-Link v2 and v3, CMSIS-DAP, J-Link and FTDI (MPSSE)
Setting V6ONLY to off for dual stack listening.
Listening on TCP port: 2000
Got connection
Speed set to 7.000MHz for SWD
Switching out of dormant state into SWD
DP DPIDR 0x1ba01477 (v1 rev1) designer 0x43b partno 0xba
AP   0: IDR=14770011 CFG=00000000 BASE=e00ff000 CSW=e3000040 (AHB3-AP var1 rev1)
Halt via DHCSR(01030003): success after 13ms
ROM: Table BASE=0xe00ff000 SYSMEM=1, Manufacturer 020 Partno 410 (PIDR = 0x00000000000a0410)
0 0xe000e000: Generic IP component - Cortex-M3 SCS (System Control Space) (PIDR = 0x00000004001bb000 DEVTYPE = 0x00 ARCHID = 0x0000)
-> cortexm_probe
CPUID 0x411fc231 (M3 var 1 rev 1)
 1 0xe0001000: 0x00000000 <- does not match preamble (0xb105000d)
2 0xe0002000: Generic IP component - Cortex-M3 FBP (Flash Patch and Breakpoint) (PIDR = 0x00000004000bb003 DEVTYPE = 0x00 ARCHID = 0x0000)
 3 0xe0000000: 0x00000000 <- does not match preamble (0xb105000d)
 4 0xe0040000: 0x00000000 <- does not match preamble (0xb105000d)
5 Entry 0xfff42002 -> Not present
ROM: Table END
syscall     SYS_OPEN (800172c 4 3 800172c)
syscall    SYS_WRITE (2 8001670 e 2)
$ arm-none-eabi-gdb -q hello.elf
...
13      hprintln!("Hello, world!").unwrap();
(gdb) n
Hello, world!
19      loop {}
(gdb) n
^C^C^C^C^C^C^C^C^C^C^C

^C^C^C^C^C^C^C^C       # PRESSED RESET BUTTON HERE
0xfffece3c in ?? ()
(gdb) 
Cannot find bounds of current function
(gdb) bt
#0  0xfffece3c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

dragonmux commented 3 months ago

> udev is required? Can't find the probe, trying with -d.

Yes, it always has been for full BMDA builds (i.e., non-BMP-only), which are also the only supported configuration for Meson-based builds. Full BMDA uses libusb to find probes and their information, which is much more reliable than the method BMP-only builds use. That is why it requires the udev rules, but those rules also set up the /dev nodes properly and give you /dev/ttyBmpGdb and /dev/ttyBmpTarg, which are stable names; /dev/ttyACM* names are not, as they depend on enumeration order and on other devices on the system.
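To illustrate why the udev-provided names are preferable, here is a minimal C sketch that prefers /dev/ttyBmpGdb and only falls back to /dev/ttyACM0 when the stable node is missing. The helper name pick_gdb_port is hypothetical and not part of BMDA; it assumes a POSIX system and only checks that a path exists, not that it actually belongs to a Black Magic Probe.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not a BMDA API): return the first candidate device
 * node that exists, or NULL if none do. Listing the udev-provided stable
 * name first means it is preferred over the enumeration-order-dependent
 * /dev/ttyACM* names. */
static const char *pick_gdb_port(const char *const *candidates, size_t count)
{
	assert(candidates != NULL || count == 0);
	for (size_t i = 0; i < count; ++i) {
		/* F_OK only checks that the path exists, not that it can be opened. */
		if (access(candidates[i], F_OK) == 0)
			return candidates[i];
	}
	return NULL;
}
```

For example, passing {"/dev/ttyBmpGdb", "/dev/ttyACM0"} returns the stable node whenever the udev rules are installed, and the result could then be given to BMDA's -d option as in the logs above.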

> target: stm32f103c8b6 blue pill (probably fake)

If you can share the result of mon swd/mon jtag (i.e., a SWD/JTAG scan), BMD will tell you whether the device it found appears to be genuine or a clone.

dragonmux commented 3 months ago

We've just been reviewing the initial post some more, and something caught our eye that makes us question the accuracy of the C code translation:

$ arm-none-eabi-gdb -q hello.elf
...
(gdb) n
19      loop {}
(gdb) n
^C^C^C^C^C^C^C^C
cortex_m_rt::Reset () at src/lib.rs:497
497 pub unsafe extern "C" fn Reset() -> ! {
(gdb) 
518     __pre_init();
(gdb) 

This appears to be Rust code being compiled and run. It is possible that loop {} does not compile to the equivalent of while (true) {} but rather to a WFI/WFE instruction. Please test our WDT handling branch (#1882), as it fixes the handling of these instructions on the STM32F1 family, among other parts. Alternatively, please provide a disassembly of that sequence showing what the compiler is lowering the code to.
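As a rough way to check the WFI/WFE theory against a disassembly, the sketch below classifies 16-bit Thumb halfwords and flags WFI (0xbf30) or WFE (0xbf20). It is a minimal illustration with hypothetical names, assuming only narrow (16-bit) encodings are scanned; 32-bit Thumb-2 instructions, including the wide wfi.w/wfe.w forms, would need a real decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough classes for a few narrow Thumb encodings relevant here. */
typedef enum {
	THUMB_OTHER,
	THUMB_NOP,  /* 0xbf00 */
	THUMB_WFE,  /* 0xbf20 */
	THUMB_WFI,  /* 0xbf30 */
	THUMB_B_T2, /* unconditional branch, encoding T2: 11100 imm11 */
} thumb_class_t;

static thumb_class_t classify_thumb16(uint16_t insn)
{
	switch (insn) {
	case 0xbf00U:
		return THUMB_NOP;
	case 0xbf20U:
		return THUMB_WFE;
	case 0xbf30U:
		return THUMB_WFI;
	default:
		break;
	}
	/* Top five bits 11100 select B (T2); 11101..11111 start 32-bit encodings. */
	if ((insn & 0xf800U) == 0xe000U)
		return THUMB_B_T2;
	return THUMB_OTHER;
}

/* Returns nonzero if any halfword in the sequence is a narrow WFI or WFE. */
static int contains_wait_for(const uint16_t *code, size_t count)
{
	assert(code != NULL || count == 0);
	for (size_t i = 0; i < count; ++i) {
		const thumb_class_t c = classify_thumb16(code[i]);
		if (c == THUMB_WFI || c == THUMB_WFE)
			return 1;
	}
	return 0;
}
```

Fed the halfwords of main from the disassembly posted later in this thread (b480 b083 af00 2300 607b bf00 e7fd), this finds no WFI or WFE: the loop is a nop followed by an unconditional branch.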

It would also be useful to investigate why you land back in your reset handler, which suggests the CPU might be taking an exception that results in a reboot. This can be explored with backtraces (bt) and core register dumps (info reg). However, explore this only after checking that you're not getting punked by a WFI/WFE instruction making BMD lose the target.

We suggest running BMDA with -v 5 instead of -v 1 for better logging, as this enables target-level debug output as well as info-level output.

lzace817 commented 3 months ago

Rust was not relevant, and neither was semihosting. The board reset because, after the lockup, I pressed the reset button on the PCB; there is a comment in the GDB log marking that moment.

sample code

void main(void)
{
    volatile int x;
    x = 0; // set breakpoint here
    while(1) {}
}

server

blackmagic-bmda -d /dev/ttyACM0 -v 5
Black Magic Debug App 763e057
 for Black Magic Probe, ST-Link v2 and v3, CMSIS-DAP, J-Link and FTDI (MPSSE)
Setting V6ONLY to off for dual stack listening.
Listening on TCP port: 2000
Got connection
Speed set to 7.000MHz for SWD
Switching out of dormant state into SWD
DP DPIDR 0x1ba01477 (v1 rev1) designer 0x43b partno 0xba
AP   0: IDR=14770011 CFG=00000000 BASE=e00ff000 CSW=e3000040 (AHB3-AP var1 rev1)
Halt via DHCSR(01030003): success after 12ms
ROM: Table BASE=0xe00ff000 SYSMEM=1, Manufacturer 020 Partno 410 (PIDR = 0x00000000000a0410)
0 0xe000e000: Generic IP component - Cortex-M3 SCS (System Control Space) (PIDR = 0x00000004001bb000 DEVTYPE = 0x00 ARCHID = 0x0000)
-> cortexm_probe
CPUID 0x411fc231 (M3 var 1 rev 1)
Calling stm32f1_probe
 1 0xe0001000: 0x00000000 <- does not match preamble (0xb105000d)
2 0xe0002000: Generic IP component - Cortex-M3 FBP (Flash Patch and Breakpoint) (PIDR = 0x00000004000bb003 DEVTYPE = 0x00 ARCHID = 0x0000)
 3 0xe0000000: 0x00000000 <- does not match preamble (0xb105000d)
 4 0xe0040000: 0x00000000 <- does not match preamble (0xb105000d)
5 Entry 0xfff42002 -> Not present
ROM: Table END
stm32f1_flash_erase: at 08000000
stm32f1_flash_write: at 08000000 for 1024 bytes
Speed set to 7.000MHz for SWD
Switching out of dormant state into SWD
DP DPIDR 0x1ba01477 (v1 rev1) designer 0x43b partno 0xba
AP   0: IDR=14770011 CFG=00000000 BASE=e00ff000 CSW=e3000040 (AHB3-AP var1 rev1)
Halt via DHCSR(00030003): success after 18ms
ROM: Table BASE=0xe00ff000 SYSMEM=1, Manufacturer 020 Partno 410 (PIDR = 0x00000000000a0410)
0 0xe000e000: Generic IP component - Cortex-M3 SCS (System Control Space) (PIDR = 0x00000004001bb000 DEVTYPE = 0x00 ARCHID = 0x0000)
-> cortexm_probe
CPUID 0x411fc231 (M3 var 1 rev 1)
Calling stm32f1_probe
 1 0xe0001000: 0x00000000 <- does not match preamble (0xb105000d)
2 0xe0002000: Generic IP component - Cortex-M3 FBP (Flash Patch and Breakpoint) (PIDR = 0x00000004000bb003 DEVTYPE = 0x00 ARCHID = 0x0000)
 3 0xe0000000: 0x00000000 <- does not match preamble (0xb105000d)
 4 0xe0040000: 0x00000000 <- does not match preamble (0xb105000d)
5 Entry 0xfff42002 -> Not present
ROM: Table END

GDB

$ ./empty-loop.sh 
Reading symbols from empty-loop.elf...
Remote debugging using :2000
Target voltage: 
Available Targets:
No. Att Driver
 1      STM32F1 medium density M3
Attaching to program: xxxxxxxxxxxxxxxxxxxxxxxxx/empty-loop.elf, Remote target
0x0800015c in main () at empty-loop.c:5
warning: Source file is more recent than executable.
5       while(1) {}
Loading section .text, size 0x1f8 lma 0x8000000
Start address 0x08000164, load size 504
Transfer rate: 3 KB/sec, 504 bytes/write.
Breakpoint 1 at 0x8000156: file empty-loop.c, line 4.
Note: automatically using hardware breakpoints for read-only addresses.
(gdb) c
Continuing.

Breakpoint 1, main () at empty-loop.c:4
4       x = 0;
(gdb) n
5       while(1) {}
(gdb) n
^C^C^C^C^C^C^C
^C^C^C^C^C^C                      # reset button was pressed here
0x00000000 in ?? ()
(gdb) 
Cannot find bounds of current function
(gdb) mon swdp
Target voltage: 
You are now detached from the previous target.
Available Targets:
No. Att Driver
 1      STM32F1 medium density M3
(gdb) 

disassembly

$ arm-none-eabi-objdump -d empty-loop.elf 

empty-loop.elf:     file format elf32-littlearm

Disassembly of section .text:

08000000 <vector_table>:
 8000000:   00 50 00 20 65 01 00 08 61 01 00 08 5f 01 00 08     .P. e...a..._...
 8000010:   5f 01 00 08 5f 01 00 08 5f 01 00 08 00 00 00 00     _..._..._.......
    ...
 800002c:   61 01 00 08 61 01 00 08 00 00 00 00 61 01 00 08     a...a.......a...
 800003c:   61 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     a..._..._..._...
 800004c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800005c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800006c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800007c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800008c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800009c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 80000ac:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 80000bc:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 80000cc:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 80000dc:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 80000ec:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 80000fc:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800010c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800011c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800012c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800013c:   5f 01 00 08 5f 01 00 08 5f 01 00 08 5f 01 00 08     _..._..._..._...
 800014c:   5f 01 00 08                                         _...

08000150 <main>:
 8000150:   b480        push    {r7}
 8000152:   b083        sub sp, #12
 8000154:   af00        add r7, sp, #0
 8000156:   2300        movs    r3, #0
 8000158:   607b        str r3, [r7, #4]
 800015a:   bf00        nop
 800015c:   e7fd        b.n 800015a <main+0xa>

0800015e <blocking_handler>:
 800015e:   e7fe        b.n 800015e <blocking_handler>

08000160 <null_handler>:
 8000160:   4770        bx  lr
    ...

08000164 <reset_handler>:
 8000164:   b538        push    {r3, r4, r5, lr}
 8000166:   4a1a        ldr r2, [pc, #104]  @ (80001d0 <reset_handler+0x6c>)
 8000168:   4b1a        ldr r3, [pc, #104]  @ (80001d4 <reset_handler+0x70>)
 800016a:   491b        ldr r1, [pc, #108]  @ (80001d8 <reset_handler+0x74>)
 800016c:   428b        cmp r3, r1
 800016e:   d31a        bcc.n   80001a6 <reset_handler+0x42>
 8000170:   2100        movs    r1, #0
 8000172:   4a1a        ldr r2, [pc, #104]  @ (80001dc <reset_handler+0x78>)
 8000174:   4293        cmp r3, r2
 8000176:   d31b        bcc.n   80001b0 <reset_handler+0x4c>
 8000178:   f04f 22e0   mov.w   r2, #3758153728 @ 0xe000e000
 800017c:   f8d2 3d14   ldr.w   r3, [r2, #3348] @ 0xd14
 8000180:   4c17        ldr r4, [pc, #92]   @ (80001e0 <reset_handler+0x7c>)
 8000182:   f443 7300   orr.w   r3, r3, #512    @ 0x200
 8000186:   4d17        ldr r5, [pc, #92]   @ (80001e4 <reset_handler+0x80>)
 8000188:   f8c2 3d14   str.w   r3, [r2, #3348] @ 0xd14
 800018c:   42ac        cmp r4, r5
 800018e:   d312        bcc.n   80001b6 <reset_handler+0x52>
 8000190:   4c15        ldr r4, [pc, #84]   @ (80001e8 <reset_handler+0x84>)
 8000192:   4d16        ldr r5, [pc, #88]   @ (80001ec <reset_handler+0x88>)
 8000194:   42ac        cmp r4, r5
 8000196:   d312        bcc.n   80001be <reset_handler+0x5a>
 8000198:   f7ff ffda   bl  8000150 <main>
 800019c:   4c14        ldr r4, [pc, #80]   @ (80001f0 <reset_handler+0x8c>)
 800019e:   4d15        ldr r5, [pc, #84]   @ (80001f4 <reset_handler+0x90>)
 80001a0:   42ac        cmp r4, r5
 80001a2:   d310        bcc.n   80001c6 <reset_handler+0x62>
 80001a4:   bd38        pop {r3, r4, r5, pc}
 80001a6:   f852 0b04   ldr.w   r0, [r2], #4
 80001aa:   f843 0b04   str.w   r0, [r3], #4
 80001ae:   e7dd        b.n 800016c <reset_handler+0x8>
 80001b0:   f843 1b04   str.w   r1, [r3], #4
 80001b4:   e7de        b.n 8000174 <reset_handler+0x10>
 80001b6:   f854 3b04   ldr.w   r3, [r4], #4
 80001ba:   4798        blx r3
 80001bc:   e7e6        b.n 800018c <reset_handler+0x28>
 80001be:   f854 3b04   ldr.w   r3, [r4], #4
 80001c2:   4798        blx r3
 80001c4:   e7e6        b.n 8000194 <reset_handler+0x30>
 80001c6:   f854 3b04   ldr.w   r3, [r4], #4
 80001ca:   4798        blx r3
 80001cc:   e7e8        b.n 80001a0 <reset_handler+0x3c>
 80001ce:   bf00        nop
 80001d0:   080001f8    .word   0x080001f8
 80001d4:   20000000    .word   0x20000000
 80001d8:   20000000    .word   0x20000000
 80001dc:   20000000    .word   0x20000000
 80001e0:   080001f8    .word   0x080001f8
 80001e4:   080001f8    .word   0x080001f8
 80001e8:   080001f8    .word   0x080001f8
 80001ec:   080001f8    .word   0x080001f8
 80001f0:   080001f8    .word   0x080001f8
 80001f4:   080001f8    .word   0x080001f8
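Decoding the two relevant branches makes the loop's shape explicit. Following the Armv7-M encoding of B (T2), the e7fd at 0x800015c jumps back two bytes to the nop at 0x800015a, so main spins in a two-instruction nop/branch loop, while the e7fe in blocking_handler is a true branch-to-self. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Branch target of a 16-bit Thumb unconditional branch (B, encoding T2,
 * bit pattern 11100 imm11). The byte offset is imm11:'0' sign-extended and
 * added to the PC, which reads as the instruction address + 4. */
static uint32_t thumb_b_t2_target(uint32_t addr, uint16_t insn)
{
	assert((insn & 0xf800U) == 0xe000U);
	int32_t imm11 = (int32_t)(insn & 0x7ffU);
	if (imm11 & 0x400) /* imm11 is negative: sign-extend it */
		imm11 -= 0x800;
	/* Unsigned wrap-around makes a negative offset subtract correctly. */
	return addr + 4U + (uint32_t)(imm11 * 2);
}

/* True when the branch at addr jumps to itself (e.g. 0xe7fe). */
static bool is_branch_to_self(uint32_t addr, uint16_t insn)
{
	return thumb_b_t2_target(addr, insn) == addr;
}
```

The same decoding reproduces the other targets objdump prints, e.g. the e7dd at 0x80001ae lands on 0x800016c.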

dragonmux commented 3 months ago

Thank you for the extra information. Now that we can be sure it's definitely not a WFI/WFE issue or something similar, we'll do our best to replicate this on a spare STM32F103 here and get back to you.

We can also tell from the logs that, unless it's a clone that isn't being detected properly, the STM32 you've got there is genuine. If it were a detected clone, the BMD scan output would say "(clone)" or name the specific kind of clone device, such as "GD32F1".

mean00 commented 3 months ago

My 0.02 euros: this is a gotcha caused by how GDB interprets the 'n' (next) command, i.e. not a BMP issue. I've run into similar issues with other debuggers. When you do next, GDB steps, steps and steps until it reaches the "next line", whatever and wherever that is. In the case of an empty loop, that next line is never reached, because you keep executing the same instruction at the same address, so GDB keeps on stepping forever (the underlying asm code is something along the lines of "b *").

There are 2 ways to get out of this:

There is a ticket about this at https://bugs.launchpad.net/gcc-arm-embedded/+bug/1401565, using a J-Link debugger: same cause, same consequence.
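The mechanism described above can be modelled in a few lines: "next" keeps single-stepping until the PC leaves the address range of the current source line. This is a deliberately simplified sketch, not GDB's actual stepping logic; the toy CPU hard-codes main's nop/b.n loop from the disassembly above, and the line-5 range of 0x800015a to 0x800015e is an assumption inferred from that disassembly and the GDB session, not read from the ELF's line table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address range [start, end) that the line table gives one source line. */
typedef struct {
	uint32_t start;
	uint32_t end;
} line_range_t;

static bool pc_in_line(uint32_t pc, line_range_t line)
{
	return pc >= line.start && pc < line.end;
}

/* Toy CPU model of main's empty loop: the nop at 0x800015a falls through
 * to the b.n at 0x800015c, which jumps back to the nop. */
static uint32_t step_empty_loop(uint32_t pc)
{
	return pc == 0x0800015cU ? 0x0800015aU : pc + 2U;
}

/* Simplified "next": single-step until the PC leaves the current line's
 * range. max_steps stands in for the user giving up; in the behaviour
 * described above there is no such limit, so the session appears to hang. */
static bool next_reaches_new_line(uint32_t pc, line_range_t line, unsigned max_steps)
{
	assert(line.start < line.end);
	for (unsigned i = 0; i < max_steps && pc_in_line(pc, line); ++i)
		pc = step_empty_loop(pc);
	return !pc_in_line(pc, line);
}
```

With the whole loop inside one line's range, the model never leaves the line within any step budget, matching the hang. Only if the line covered just the branch, so that jumping back to the nop left the range, would stepping stop.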

dragonmux commented 3 months ago

Thank you for the reminder to get back to this - and sorry it's been a while since the last reply. We did some more digging and mean00 is indeed correct that this is a compiler and GDB limitation.

Specifically, because the compiler is generating an empty loop, it cannot generate a debug line entry for the loop body for the debugger to break on. GDB is then unaware of any instruction within the loop to set a breakpoint on, and so calculates the next instruction for the breakpoint as the instruction after the loop.

On top of that, this triggers a GDB bug: because there is no line information within the loop, GDB gets stuck waiting for the target to reach any instruction other than the branch-to-self (which it considers the start of the loop) before it treats the target as halted again, resulting in the behaviour you see. Unfortunately, in this instance BMD is doing exactly what GDB instructs it to do; the bug is in GDB and is triggered by a compiler limitation.