blackmagic-debug / blackmagic

In application debugger for ARM Cortex microcontrollers.
GNU General Public License v3.0

BMP hangs if flashing "connect under reset" MCU #1333

Closed Azq2 closed 10 months ago

Azq2 commented 1 year ago

My MCU is in deep sleep, so flashing only works with monitor connect_rst enable. On fresh firmware (567b670) the BMP now hangs after one flash. :(

BMP hardware: bluepill

bmp.scr

monitor debug_bmp enable
monitor connect_rst enable
monitor swdp_scan
attach 1
load
kill

First flashing (success)

INSTALL /dev/serial/by-id/usb-Black_Magic_Debug_Black_Magic_Probe__ST-Link_v2__v1.9.0-rc0-13-g567b670c-dirty_C0D5DBD2-if00 stm32f0-pmic-with-rtc.elf (flash)
arm-none-eabi-gdb -nx --batch \
    -ex 'target extended-remote /dev/serial/by-id/usb-Black_Magic_Debug_Black_Magic_Probe__ST-Link_v2__v1.9.0-rc0-13-g567b670c-dirty_C0D5DBD2-if00' \
    -x bmp.scr \
    stm32f0-pmic-with-rtc.elf
Assert nRST during connect: enabled
Target voltage: 0.00V
Available Targets:
No. Att Driver
 1      STM32F03 M0
reset_handler () at lib/libopencm3/lib/stm32/f0/../../cm3/vector.c:67
67      for (src = &_data_loadaddr, dest = &_data;
Loading section .text, size 0x1eec lma 0x8000000
Loading section .init_array, size 0x4 lma 0x8001eec
Loading section .ARM.exidx, size 0x8 lma 0x8001ef0
Start address 0x080015a4, load size 7928
Transfer rate: 15 KB/sec, 720 bytes/write.
Kill the program being debugged? (y or n) [answered Y; input not from terminal]
[Inferior 1 (Remote target) killed]

Debug:

scan_multidrop: false
DP DPIDR 0x0bb11477 (v1 MINDP rev0) designer 0x43b partno 0xbb
RESET_SEQ failed
AP   0: IDR=04770021 CFG=00000000 BASE=e00ff003 CSW=03000040 (AHB-AP var2 rev0)
Halt via DHCSR: success 02000003 after 0ms
ROM: Table BASE=0xe00ff000 SYSMEM=0x00000001, Manufacturer  20 Partno 440
0 0xe000e000: Generic IP component - Cortex-M0 SCS (System Control Space) (PIDR = 0x00000004000bb008  DEVTYPE = 0x00 ARCHID = 0x0000)
-> cortexm_probe
CPUID 0x410cc200 (M0 var 0 rev 0)
1 0xe0001000: Generic IP component - Cortex-M0 DWT (Data Watchpoint and Trace) (PIDR = 0x00000004000bb00a  DEVTYPE = 0x00 ARCHID = 0x0000)
2 0xe0002000: Generic IP component - Cortex-M0 BPU (Breakpoint Unit) (PIDR = 0x00000004000bb00b  DEVTYPE = 0x00 ARCHID = 0x0000)
ROM: Table END

Second flashing (fails, BMP stops responding)

INSTALL /dev/serial/by-id/usb-Black_Magic_Debug_Black_Magic_Probe__ST-Link_v2__v1.9.0-rc0-13-g567b670c-dirty_C0D5DBD2-if00 stm32f0-pmic-with-rtc.elf (flash)
arm-none-eabi-gdb -nx --batch \
    -ex 'target extended-remote /dev/serial/by-id/usb-Black_Magic_Debug_Black_Magic_Probe__ST-Link_v2__v1.9.0-rc0-13-g567b670c-dirty_C0D5DBD2-if00' \
    -x bmp.scr \
    stm32f0-pmic-with-rtc.elf
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Malformed response to offset query, timeout
bmp.scr:1: Error in sourced command file:
"monitor" command not supported by this target.
make: *** [Makefile:60: install] Error 1

Debug:

SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting
SWD access resulted in wait, aborting

Azq2 commented 1 year ago

~Hi random guys who are reading this issue. Do you know of a firmware version where "connect under reset" flashing works and the BMP does not hang?~

Hmmm... on v1.8.2 it works fine.

dragonmux commented 1 year ago

Thanks for reporting this issue. What target is this (ie, is this really STM32F03?) and why are you explicitly forcing connect under reset? If it is an F03, the target should work without doing anything special such as connect-under-reset.

Additionally, please provide a non-truncated debug log for the erroring run. It would also be useful if you can compile and run BMDA via make PROBE_HOST=hosted HOSTED_BMP_ONLY=1 ENABLE_DEBUG=1 and use target extended :2000 instead of BMP as your GDB remote target having run the BMDA binary up as src/blackmagic -v 5 - this will provide a significantly more detailed log and help us understand what access is resulting in an indefinite SWD wait state.

Azq2 commented 1 year ago

Thanks for the reply!

  1. Yes, the target is an original STM32F030F4P6TR.
  2. I use "connect-under-reset" because my MCU is in deep sleep. My device uses standby/stop mode for power saving.

blackmagic.log

// I deleted my old comments because I forgot to switch from v1.8.2 to the main branch. I have now attached the right logs.

dragonmux commented 1 year ago

Many thanks for the extra logs. We'll dig deeper and dive into what's going on. We suspect that both this and #1327 have the same underlying root cause which is memory exhaustion and possibly some leaked allocations causing heap fragmentation.

The bluepill has very limited SRAM, and we suspect https://github.com/blackmagic-debug/blackmagic/pull/1196 pushed the heap usage and memory needs of the firmware past the breaking point for bluepill. You are welcome to experiment with reverting this PR on your local copy to confirm this hypothesis. In the meantime we will go through the logs on the assumption that the hypothesis is correct, and look at how we can rearchitect this part of the firmware so that it either avoids the large allocation or at least bounds the lifetime of the allocation to the one phase during attach where it gets used.

Azq2 commented 1 year ago

I found a potential memory leak.

First flashing

mydebug: cortexm_attach(0x557cc09d8e60), t->tdesc=(nil)
mydebug: create_tdesc_cortex_m, size=1047, t=0x557cc09d8e60
mydebug: cortexm_detach(0x557cc09d8e60)
mydebug: free(t->tdesc), t=0x557cc09d8e60

Looks good.

Second flashing

mydebug: cortexm_attach(0x557cc09d8e60), t->tdesc=(nil)
mydebug: create_tdesc_cortex_m, size=1047, t=0x557cc09d8e60

mydebug: cortexm_attach(0x557cc09d9c60), t->tdesc=(nil)
mydebug: create_tdesc_cortex_m, size=1047, t=0x557cc09d9c60
mydebug: cortexm_detach(0x557cc09d9c60)
mydebug: free(t->tdesc), t=0x557cc09d9c60

  1. cortexm_attach is called twice, without a cortexm_detach in between... is that okay?
  2. On the second cortexm_attach call t->tdesc is null, so create_tdesc_cortex_m allocates it again. The first allocation is never freed, which is a memory leak.

-v 5 + mydebug: mydebug.log


> We suspect that both this and #1327 have the same underlying root cause which is memory exhaustion and possibly some leaked allocations causing heap fragmentation.

But why does the problem (only this deep-sleep issue) also reproduce in hosted blackmagic? I would expect the hosted version to have plenty of heap.

Azq2 commented 1 year ago

I think that happens because I always call monitor swdp_scan before flashing, and the previously attached target is not detached (BMP bug), which causes the memory leak.
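
To make the scenario concrete, here is a minimal sketch of a rescan that detaches the previous target before dropping it, so the register description is released. It is a toy model only: scan_target_s, model_attach, model_detach, rescan_clean and the live_tdesc counter are hypothetical names, not BMP firmware code, and this is not necessarily how PR #1334 solves it. The 1047-byte size is taken from the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for the probe's target structure. */
typedef struct scan_target {
    bool attached;
    char *tdesc; /* register description allocated on attach */
} scan_target_s;

int live_tdesc = 0; /* outstanding tdesc allocations, for illustration only */

/* Attach allocates the description, as create_tdesc_cortex_m does in the log. */
bool model_attach(scan_target_s *t)
{
    assert(t != NULL);
    t->tdesc = malloc(1047); /* size from the log above */
    if (!t->tdesc)
        return false;
    ++live_tdesc;
    t->attached = true;
    return true;
}

/* Detach releases the description if there is one. */
void model_detach(scan_target_s *t)
{
    assert(t != NULL);
    if (t->tdesc) {
        free(t->tdesc);
        t->tdesc = NULL;
        --live_tdesc;
    }
    t->attached = false;
}

/* Rescan: detach a still-attached target before freeing it, then start fresh. */
scan_target_s *rescan_clean(scan_target_s *old)
{
    if (old) {
        if (old->attached)
            model_detach(old);
        free(old);
    }
    return calloc(1, sizeof(scan_target_s));
}
```

Without the model_detach call in rescan_clean, the old tdesc pointer would be lost when the old target is freed, which matches the second attach in the log seeing t->tdesc=(nil) on a new target address.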

Azq2 commented 1 year ago

I fixed this memory leak in PR #1334. The issue in #1327 is fixed, but the problem with connect-under-reset is still present.

Azq2 commented 1 year ago

Also, I see a potential memory leak when cortexm_attach returns false.

    if (!t->attach(t)) {
        platform_target_clk_output_enable(false);
        return NULL;
    }

    t->attached = true;

cortexm_detach() is never called, because the t->attached flag is false, but t->tdesc has already been allocated.

Two ways to fix this issue:

  1. Always call t->detach(), even if t->attach() returns false
  2. t->attach() must free all of its own resources when it fails

I don't know which way is appropriate for the BlackMagic architecture.
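
A minimal sketch of option 2, assuming a simplified target structure. Everything here is hypothetical (attach_target_s, sketch_attach, sketch_attach_n, sketch_detach, and the halt_ok knob that simulates a failing halt); it only illustrates the ownership rule, not the real cortexm_attach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real target structure. */
typedef struct attach_target {
    bool attached;
    char *tdesc;
    bool halt_ok; /* simulation knob: does halting the core succeed? */
} attach_target_s;

/* Attach owns its allocation: on failure it frees what it created. */
bool sketch_attach(attach_target_s *t)
{
    assert(t != NULL);
    t->tdesc = malloc(1047);
    if (!t->tdesc)
        return false;
    if (!t->halt_ok) {
        /* Failure path: release the description before reporting failure. */
        free(t->tdesc);
        t->tdesc = NULL;
        return false;
    }
    return true;
}

/* Caller mirrors the snippet above: attached is only set on success. */
attach_target_s *sketch_attach_n(attach_target_s *t)
{
    if (!sketch_attach(t))
        return NULL;
    t->attached = true;
    return t;
}

/* Detach releases the description after a successful attach. */
void sketch_detach(attach_target_s *t)
{
    assert(t != NULL);
    free(t->tdesc);
    t->tdesc = NULL;
    t->attached = false;
}
```

With this rule the caller does not need to know what attach allocated, so the early return NULL stays correct as it is. Option 1 would instead require detach to be safe to call on a half-attached target.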

dragonmux commented 1 year ago

Rather than going after t->tdesc like this, please see branch fix/target-register-description-memory-exhaustion. As we originally intimated in our hypothesis, limiting the scope and duration of the allocation is the goal here, not plugging a memory leak in this whack-a-mole way.

You should find this branch both solves the issue of BMP running out of heap while attached and also leaking target register description allocations.
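
For readers following along, a minimal sketch of what "limiting the scope and duration of the allocation" can look like in general: the description is built on demand, handed to a consumer, and freed before the function returns, so nothing stays on the heap while the target is attached. This is an illustration under assumptions, not the code on the branch; with_tdesc, tdesc_sink_fn, record_len, live_buffers and the toy XML string are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical consumer of the description (for example, code that sends it to GDB). */
typedef void (*tdesc_sink_fn)(const char *tdesc, size_t len, void *ctx);

size_t live_buffers = 0; /* outstanding description buffers, for illustration only */

/* Build a toy register description, hand it to the sink, then free it.
 * Returns the description length, or -1 on failure. */
int with_tdesc(unsigned reg_count, tdesc_sink_fn sink, void *ctx)
{
    assert(sink != NULL);
    /* First pass: ask snprintf how much space the text needs. */
    const int len = snprintf(NULL, 0, "<target regs=\"%u\"/>", reg_count);
    if (len < 0)
        return -1;
    char *buf = malloc((size_t)len + 1U);
    if (!buf)
        return -1;
    ++live_buffers;
    /* Second pass: render into the exactly sized buffer. */
    snprintf(buf, (size_t)len + 1U, "<target regs=\"%u\"/>", reg_count);
    sink(buf, (size_t)len, ctx);
    /* The buffer never outlives this call. */
    free(buf);
    --live_buffers;
    return len;
}

/* Example sink: records the length it was given. */
void record_len(const char *tdesc, size_t len, void *ctx)
{
    (void)tdesc;
    *(size_t *)ctx = len;
}
```

The trade-off is that the description is rebuilt each time it is requested, in exchange for the heap only being occupied during that one request.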

As for BMDA, when using the remote protocol a significant portion of the firmware is still in use consuming heap allocations on your probe. This may be part of why the bug still reproduces in this scenario.

Azq2 commented 1 year ago

This problem still reproduces on the main branch (when the MCU is in standby mode + nRST).

$ make all install
make: Nothing to be done for 'all'.
RAM: 432
ROM: 13208
INSTALL /dev/serial/by-id/usb-Black_Magic_Debug_Black_Magic_Probe__ST-Link_v2__v1.9.0-rc0-87-g1135437c_C0D5DBD2-if00 stm32f0-pmic-with-rtc.elf (flash)
arm-none-eabi-gdb -nx --batch \
    -ex 'target extended-remote /dev/serial/by-id/usb-Black_Magic_Debug_Black_Magic_Probe__ST-Link_v2__v1.9.0-rc0-87-g1135437c_C0D5DBD2-if00' \
    -x bmp.scr \
    stm32f0-pmic-with-rtc.elf
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Malformed response to offset query, timeout
bmp.scr:2: Error in sourced command file:
"monitor" command not supported by this target.
make: *** [Makefile:67: install] Error 1

dragonmux commented 1 year ago

We haven't yet been able to figure out what's going on with this issue, which is why we made sure no PRs tagged it for resolution. We'll dive into this over the next couple of days once we've got another couple of issues out of the way that are presently ahead of this one.

dragonmux commented 1 year ago

So.. we've just spent the last several hours trying to reproduce this issue on the latest main (302c8b53 at time of writing) and we cannot get it to reproduce at all. Are you still having this issue?

tuna-f1sh commented 1 year ago

Just to add to the mix. I have this issue too with 1.9 BMP V2.3a, but nothing to do with 'connect under reset'. Running this flash script as a batch -nx --batch:

monitor swdp_scan
attach 1
load
compare-sections
kill

Works fine for first target but second will just hang. I have to unplug and replug the BMP. Adding monitor kill to end of script does fix it (I guess because it detaches cleanly?) but this isn't intended behaviour?

dragonmux commented 1 year ago

With our recent dives into how kill vs detach work, we have to ask: with what target are you reproducing this? It seems to be entirely dependent on the Flash mode exit routine being used and therefore the target, as this turns out to be related to #1458.

tuna-f1sh commented 1 year ago

Sorry, I typed that whilst in the middle of a job to capture it. I actually meant adding monitor reset fixes it; kill was always there. I’ve edited the above so it hopefully makes more sense!

And it’s an STM32F04/F070x6 target.

dragonmux commented 10 months ago

Given the lack of responses and that we've done a lot of work to improve the behaviour of connect-under-reset and other aspects of the scan and the target Flash subsystems, we are going to assume this is fixed. If that is not the case please let us know and we will re-open the issue.