blackmagic-debug / blackmagic

In application debugger for ARM Cortex microcontrollers.
GNU General Public License v3.0

Add support for STM32WLE5JC SOC (Cortex M4 + Lora radio) #832

Closed ofauchon closed 2 years ago

ofauchon commented 3 years ago

Hi all,

I'd like to use BMP for flashing/debugging new STM32WLE5 chips.

https://www.st.com/en/microcontrollers-microprocessors/stm32wle5jc.html

I have a Seeed Studio LoRa-E5 module on a DIY breakout board, which exposes the SWDIO/SWCLK/RST pins.

With BMP hosted (blackmagic -t -v31), I get traces with read timeout errors:

CPUID 0x410fc241 (M4 var 0 rev 1)
!HM000023000040e000ed8800000004#
Timeout on read RESP
remote_ap_mem_read error -4 around 0xe000ed88
ap_memread @ e000ed88 len 4: 05 56 00 00
ap_mem_write_sized @ e000ed88 len 4, align 4: 05 56 f0 00
!Hm00002300004002e000ed88000000040556f000#
Timeout on read RESP
remote_ap_mem_write_sized error -4 around address 0xe000ed8c
!HM000023000040e000ed8800000004#
Timeout on read RESP
remote_ap_mem_read error -4 around 0xe000ed88
ap_memread @ e000ed88 len 4: 05 56 00 00
!HM000023000040e000ed7c00000004#
Timeout on read RESP
remote_ap_mem_read error -4 around 0xe000ed7c
ap_memread @ e000ed7c len 4: 05 56 00 00
swdptap_seq_out         8 ticks: 0000008d

As CPACR (the coprocessor access control register, which gates FPU access) is located at address 0xE000ED88, I tried removing the FP probe code in cortexm.c (search for /* Probe for FP extension */ in the source).
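A minimal sketch of the kind of FP probe being discussed, assuming the usual approach of requesting CP10/CP11 access in CPACR and checking whether the bits stick. The `mem_ops_t` callbacks and the `fake_cpacr_t` register model are hypothetical stand-ins for the debugger's own memory accessors, not the BMP API.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPACR_ADDR      0xE000ED88u /* CPACR address quoted in the traces */
#define CPACR_CP10_CP11 0x00F00000u /* full access to coprocessors 10 and 11 */

/* Hypothetical memory-access callbacks (not the BMP API). */
typedef struct mem_ops {
    uint32_t (*read32)(void *ctx, uint32_t addr);
    void (*write32)(void *ctx, uint32_t addr, uint32_t value);
    void *ctx;
} mem_ops_t;

/* Request CP10/CP11 access and report whether the bits read back as set. */
static bool probe_fpu(const mem_ops_t *ops)
{
    uint32_t cpacr = ops->read32(ops->ctx, CPACR_ADDR);
    cpacr |= CPACR_CP10_CP11;
    ops->write32(ops->ctx, CPACR_ADDR, cpacr);
    return ops->read32(ops->ctx, CPACR_ADDR) == cpacr;
}

/* Simulated register so the sketch can be tried without hardware. */
typedef struct fake_cpacr {
    uint32_t value;
    uint32_t writable_mask; /* bits that keep a written value */
} fake_cpacr_t;

static uint32_t fake_read32(void *ctx, uint32_t addr)
{
    (void)addr;
    return ((fake_cpacr_t *)ctx)->value;
}

static void fake_write32(void *ctx, uint32_t addr, uint32_t value)
{
    (void)addr;
    fake_cpacr_t *reg = ctx;
    reg->value = value & reg->writable_mask;
}
```

With a model whose CP10/CP11 bits are writable the probe reports an FPU; with a model that ignores the write it does not. A probe like this only gives a meaningful answer when the underlying reads and writes actually complete, which is not the case in the traces above.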

It seems I can go further with this hack, but I still have errors:

Read  AP 0x01 CSW   : 0x0100001a
!Ha000001f4#
Timeout on read RESP
Read  AP 0x01 CFG   : 0x0100001a
AP   0: IDR=00370011 CFG=0100001a BASE=0100001a CSW=01000008
AP#0 IDR = 0x00370011 (AHB-AP var1 rev0)
!HM00000100000801000ff000000004#
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01000ff0
ap_memread @ 1000ff0 len 4: 00 00 00 00
!HM00000100000801000ff400000004#
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01000ff4
ap_memread @ 1000ff4 len 4: 00 00 00 00

Now I need help to verify whether the ADIv5 driver tries to read the correct addresses, and to understand why these reads end up with timeout errors.
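One way to check which addresses are being requested is to decode the `!HM...#` packets from the traces. The sketch below assumes the layout that the traces suggest: two one-byte fields, then three 32-bit hex fields whose values line up with the CSW, address and length printed in the surrounding log lines. The field names are inferred from the logs, not taken from the protocol definition, and the first two bytes are left unnamed because the traces only ever show them as 00.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Field meanings are assumptions inferred from the traces in this thread. */
typedef struct mem_read_request {
    unsigned field0;  /* first byte, 00 in every packet shown */
    unsigned field1;  /* second byte, 00 in every packet shown */
    uint32_t csw;     /* matches the CSW value printed for the AP */
    uint32_t address; /* matches the "around 0x..." address in the error line */
    uint32_t length;  /* matches "len 4" */
} mem_read_request_t;

/* Parse a "!HM<2 hex><2 hex><8 hex><8 hex><8 hex>#" packet; false if it does not match. */
static bool parse_mem_read(const char *packet, mem_read_request_t *out)
{
    unsigned f0, f1, csw, addr, len;
    char end = 0;
    if (sscanf(packet, "!HM%2x%2x%8x%8x%8x%c", &f0, &f1, &csw, &addr, &len, &end) != 6)
        return false;
    if (end != '#')
        return false;
    out->field0 = f0;
    out->field1 = f1;
    out->csw = csw;
    out->address = addr;
    out->length = len;
    return true;
}
```

Applied to `!HM000023000040e000ed8800000004#`, this yields address 0xe000ed88 and length 4, the same values reported in the following `remote_ap_mem_read error` line.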

At the same time, I'll continue the analysis this way: I'll record a working SWD session with openocd + a logic analyzer SWD decoder, and try to compare it with the faulty BMP transactions.

I don't have much knowledge about the Arm Debug Interface, so any help is welcome.

Thanks

Olivier

UweBonnes commented 3 years ago

The errors happen while BMP tries to examine the M4 in the first AP. The M4 has an FPU, so the access to the FPU register should succeed. So there must be other reasons why the access fails. Errors in this early phase are not handled well with remote. If you can, try blackmagic hosted with an ST-Link/ST-firmware debugger connected to the STM32WL with similar verbosity. This probably will not work either, but perhaps it reports errors better...

ofauchon commented 3 years ago

Thanks Uwe.

As you suggested, I ran blackmagic -t -v31 with a stlinkv2 clone. There are tons of errors at the beginning, but it seems to continue ...

Please find the log :

log_BMPHosted_STLinkV2Clone.log

Thanks for your help .

Olivier

UweBonnes commented 3 years ago

That looks as expected. Problems only happen after "Open AP 1" when BMP tries to decipher the second CPU M0.

ofauchon commented 3 years ago

Pay attention there are two STM32WL references:

According to the order, my device is: LoRa-E5 STM32WLE5JC Module, embedded SX126X and MCU for LoRaWAN Wireless Sensor Network & IoT devices - EU868 & US915

=> So this should be the single-core M4. I'll double-check that.

Can the "second CPU" be the floating point unit?

Olivier

UweBonnes commented 3 years ago

No, WLEx is the same die as WLx, only crippled in some way. You see that by the same MCUID. So on the STM32WLEx the transistors are there and this is seen in the detection. However, on AP1 the jump from "Read AP 0x01 BASE : 0xf0000003" to "stlink_readmem from f0000ff0" differs from rm0453.pdf Figure 392, which goes from 0xf0000000 to a read of 0xf00ff000. Probably a read to 0xf00ff000 would succeed. But we need to find out the ADIv5 algo to justify that jump.
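A minimal sketch of how the first read address follows from the BASE value, assuming the ADIv5 BASE register layout (bit 0 entry present, bit 1 ADIv5 format, bits [31:12] ROM table address) and a component ID register CIDR0 at offset 0xff0 from the table base. The helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define BASE_PRESENT   0x1u        /* bit 0: a debug entry is present */
#define BASE_FORMAT    0x2u        /* bit 1: ADIv5 format */
#define BASE_ADDR_MASK 0xFFFFF000u /* bits [31:12]: ROM table address */
#define CIDR0_OFFSET   0xFF0u      /* first component ID register */

/* True if BASE advertises a ROM table in ADIv5 format. */
static bool base_entry_present(uint32_t base)
{
    return (base & (BASE_PRESENT | BASE_FORMAT)) == (BASE_PRESENT | BASE_FORMAT);
}

/* Address of the ROM table that BASE points at. */
static uint32_t rom_table_address(uint32_t base)
{
    return base & BASE_ADDR_MASK;
}

/* Address of the first register read when probing that table. */
static uint32_t cidr0_address(uint32_t base)
{
    return rom_table_address(base) + CIDR0_OFFSET;
}
```

For the value in the log, BASE = 0xf0000003 gives a table address of 0xf0000000 and a first read at 0xf0000ff0, which is the address the failing reads report. The same arithmetic on AP0's BASE = 0xe00ff003 gives 0xe00ff000, matching the "ROM: Table BASE" line printed for the first AP.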

ofauchon commented 3 years ago

Ok, I understand both chipsets are very close, but it seems the CoreSight topology is different in rm0461.pdf (STM32WLEx):

image

Sorry if this remark is not relevant ...

UweBonnes commented 3 years ago

You show the topology of the first AP. Our problem is the second AP. And reading at 0xff0 means reading CIDR0. This CIDR0 read fails for AP1 on the crippled WL chip. The relevant code is:

uint32_t cidr = adiv5_ap_read_id(ap, addr + CIDR0_OFFSET);
if ((cidr & ~CID_CLASS_MASK) != CID_PREAMBLE) {
    /* Maybe caused by a not halted CortexM */
    if ((ap->idr & 0xf) == ARM_AP_TYPE_AHB) {
        if (!cortexm_prepare(ap))
            return; /* Halting failed! */
        /* CPU now halted, read cidr again. */
        cidr = adiv5_ap_read_id(ap, addr + CIDR0_OFFSET);
    }
}

Obviously we should do another test and return if CIDR is still not valid. Please try

diff --git a/src/target/adiv5.c b/src/target/adiv5.c
index 58d0bc0a..72434941 100644
--- a/src/target/adiv5.c
+++ b/src/target/adiv5.c
@@ -416,6 +416,8 @@ static void adiv5_component_probe(ADIv5_AP_t *ap, uint32_t addr, int recursion,
                 return; /* Halting failed! */
             /* CPU now halted, read cidr again. */
             cidr = adiv5_ap_read_id(ap, addr + CIDR0_OFFSET);
+            if ((cidr & ~CID_CLASS_MASK) != CID_PREAMBLE)
+                return;
         }
     }
 
 #if defined(ENABLE_DEBUG)
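A self-contained sketch of the validity test used above, assuming the CoreSight component ID layout: the low bytes of the four CIDR registers form the preamble 0xB105000D, with the component class in bits [15:12]. The constants mirror the names in the snippet, but their values here are written from the CoreSight layout rather than copied from adiv5.c, so treat them as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CID_PREAMBLE    0xB105000Du /* fixed bits of a valid component ID */
#define CID_CLASS_MASK  0x0000F000u /* component class field */
#define CID_CLASS_SHIFT 12

/* Combine the low byte of CIDR0..CIDR3 (offsets 0xff0, 0xff4, 0xff8, 0xffc). */
static uint32_t cidr_assemble(const uint32_t regs[4])
{
    uint32_t cidr = 0;
    for (unsigned i = 0; i < 4; i++)
        cidr |= (regs[i] & 0xFFu) << (i * 8);
    return cidr;
}

/* A component ID is valid when everything except the class matches the preamble. */
static bool cidr_is_valid(uint32_t cidr)
{
    return (cidr & ~CID_CLASS_MASK) == CID_PREAMBLE;
}

/* Extract the component class (for example 1 for a ROM table). */
static unsigned cidr_class(uint32_t cidr)
{
    return (cidr & CID_CLASS_MASK) >> CID_CLASS_SHIFT;
}
```

Four reads that time out and leave zeros behind assemble to 0x00000000, which fails the check, so an early return at that point stops the probe from walking a table that is not there.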

UweBonnes commented 3 years ago

Sorry, "insert code" does not work as expected.

ofauchon commented 3 years ago

Thanks Uwe, I'll test your patch and give you feedback.

If you're interested, I just made a capture of the SWD communication while uploading a small .hex file to the STM32WL with openocd (the .zip file contains the signal capture, and a CSV with the decoded SWD protocol).

swd_capture_openocd_flash.zip

ofauchon commented 3 years ago

After patching adiv5_dp_init() to have only AP0 probed (and disable AP1 probing):

+       //for (int i = 0; (i < 256) && (void_aps < 8); i++)
+       for (int i = 0; (i < 1) && (void_aps < 8); i++)

I could run hosted BMP, and have something nicer:

$ ./src/blackmagic -t 2>&1
INFO: Open USB 0489:e0a2 class e0 failed
BMP hosted v1.7.1-123-g3b8502c-dirty
 for ST-Link V2/3, CMSIS_DAP, JLINK and LIBFTDI/MPSSE
Using 1d50:6018 5398F4B4 Black Sphere Technologies
 Black Magic Probe (SWLINK) v1.7.1-123-g3b8502c
Running in Test Mode
Target voltage:  Volt
Speed set to  3.2727 MHz for SWD
OFA: adiv5_swdp_scanOFA : adiv5_dp_init: start
DPIDR 0x6ba02477 (v2 rev6)
RESET_SEQ succeeded.
TARGETID 04970041
OFA : ADD new AP 0
OFA : adiv5_new_ap 
AP   0: IDR=24770011 CFG=00000000 BASE=e00ff003 CSW=23000040
AP#0 IDR = 0x24770011 (AHB-AP var1 rev2)
OFA : adiv5_component_probe
ROM: Table BASE=0xe00ff000 SYSMEM=0x00000001, designer  20 Partno 497
OFA : adiv5_component_probe
0 0xe000e000: Generic IP component - Cortex-M3 SCS (System Control Space) (PIDR = 0x04000bb000  DEVTYPE = 0x00 ARCHID = 0x0000)-> cortexm_probe
CPUID 0x410fc241 (M4 var 0 rev 1)
OFA : adiv5_component_probe
Halt via DHCSR: success 01030003 after 1ms
1 0xe0001000: Generic IP component - Cortex-M3 DWT (Data Watchpoint and Trace) (PIDR = 0x04003bb002  DEVTYPE = 0x00 ARCHID = 0x0000)
OFA : adiv5_component_probe
2 0xe0002000: Generic IP component - Cortex-M3 FBP (Flash Patch and Breakpoint) (PIDR = 0x04002bb003  DEVTYPE = 0x00 ARCHID = 0x0000)
OFA : adiv5_component_probe
3 0xe0000000: Generic IP component - Cortex-M3 ITM (Instrumentation Trace Module) (PIDR = 0x04003bb001  DEVTYPE = 0x00 ARCHID = 0x0000)
OFA : adiv5_component_probe
4 0xe0040000: Debug component - Cortex-M4 TPIU (Trace Port Interface Unit) (PIDR = 0x04000bb9a1  DEVTYPE = 0x11 ARCHID = 0x0000)
5 Entry 0xfff42002 -> Not present
OFA : adiv5_component_probe
6 0xe0043000: Debug component - CoreSight CTI (Cross Trigger) (PIDR = 0x04005bb906  DEVTYPE = 0x14 ARCHID = 0x0000)
ROM: Table END
*** 1 Unknown ARM Cortex-M Designer  20 Partno 497 M4

Now I'm trying to figure out what *** 1 Unknown ARM Cortex-M Designer 20 Partno 497 M4 means.

UweBonnes commented 3 years ago

20 is the ID of STM and 497 the MCUID of the STM32WL(E). The target/stm32xY.c files test for that number and set things up appropriately, once somebody adds support for that device.
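As an illustration of where such numbers can come from, the sketch below splits the TARGETID value printed in the log (04970041) into its ADIv5 fields and looks the device up in a small table. It assumes the log prints designer and part number in hex. The table, the `driver_for_device` helper and the idea of keying on the upper 12 bits of the part number are hypothetical. The name "STM32WLxx" is taken from the later logs in this thread.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct targetid_fields {
    unsigned revision; /* TARGETID[31:28] */
    unsigned partno;   /* TARGETID[27:12] */
    unsigned designer; /* TARGETID[11:1], JEP106 designer code */
} targetid_fields_t;

/* Split a DP TARGETID value into its fields. */
static targetid_fields_t targetid_decode(uint32_t targetid)
{
    targetid_fields_t fields = {
        .revision = (targetid >> 28) & 0xFu,
        .partno = (targetid >> 12) & 0xFFFFu,
        .designer = (targetid >> 1) & 0x7FFu,
    };
    return fields;
}

/* Hypothetical lookup table keyed on designer and device ID. */
typedef struct driver_entry {
    unsigned designer;
    unsigned device_id;
    const char *name;
} driver_entry_t;

static const driver_entry_t known_devices[] = {
    {0x20, 0x497, "STM32WLxx"},
};

/* Return the driver name for a device, or NULL if nothing matches. */
static const char *driver_for_device(unsigned designer, unsigned device_id)
{
    for (size_t i = 0; i < sizeof(known_devices) / sizeof(known_devices[0]); i++) {
        if (known_devices[i].designer == designer && known_devices[i].device_id == device_id)
            return known_devices[i].name;
    }
    return NULL;
}
```

Decoding 0x04970041 gives designer 0x20 and part number 0x4970. For this one value the upper 12 bits of the part number equal 0x497, but that is only an observation on this log, not a documented rule.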

UweBonnes commented 3 years ago

#834 aborts scanning on the voided second CPU on AP1 in the crippled STM32WLE.

AlexKlimaj commented 3 years ago

I am still seeing issues with the STM32WLE5JC on a LoRa-E5 module. Getting the following different errors.

(gdb) monitor swdp_scan
Target voltage: 3.3V
Timeout during scan. Is target stuck in WFI?
SW-DP scan failed!
(gdb) monitor swdp_scan
Target voltage: 3.3V
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...

UweBonnes commented 3 years ago

Aborting that early means fundamental problems, like wrong IO levels, native BMP without IO voltage, remapped SWD pins, etc. Hosted is more verbose and may also help.

AlexKlimaj commented 3 years ago

I have no problem connecting to it with an STLink. I am using a native blackmagic probe.

UweBonnes commented 3 years ago

mon tpwr ena?

AlexKlimaj commented 3 years ago

(gdb) target extended-remote /dev/ttyBmpGdb
Remote debugging using /dev/ttyBmpGdb
(gdb) 
(gdb) mon tpwr ena
Enabling target power
(gdb) monitor swdp_scan
Target voltage: 3.3V
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...

UweBonnes commented 3 years ago

Unpower the BMP. And use hosted for more verbosity. And perhaps use the Discord chat...

AlexKlimaj commented 3 years ago

Listening on TCP: 2000

Exception: SWDP invalid ACK
Trying old JTAG to SWD sequence
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff0
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff4
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff8
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ffc
Timeout on read RESP
Timeout on read RESP
Timeout on read RESP
Timeout on read RESP
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01020ff0
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01020ff4
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01020ff8
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01020ffc
Timeout on read RESP
Timeout on read RESP
Timeout on read RESP
Timeout on read RESP
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01030ff0
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01030ff4
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01030ff8
Timeout on read RESP
remote_ap_mem_read error -4 around 0x01030ffc
Timeout on read RESP
Timeout on read RESP

AlexKlimaj commented 3 years ago

alex@alex-ThinkCentre-M75q-Gen-2:~/Documents/blackmagic/src$ ./blackmagic -t 2>&1
INFO: Open USB 8087:0029 class e0 failed
BMP hosted v1.7.1-228-ga0dbb2a
 for ST-Link V2/3, CMSIS_DAP, JLINK and LIBFTDI/MPSSE
Using 1d50:6018 BFC795DB Black Sphere Technologies
 Black Magic Probe  v1.7.1-228-ga0dbb2a
Running in Test Mode
Target voltage: 3.3V Volt
Speed set to  3.2727 MHz for SWD
Exception: SWDP invalid ACK
Trying old JTAG to SWD sequence
TARGETID 04970041
DPIDR 0x6ba02477 (v2 rev6)
RESET_SEQ succeeded.
AP   0: IDR=24770011 CFG=00000000 BASE=e00ff003 CSW=23000040 (AHB-AP var1 rev2
Halt via DHCSR: success 00030003 after 1ms
ROM: Table BASE=0xe00ff000 SYSMEM=0x00000001, designer  20 Partno 497
0 0xe000e000: Generic IP component - Cortex-M3 SCS (System Control Space) (PIDR = 0x04000bb000  DEVTYPE = 0x00 ARCHID = 0x0000)-> cortexm_probe
CPUID 0x410fc241 (M4 var 0 rev 1)
Read e0042000: 497
1 0xe0001000: Generic IP component - Cortex-M3 DWT (Data Watchpoint and Trace) (PIDR = 0x04003bb002  DEVTYPE = 0x00 ARCHID = 0x0000)
2 0xe0002000: Generic IP component - Cortex-M3 FBP (Flash Patch and Breakpoint) (PIDR = 0x04002bb003  DEVTYPE = 0x00 ARCHID = 0x0000)
3 0xe0000000: Generic IP component - Cortex-M3 ITM (Instrumentation Trace Module) (PIDR = 0x04003bb001  DEVTYPE = 0x00 ARCHID = 0x0000)
4 0xe0040000: Debug component - Cortex-M4 TPIU (Trace Port Interface Unit) (PIDR = 0x04000bb9a1  DEVTYPE = 0x11 ARCHID = 0x0000)
5 Entry 0xfff42002 -> Not present
6 0xe0043000: Debug component - CoreSight CTI (Cross Trigger) (PIDR = 0x04005bb906  DEVTYPE = 0x14 ARCHID = 0x0000)
ROM: Table END
AP   1: IDR=84770001 CFG=00000000 BASE=f0000003 CSW=43800040 (AHB-AP var0 rev8
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff0
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff4
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff8

UweBonnes commented 3 years ago

Please try with the gdb firmware server. Exception handling there is better than with hosted.

UweBonnes commented 3 years ago

Otherwise this problem is hard to handle without a uC to test. And I don't see a chance to buy this part, due to allocation.

ofauchon commented 3 years ago

In a desperate effort to make this work, I added an exception in adiv5_component_probe (adiv5.c). image

(Sorry for the screenshot, I could not produce a proper diff right now...)

It seems this "exception" avoids some unhandled adiv5 cases.

@AlexKlimaj => Could you test this workaround? @UweBonnes => If that fixes the problem, I'll add more debug output to get a better understanding of the problem.

Original code:

AP   1: IDR=84770001 CFG=00000000 BASE=f0000003 CSW=43800040 (AHB-AP var0 rev8
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff0
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff4
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff8
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ffc

With the patch :

AP   1: IDR=84770001 CFG=00000000 BASE=f0000003 CSW=43800040 (AHB-AP var0 rev8
***  1      STM32WLxx M4
RAM   Start: 0x10000000 length = 0x8000
RAM   Start: 0x20000000 length = 0x18000
Flash Start: 0x08000000 length = 0x40000 blocksize 0x800

I made a few tests with BMP on a bluepill to debug the STM32WL LoRa-E5. Seems to work!

AlexKlimaj commented 3 years ago

That worked!

(gdb) target extended-remote /dev/ttyACM0
Remote debugging using /dev/ttyACM0
(gdb) mon s
Target voltage: 3.3V
Available Targets:
No. Att Driver
 1      STM32WLxx M4
(gdb) attach 1
Attaching to Remote target
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x0800078a in ?? ()
(gdb)

ofauchon commented 3 years ago

That's great news! Could you try to upload .bin or .elf files, and try to play with the debugger?

On my side, I'll go further in debugging blackmagic code so we can open a clean Pull Request.

Olivier

AlexKlimaj commented 3 years ago

Yes I was able to upload .elf files and set breakpoints.

ofauchon commented 3 years ago

STM32WL series have two variants:

image

image

The problem is that the STM32WLE5x (one core) still exposes a valid AP base address for the second AP (which should not be used in a single-core device).

Although I've not finished reading the ADIv5 specification, I imagine we should tighten AP validation somewhere in adiv5_new_ap():

https://github.com/blacksphere/blackmagic/blob/a0dbb2a787fd29efcf160e8423ead71723a3ac33/src/target/adiv5.c#L590-L604

But it's still unclear how we could detect an invalid AP (should we try to detect a timeout on the first adiv5_mem_read32 in adiv5_ap_read_id()?).

By the way, the previous "quick and dirty" patch (return if addr=0xF0000000) solves the problem, as it disables the whole scan of this second AP...

@UweBonnes: Any idea how you would fix that?

Thanks.

Olivier

UweBonnes commented 3 years ago

I also think that both IDR and BASE read fine for the second, disabled AP. But adiv5_ap_read_pidr() fails. adiv5_ap_read_pidr should get an exception clause to catch that error. Probably some recovery must be done after that invalid access. Improper exception handling of hosted may be the next obstacle.

A proper solution would be fine, but perhaps some monitor option like "probe only first AP" could be implemented as emergency workaround.
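A rough sketch of what a "probe only first AP" switch could look like, modelled on the scan loop quoted earlier in this thread (`(i < 256) && (void_aps < 8)`). The `first_ap_only` flag, the `ap_probe_fn` callback and the `fake_dp_t` model are hypothetical and only illustrate the control flow. This is not the BMP implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_APS      256 /* upper bound from the loop quoted in this thread */
#define MAX_VOID_APS 8   /* stop after this many unusable APs */

/* Hypothetical callback: returns true if the AP at this index is usable. */
typedef bool (*ap_probe_fn)(unsigned apsel, void *ctx);

/* Scan APs, optionally stopping after AP 0. Returns the number of usable APs. */
static unsigned scan_aps(ap_probe_fn probe, void *ctx, bool first_ap_only)
{
    const unsigned limit = first_ap_only ? 1u : MAX_APS;
    unsigned void_aps = 0;
    unsigned found = 0;
    for (unsigned i = 0; i < limit && void_aps < MAX_VOID_APS; i++) {
        if (probe(i, ctx))
            found++;
        else
            void_aps++;
    }
    return found;
}

/* Simulated debug port: APs below valid_aps answer, the rest are void. */
typedef struct fake_dp {
    unsigned valid_aps;
    unsigned probes; /* how many APs were touched */
} fake_dp_t;

static bool fake_probe(unsigned apsel, void *ctx)
{
    fake_dp_t *dp = ctx;
    dp->probes++;
    return apsel < dp->valid_aps;
}
```

With the flag set, only AP 0 is ever touched, so a second AP that answers its registers but fails on the ROM table read cannot upset the rest of the scan.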

ofauchon commented 3 years ago

I like the "probe only first AP" monitor option, not that bad. I can work on this feature (unless you prefer to do it yourself).

Thanks for your help .

Olivier

UweBonnes commented 3 years ago

You're welcome. Exposing that option for hosted needs a command line option. -M "new_option ena" will not work, as it is applied too late.

ofauchon commented 3 years ago

Hi Uwe.

As I was digging into the code, I found something interesting:

adiv5_component_probe() has some timeout handling code when trying to read AP's ID. This should stop the component_probe with return:

https://github.com/blacksphere/blackmagic/blob/a0dbb2a787fd29efcf160e8423ead71723a3ac33/src/target/adiv5.c#L410-L425

But this is not the behavior observed while testing.

So I added some debug prints to see what happens in this try_catch:

image

Debug traces:

OFA ***BEFORE TRYCATCH ***
OFA ***BEFORE ap_read_id call ***
OFA READ ID 
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff0
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff4
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ff8
Timeout on read RESP
remote_ap_mem_read error -4 around 0xf0000ffc
OFA END READ ID 
OFA ***AFTER ap_read_id call ***
OFA ***AFTER TRYCATCH ***

My first conclusion is that the "Timeout on read" is detected somewhere else (serial_unix.c), and the TRYCATCH in adiv5_component_probe is not working properly.
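The traces are consistent with a lower layer that prints the timeout and returns normally instead of raising. The sketch below uses plain setjmp/longjmp to show the difference: a guarded call only notices a failure when the failing code actually raises. All names are made up for illustration and this is not the BMP exception API.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Innermost active handler, NULL outside a guarded call. */
static jmp_buf *active_handler;

/* Raise a timeout: unwinds to the innermost guarded call, if there is one. */
static void raise_timeout(void)
{
    if (active_handler != NULL)
        longjmp(*active_handler, 1);
}

typedef uint32_t (*read_fn)(uint32_t addr);

/* Run one read under a handler. Returns false if a timeout was raised. */
static bool guarded_read(read_fn read, uint32_t addr, uint32_t *value)
{
    jmp_buf env;
    jmp_buf *const previous = active_handler;
    active_handler = &env;
    if (setjmp(env) != 0) {
        /* Arrived here through longjmp: the read raised. */
        active_handler = previous;
        return false;
    }
    *value = read(addr);
    active_handler = previous;
    return true;
}

/* A read that reports its failure by raising. */
static uint32_t read_that_raises(uint32_t addr)
{
    (void)addr;
    raise_timeout();
    return 0;
}

/* A read that would only log the failure and then hands back a dummy value. */
static uint32_t read_that_only_logs(uint32_t addr)
{
    (void)addr;
    return 0;
}
```

`guarded_read(read_that_raises, ...)` returns false, while `guarded_read(read_that_only_logs, ...)` returns true with a meaningless value, which is the pattern the "AFTER ap_read_id call" trace suggests.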

Do you have any idea what's going on?

Thx

Olivier

UweBonnes commented 3 years ago

As I said before, exception handling in hosted needs improvement.

UweBonnes commented 3 years ago

Can people try out #898? Thanks

UweBonnes commented 2 years ago

I finally got the Nucleo-WL55 at embedded world 22 and must state the error is still present. The second AP registers read fine, but the read to the second ROM table fails. I checked that no protection for debug of CPU2 is enabled. After the errors on the second AP, the DP is upset when more APs are checked, invalid results lead to more errors and finally hangup of the BMD.

UweBonnes commented 2 years ago

Peeking at the openocd configuration code, I see that PWR_CR4_C2BOOT needs to be set. See #1067
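A minimal sketch of setting that bit with a read-modify-write. The register address (0x5800040C) and the bit position (bit 15) are assumptions written from memory of the STM32WL register map and the openocd configuration and should be checked against rm0453.pdf before use. The `mem_ops_t` callbacks are hypothetical stand-ins for the debugger's memory accessors.

```c
#include <stdint.h>

#define PWR_CR4_ADDR   0x5800040Cu /* assumed: PWR base 0x58000400 + CR4 offset 0x0C */
#define PWR_CR4_C2BOOT (1u << 15)  /* assumed bit position of C2BOOT */

/* Hypothetical memory-access callbacks (not the BMP API). */
typedef struct mem_ops {
    uint32_t (*read32)(void *ctx, uint32_t addr);
    void (*write32)(void *ctx, uint32_t addr, uint32_t value);
    void *ctx;
} mem_ops_t;

/* Set C2BOOT while leaving the other PWR_CR4 bits untouched. */
static void enable_cpu2_boot(const mem_ops_t *ops)
{
    uint32_t cr4 = ops->read32(ops->ctx, PWR_CR4_ADDR);
    ops->write32(ops->ctx, PWR_CR4_ADDR, cr4 | PWR_CR4_C2BOOT);
}

/* Single simulated register so the sketch can be tried without hardware. */
static uint32_t fake_read32(void *ctx, uint32_t addr)
{
    (void)addr;
    return *(uint32_t *)ctx;
}

static void fake_write32(void *ctx, uint32_t addr, uint32_t value)
{
    (void)addr;
    *(uint32_t *)ctx = value;
}
```

In a real probe this write would have to go through the first AP before the second AP's ROM table is read.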