blackmagic-debug / blackmagic

In application debugger for ARM Cortex microcontrollers.
GNU General Public License v3.0

BMP hangs after load on raspberry pico #1364

Closed ptillemans closed 1 year ago

ptillemans commented 1 year ago

On the master branch, when I load the program after attaching to the first target (mon s shows 3 targets, so basic comms work), it hangs at:

Reading symbols from blink.elf...
(gdb) tar ext /dev/ttyACM0
Remote debugging using /dev/ttyACM0
(gdb) mon s
Available Targets:
No. Att Driver
 1      Raspberry RP2040 M0+
 2      Raspberry RP2040 M0+
 3      Raspberry RP2040 Rescue(Attach to reset!) 
(gdb) attach 1
Attaching to program: /home/pti/sandbox/pico-examples/build/blink/blink.elf, Remote target
0x10000c0a in sleep_until ()
(gdb) load
Loading section .boot2, size 0x100 lma 0x10000000

then it hangs....

When I revert to the version with tag v1.8.2 it succeeds just fine:

Reading symbols from blink.elf...
(gdb) tar ext /dev/ttyACM0
Remote debugging using /dev/ttyACM0
(gdb) mon s
Available Targets:
No. Att Driver
 1      Raspberry RP2040 M0+
 2      Raspberry RP2040 M0+
 3      Raspberry RP2040 Rescue(Attach to reset!) 
(gdb) attach 1
Attaching to program: /home/pti/sandbox/pico-examples/build/blink/blink.elf, Remote target
0x10000c0a in sleep_until ()
(gdb) load
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .text, size 0x1d80 lma 0x10000100
Loading section .rodata, size 0xf8 lma 0x10001e80
Loading section .binary_info, size 0x20 lma 0x10001f78
Loading section .data, size 0x59c lma 0x10001f98
Start address 0x100001e8, load size 9524
Transfer rate: 27 KB/sec, 732 bytes/write.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/pti/sandbox/pico-examples/build/blink/blink.elf 

This is on a bluepill compiled with

make PROBE_HOST=swlink

(I ordered a real BMP to compare but in the meantime I can proceed with using the v1.8.2 version)

dragonmux commented 1 year ago

Couple of questions to help us determine if this is a regression, etc:

1) For Bluepill, you should be building with make PROBE_HOST=stlink BLUEPILL=1, as this gives the correct pinouts and such. Could you confirm that firmware built this way still hangs?
2) After removing a couple of targets you're not currently using from src/Makefile (delete a few .c lines for processors you're not currently debugging, such as the LPC suite), could you please make PROBE_HOST=stlink BLUEPILL=1 ENABLE_DEBUG=1 and re-run your test, adding mon debug en before your SWD scan and monitoring the other serial port (/dev/ttyACM1 in your setup, probably)? This will give us a debug log of what's going on.

If you could make the debug log available here via a Gist or similar, or by attaching a file with the output, that will allow us to dive in and see what could be causing this. We have RP2040s locally to test against and try to reproduce this.

Edit: Also, a small point for clarification, when you say "master branch", do you mean main? Asking just so we're absolutely clear on what's being referred to by that

ptillemans commented 1 year ago

I have been trying to remove targets but I keep getting linker errors. So I cannot yet enable debugging.

I am using this script to build:

#!/bin/sh
make -j9 clean
make PROBE_HOST=swlink BLUEPILL=1 -j9
dfu-util -v -R -d 0483:df11 -s 0x08002000 -D src/blackmagic.bin

and I have the following in .gdbinit for the pico-examples blinky example

target ext /dev/ttyACM0
mon s
attach 1
load

my tests so far:

I am sorry, it seems the issue has magically disappeared. Something in my setup, PC (or most likely between chair and computer) makes the results flaky. (I did reproduce it earlier today, then scripted everything to remove operator variance and now it does not anymore)

ptillemans commented 1 year ago

Yesterday I enabled RTT with ENABLE_RTT=1

In that case I get the observed behavior. I probably got confused with inconsistent settings for ENABLE_RTT.

If I change build.sh to

#!/bin/sh
make -j9 clean
make PROBE_HOST=swlink BLUEPILL=1 ENABLE_RTT=1 -j9
dfu-util -v -R -d 0483:df11 -s 0x08002000 -D src/blackmagic.bin
git checkout .

results in

which is what I originally observed.

dragonmux commented 1 year ago

v1.9.0-rc1 failing on mon s is a known issue that we've fixed in #1360 so that's expected. v1.8.2 predates RTT support as you observed.

Thank you for providing the extra information, we might now be able to figure out why RTT has broken RP2040 support when enabled. We'd been wondering why we couldn't reproduce it (with RTT off as we don't typically enable it), so that narrows it down to an interaction between the two.

dragonmux commented 1 year ago

So, we've been trying to reproduce this with the information provided and.. funny problem - it works flawlessly on native from what we can tell. Transfer speed seems a little down over no-RTT builds, but all works exactly how we'd expect both with RTT off and RTT on but built into the firmware in both cases.

For now we'll bump this issue to v1.10 as reproducing this might take a lot more work.

ptillemans commented 1 year ago

I am waiting for the arrival of the official BMP hardware so I have something to compare against, as I have gotten intrigued by this issue. If I could get the bluepill to work I would have a nice solution: just solder the programmer in place in my project rather than fiddle with custom cables et al. As they are cheap as chips, I can just leave them in the project box while it is active, which saves me a ton of setup time in my workspace.

But it will be nice to have a benchmark official hardware part to compare to, because it is hard to get confidence in these cheap parts, my soldering skills, my PC setup, cabling, software versions... There are so many variables.

In any case, awesome work; it really works well in my experiments so far.

gresolio commented 1 year ago

I can also confirm that flashing the Pi Pico hangs after the load command. Sometimes I get "Error erasing flash with vFlashErase packet", but most of the time it just hangs.

Compile options, MINGW64 on Windows:

mingw32-make PROBE_HOST=stlink ENABLE_RTT=1

Upload script:

target extended-remote $serial_interface
file $target_elf_path
mon swdp_scan
att 1
load
kill
q

p.s. I ordered BMP23 Native Hardware, I'll write the results later.

dragonmux commented 1 year ago

Thank you for partially bisecting when this was introduced! We haven't yet been able to repro this but we can look at the history on rp.c and see what changes impacting the Flash routines were made between 37efd257 and v1.9 and see if there's anything obvious with that information.

gresolio commented 1 year ago

Thanks for the great open source project :) 37efd257 is the most stable commit in my use case with RP2040 and RTT feature.

maxgerhardt commented 1 year ago

Triggered by https://github.com/earlephilhower/arduino-pico/issues/1364, I flashed the binary given there, which identifies only as Black Magic Probe (ST-Link/v2) v1.9.0-dirty, Hardware Version 0, onto my bluepill. Rather than hanging every time, it sometimes flashes correctly, after which it hangs in most cases. The reset button of the bluepill then has to be pressed to allow a new try. The GDB version used is 8.2.50.20190202-git from pico-quick-toolchain.

Using batch script

C:\Users\Max\.platformio\packages\toolchain-rp2040-earlephilhower\bin\arm-none-eabi-gdb -nx --batch -ex "target extended-remote \\.\COM15" -ex "monitor swdp_scan" -ex "attach 1" -ex load -ex compare-sections -ex "kill" pico_blinky.elf

with pico_blinky.zip.

I'll try with the very latest commit.

maxgerhardt commented 1 year ago

Oh that's awesome. I really just had to download the CI binaries for the latest commit e490465713d67543ab6cf85919de81b24a90e5f6 at https://github.com/blackmagic-debug/blackmagic/actions/runs/4650453107 and reflash the blackmagic-stlink.bin via dfu-util and now it works very reliably without a single hangup, tested 15 times.

Edit: Meh, sadly it's still a bit hit and miss with that commit. Suddenly, after doing some more runs and resets, the failure rate is around 33% - 50%. If it errors, it's either in the flash write or the erase for the .ota section in my firmware.

No. Att Driver
 1      Raspberry RP2040 M0+
 2      Raspberry RP2040 M0+
 3      Raspberry RP2040 Rescue (Attach to reset!)
sleep_until (t=<optimized out>) at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c:397
397     /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c: No such file or directory.
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .ota, size 0x25e8 lma 0x10000100
Error writing data to flash
Section .boot2, range 0x10000000 -- 0x10000100: matched.
Section .ota, range 0x10000100 -- 0x100026e8: MIS-MATCHED!

Though the next run goes through without any problems then.

Available Targets:
No. Att Driver
 1      Raspberry RP2040 M0+
 2      Raspberry RP2040 M0+
 3      Raspberry RP2040 Rescue (Attach to reset!)
0xfffffffe in ?? ()
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .ota, size 0x25e8 lma 0x10000100
Loading section .partition, size 0x918 lma 0x100026e8
Loading section .text, size 0xc860 lma 0x10003000
Loading section .rodata, size 0x9e4 lma 0x1000f860
Loading section .data, size 0x11a4 lma 0x10010244
Start address 0x100030d4, load size 70632
Transfer rate: 53 KB/sec, 917 bytes/write.

It may have something to do with execution being in the middle of nowhere (0xfffffffe, an ISR?) at the time of the next flash run? If the chip is at 0x00001bd0 in ?? () or at the above address, flashing always works. Otherwise it does not.

dragonmux commented 1 year ago

0xfffffffe is a trap address the core goes to when it tries executing something invalid and the hard fault handler is not valid (typically 0xffffffff, which is the same address just with the Thumb bit set)
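
The relationship between the two addresses is a one-line bit operation. The sketch below uses Python purely for illustration; the constant and function names are made up for this example and do not come from the Black Magic Debug sources.

```python
# Bit 0 of a Cortex-M branch target is the Thumb state bit, not part of
# the instruction address itself.
THUMB_BIT = 0x1


def strip_thumb_bit(address: int) -> int:
    """Return the 32-bit address with the Thumb bit (bit 0) cleared."""
    return address & 0xFFFFFFFF & ~THUMB_BIT


print(hex(strip_thumb_bit(0xFFFFFFFF)))  # 0xfffffffe
```

An address that already has bit 0 clear, such as the 0x10000c0a reported earlier in the thread, passes through unchanged.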

Thank you for further testing this as we've not been able to reproduce this issue yet but you've given us a template for how we might go about doing so

maxgerhardt commented 1 year ago

One last addendum: In my script it's critical whether -ex detach is added before -ex kill or not. If I detach from the target after a successful flash, it seems to keep it from resetting / running the firmware and it is infinitely reflashable. Without it, the board executes the firmware (blinks the on-board LED) and the next flash fails. So it's a perfect fail -> okay -> fail -> okay sequence, and once the chip has crashed at either 0xfffffffe or 0x00001bd0, it becomes flashable. Maybe because nothing else is accessing the flash anymore?

So it seems to me that the flashing problem has something to do with one or both Cortex-M0+ not being halted before the flash or the firmware execution bringing it into a weird state.. But that's just my two cents.
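
For anyone wanting to compare the two variants side by side, here is a minimal Python sketch that assembles the same gdb batch invocation with or without the extra detach step. The function name and its parameters are hypothetical helpers for this illustration; only the gdb flags and commands are taken from the batch script earlier in the thread.

```python
def build_flash_command(gdb, port, elf, detach_before_kill=True):
    """Build the gdb batch invocation as an argument list.

    gdb  -- path to arm-none-eabi-gdb
    port -- serial port of the probe, e.g. r"\\.\COM15" or "/dev/ttyACM0"
    elf  -- firmware image to load
    """
    commands = [
        f"target extended-remote {port}",
        "monitor swdp_scan",
        "attach 1",
        "load",
        "compare-sections",
    ]
    if detach_before_kill:
        # The variant reported above as reflashable every time.
        commands.append("detach")
    commands.append("kill")

    argv = [gdb, "-nx", "--batch"]
    for command in commands:
        argv += ["-ex", command]
    argv.append(elf)
    return argv


argv = build_flash_command("arm-none-eabi-gdb", r"\\.\COM15", "pico_blinky.elf")
print(argv)
```

The resulting list can be handed to subprocess.run(argv) to run the flash for real; building it as a list avoids shell quoting problems with the Windows device path.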

dragonmux commented 1 year ago

Having spent some time thinking on how to fix this and given the behaviour described on main, we've decided that the best course of action is to replace the ROM calls that were being made with direct SPI Flash access code - see the branch fix/rp2040-flash-reliability.

This new code does need thorough testing but should be functional at this point, while also reducing Flash usage of the firmware. Please give it a go and let us know how you get on. It likely needs work to make it run faster, and can probably benefit from some refactoring into the Flash mode entry/exit routines to further offload and simplify things, but we'd like to know it works reliably first.

Riffer commented 1 year ago

I tried branch fix/rp2040-flash-reliability, but it fails (at least for me) to enumerate targets.

Maybe I did the wrong compilation by make PROBE_HOST=swlink BLUEPILL=1?

maxgerhardt commented 1 year ago

Did the same wiring work for you before? My Bluepill <-> Pico connections are: 5V <-> VBUS, GND <-> GND, PA5 <-> SWCLK, PB14 <-> SWDIO.

dragonmux commented 1 year ago

> I tried branch fix/rp2040-flash-reliability, but it fails (at least for me) to enumerate targets.

Please provide an example of the output you're getting

> Maybe I did the wrong compilation by make PROBE_HOST=swlink BLUEPILL=1?

The stlink platform has BLUEPILL=1, swlink does not.

dragonmux commented 1 year ago

Fixed an oops in the fix/rp2040-flash-reliability branch that should improve things (forgot to update a check which meant the code would always fail rp_read_rom_func_table())

Riffer commented 1 year ago

Sorry for the confusion; maybe I mixed up my files, because I compile on Linux (WSL Ubuntu under Windows).

I built it like this (according to my bash history): make PROBE_HOST=stlink, without the BLUEPILL=1 option.

I took my pinout (for SWCLK and SWDIO) from here: [image]

That works great for me. But my knowledge may be outdated.

To make my confusion complete, there are hints for BLUEPILL in both stlink and swlink:

[image]

Please help me get back on the right horse: which version should I compile for use with the bluepill (and the fix branch above), and where can I find the correct, currently valid pinout?

maxgerhardt commented 1 year ago

Thanks @dragonmux for the new commits, I compiled the firmware with make PROBE_HOST=stlink BLUEPILL=1 and updated using dfu-util. Commit is a55e8a54cb260b35beae7fd049a227f87a09a761.

Unfortunately it's still not working. In the first flash, when it works, flashing is slower than with the previous BMP version by at least a factor of two (or did I compile the firmware wrongly, without optimization?). In the next run, the transfer seems to go through almost instantly but then fails the section compares, so it doesn't look like it did any flashing at all.

sleep_until (t=<optimized out>) at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c:397
397     /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c: No such file or directory.
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .ota, size 0x25e8 lma 0x10000100
Loading section .partition, size 0x918 lma 0x100026e8
Loading section .text, size 0xc860 lma 0x10003000
Loading section .rodata, size 0x9e4 lma 0x1000f860
Loading section .data, size 0x11a4 lma 0x10010244
Start address 0x100030d4, load size 70632
Transfer rate: 621 KB/sec, 941 bytes/write.
Section .boot2, range 0x10000000 -- 0x10000100: MIS-MATCHED!
Section .ota, range 0x10000100 -- 0x100026e8: MIS-MATCHED!
Section .partition, range 0x100026e8 -- 0x10003000: MIS-MATCHED!
Section .text, range 0x10003000 -- 0x1000f860: MIS-MATCHED!
Section .rodata, range 0x1000f860 -- 0x10010244: MIS-MATCHED!
Section .data, range 0x10010244 -- 0x100113e8: MIS-MATCHED!
warning: One or more sections of the target image does not match
the loaded file

For the third run then, GDB hangs up, likely because the BMP does not respond anymore. I then have to press the reset button to unbrick it. Also the Bluepill's PC13 LED is lit up then at this stage and won't turn off again.
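
When repeating such runs many times, it can help to classify each one automatically from the compare-sections output instead of reading the log by eye. A minimal Python sketch follows, assuming the line format shown in the logs above; the regular expression and function name are illustrative and not part of gdb or the Black Magic Probe tooling.

```python
import re

# Matches lines such as:
#   Section .ota, range 0x10000100 -- 0x100026e8: MIS-MATCHED!
SECTION_LINE = re.compile(
    r"^Section (?P<name>\S+), range 0x[0-9a-fA-F]+ -- 0x[0-9a-fA-F]+: "
    r"(?P<status>matched\.|MIS-MATCHED!)$"
)


def mismatched_sections(gdb_output: str) -> list[str]:
    """Return the names of sections that gdb reported as MIS-MATCHED."""
    bad = []
    for line in gdb_output.splitlines():
        match = SECTION_LINE.match(line.strip())
        if match and match.group("status") == "MIS-MATCHED!":
            bad.append(match.group("name"))
    return bad


sample = """\
Section .boot2, range 0x10000000 -- 0x10000100: matched.
Section .ota, range 0x10000100 -- 0x100026e8: MIS-MATCHED!
"""
print(mismatched_sections(sample))  # ['.ota']
```

An empty list means every reported section matched. A run that hangs produces no section lines at all, so it has to be detected separately, for example with a timeout around the gdb process.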

dragonmux commented 1 year ago

Right, good to know - we'll put some more work into the branch then as so far we've only done dry-runs of the code changes and not tested on real hardware (needed to complete the main rewrite before we could test it).

Regarding it being slower: that's actually expected, because BMD driving the SPI peripheral will always be a bit slower than using the ROM routines. What it's bought us is that we now don't rely on any part of the state of the core, which was the motivation for the change.

dragonmux commented 1 year ago

@Riffer the stlink platform hasn't changed in regards to this beyond the addition of BLUEPILL=1 to adjust for some pinout differences vs ST-Link v2's - use make PROBE_HOST=stlink BLUEPILL=1 (sorry for only just answering the question)

dragonmux commented 1 year ago

We've done some more work on the branch and have now tested it. While we are able to recreate the "kill after load crashes the target" behaviour, we are now always able to Flash the device regardless of this state. It's slower at ~1kiB/s, but it has been reliable for us, and this includes running after load and breaking back into execution with Ctrl + C.

Please let us know with the current version of fix/rp2040-flash-reliability (4b9855421) how you fare.

We will look into the kill crash issue separately as we suspect that's not unique to RP2040, but instead is a problem with the implementation for Cortex-M cores generally.

maxgerhardt commented 1 year ago

Behavior with the latest commit 1d001bc2258b4b2780d316f8755e32d986eaf34e:

First flash works -> second flash immediate fail with all section mismatch, same as above -> third flash hangup after displaying Loading section .boot2, hard reset needed.

Still testing with the flashing batch script and elf file per above.

dragonmux commented 1 year ago

How curious.. we've been unable to reproduce that on our setup with either ORBTrace (a CMSIS-DAP adaptor) or BMP (native hardware) - even after inducing the kill crash, always been able to immediately reattach and Flash again and it just worked (we tried for a solid half hour to make it crash like described)

You're shown as using the ST-Link v2 firmware binary above, could you confirm which kind/generation of ST-Link (real, clone; v1, v2, v2.1) in case this is an issue in the platform causing additional grief

maxgerhardt commented 1 year ago

Oops, I looked at my Bluepill more closely: I thought it was the one with the genuine STM32F103C8/B chip on it, but instead I had grabbed my Bluepill with a GigaDevice GD32F303CC on it. Though it's said that it's "by pure chance binary-compatible with STM32", let me grab my other Bluepill and reflash..

dragonmux commented 1 year ago

The F303 is a bit different to the F103, so things like timings in the bitbanging routines could be off.. potentially.. (more research required). It also has some Flash bugs, but those shouldn't be a problem here. As you're using bluepill boards, you'll need a build done as make PROBE_HOST=stlink BLUEPILL=1 - give us a moment and we can push one up in an archive for you

Edit: blackmagic-bluepill.tar.gz

maxgerhardt commented 1 year ago

I switched to a genuine STM32F103C8 Bluepill but the issue is exactly the same. And while I did also compile it with make PROBE_HOST=stlink BLUEPILL=1, I'll flash your binary now.

maxgerhardt commented 1 year ago

Very interesting, your binary indeed works. No more issues.

I compiled the binary on Ubuntu 22 with

$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:10.3-2021.07-4) 10.3.1 20210621 (release)

see binary attached: blackmagic.zip

Let me do a regression test with the previous firmware to see if it comes back.

dragonmux commented 1 year ago

As that's not the official ARM GCC, it would be interesting to see if you still get the issue if using either the old GNU-RM Arm toolchain or the current Arm GNU toolchain - if you do, then this is an issue with how Ubuntu's GCC was compiled (and it wouldn't be the first time!)

maxgerhardt commented 1 year ago

I did install the ARM toolchain via a simple sudo apt install gcc-arm-none-eabi, so it's from the official repos :(

sudo apt list | grep gcc-arm-none-eabi
Listing… Done
gcc-arm-none-eabi/kinetic,now 15:10.3-2021.07-4 amd64  [installed]

And yes going back to my self-compiled binary the problem immediately returns with Okay -> Fail -> Hangup. So it's a problem on my side.

I also noticed that they show up as slightly different device names in the Windows device manager:

[image]

dragonmux commented 1 year ago

The -dirty is expected as our working copy isn't entirely clean - we have outstanding modifications to .clang-tidy, .gitignore, and a couple of BMDA's files which are causing that - nothing that changes the built firmware binary meaningfully.

maxgerhardt commented 1 year ago

I ran sudo apt remove gcc-arm-none-eabi and installed arm-gnu-toolchain-12.2.MPACBTI-Rel1-x86_64-arm-none-eabi.tar.xz

max@virtualbox:~/temp/blackmagic$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (Arm GNU Toolchain 12.2.MPACBTI-Rel1 (Build arm-12-mpacbti.34)) 12.2.1 20230214

Did a make clean, recompiled, reflashed.

And now it's working too. Tested 6 runs, the old firmware would have failed long before that. So it was definitely the compiler.

Thanks for all the assist!

dragonmux commented 1 year ago

That's excellent news, thank you - had been going a bit well.. mad.. here, trying to reproduce what was wrong. Hoping this solves Riffer's issues too by giving them a binary to just download and use, and likewise that ptillemans is able to get back to us confirming their original issue is solved.

If you could make a new issue, maxgerhardt, with that kill behaviour, we'll look into that separately and try and figure out what's going wrong there.

Riffer commented 1 year ago

I just flashed the compiled version from above.

[image]

After several attempts (I first thought it hung): upload works, but it is ridiculously slow.

Transfer rate: 3 KB/sec, 963 bytes/write.

Here is the relevant part of the log from VS Code/PlatformIO:

CURRENT: upload_protocol = blackmagic
MethodWrapper(["upload"], [".pio\build\rpipico\firmware.elf"])
Using manually specified: \\.\COM16
arm-none-eabi-gdb -nx --batch -ex "target extended-remote \\.\COM16" -ex "monitor swdp_scan" -ex "attach 1" -ex load -ex compare-sections -ex kill .pio\build\rpipico\firmware.elf
C:\Users\kposa\.platformio\packages\toolchain-rp2040-earlephilhower\bin\arm-none-eabi-gdb.exe: warning: Couldn't determine a path for the index cache directory.
Target voltage: 2.90V
Available Targets:
No. Att Driver
 1      Raspberry RP2040 M0+
 2      Raspberry RP2040 M0+
 3      Raspberry RP2040 Rescue (Attach to reset!)
sleep_until (t=<optimized out>) at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c:397
397     /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c: No such file or directory.
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .ota, size 0x25e8 lma 0x10000100
Loading section .partition, size 0x918 lma 0x100026e8
Loading section .text, size 0x13ac0 lma 0x10003000
Loading section .rodata, size 0x39024 lma 0x10016ac0
Loading section .data, size 0x13a8 lma 0x1004fae4
Start address 0x100030d4, load size 331404
Transfer rate: 3 KB/sec, 963 bytes/write.
Section .boot2, range 0x10000000 -- 0x10000100: matched.
Section .ota, range 0x10000100 -- 0x100026e8: matched.
Section .partition, range 0x100026e8 -- 0x10003000: matched.
Section .text, range 0x10003000 -- 0x10016ac0: matched.
Section .rodata, range 0x10016ac0 -- 0x1004fae4: matched.
Section .data, range 0x1004fae4 -- 0x10050e8c: matched.
Kill the program being debugged? (y or n) [answered Y; input not from terminal]
[Inferior 1 (Remote target) killed]

dragonmux commented 1 year ago

3kiB/s is not bad honestly, considering the RP2040 only allows us to send a single byte to program to it at a time, as a 32-bit write, and then requires a 32-bit read-back of the same register. Slow but correct beats fast but wrong.

We can look at improving the performance with something like a Flash stub now we have something that works.
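
To put the per-byte cost in perspective, here is a rough Python sketch that counts only the data-register accesses implied by the description above (one 32-bit write plus one 32-bit read-back per payload byte). It deliberately ignores command, address and status traffic, which is an assumption of this sketch, so it is a lower bound and not a model of the actual firmware; the function name is made up for the example.

```python
def spi_register_accesses(payload_bytes: int) -> dict[str, int]:
    """Count data-register accesses when each payload byte costs one
    32-bit write plus one 32-bit read-back.

    Command, address and status traffic is ignored (assumption), so the
    result is a lower bound on the real number of accesses.
    """
    writes = payload_bytes
    reads = payload_bytes
    return {"writes": writes, "reads": reads, "total": writes + reads}


# Load size reported by gdb in the log above.
print(spi_register_accesses(331404))
# {'writes': 331404, 'reads': 331404, 'total': 662808}
```

Each of those accesses moves four bytes over the debug link to carry one byte of payload, which is part of why the direct approach is slow compared with letting code on the target do the work.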

Riffer commented 1 year ago

More often than I should I push upload just to check what I wrote, so the slow speed will be a pain for the moment, and I will be happy to test future versions.

Short addendum: Debugging works and even uploading again directly while debugging session worked, too.

gresolio commented 1 year ago

I received native HW V2.3b with GD32F103C8T6 and did some tests. The rp2040-flash-reliability is already merged into main branch, so I tried the latest main: d0408af59711a9951d0d113f9c6adb3a892bac22

Arm GNU Toolchain tested:

arm-none-eabi-gcc (Arm GNU Toolchain 11.3.Rel1) 11.3.1 20220712
arm-none-eabi-gcc (Arm GNU Toolchain 12.2.Rel1 (Build arm-12.24)) 12.2.1 20221205

Compile options:

// native
mingw32-make clean
mingw32-make PROBE_HOST=native ENABLE_RTT=1

// blue pill
mingw32-make clean
mingw32-make PROBE_HOST=stlink BLUEPILL=1 ENABLE_RTT=1

Results:

Firmware + toolchain                                  native HW V2.3b GD32F103C8T6   blue pill STM32F103C8
v1.9.0-524-gd0408af5 + Arm GNU Toolchain 12.2.Rel1    works, 4 KB/sec                hangs after load
v1.9.0-524-gd0408af5 + Arm GNU Toolchain 11.3.Rel1    works, 4 KB/sec                hangs after load
v1.8.0-266-g37efd257 + Arm GNU Toolchain 11.3.Rel1    works, 67 KB/sec               works, 54 KB/sec

It takes 23 seconds to upload 107128 bytes (104 KB):

// v1.9.0-524-gd0408af5
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .text, size 0xc240 lma 0x10000100
Loading section .rodata, size 0xd418 lma 0x1000c340
Loading section .binary_info, size 0x1c lma 0x10019758
Loading section .data, size 0xb04 lma 0x10019774
Start address 0x100001e8, load size 107128
Transfer rate: 4 KB/sec, 948 bytes/write.

Previously, it was 3 seconds:

// v1.8.0-266-g37efd257
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .text, size 0xc240 lma 0x10000100
Loading section .rodata, size 0xd418 lma 0x1000c340
Loading section .binary_info, size 0x1c lma 0x10019758
Loading section .data, size 0xb04 lma 0x10019774
Start address 0x100001e8, load size 107128
Transfer rate: 67 KB/sec, 948 bytes/write.
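
As a quick sanity check of the slower run, the stopwatch figure and the rate printed by gdb can be compared with a few lines of Python. This assumes that "KB" in the gdb output means 1024 bytes; the function name is illustrative only.

```python
def transfer_rate_kib_per_s(size_bytes: int, seconds: float) -> float:
    """Average rate in KiB/s, assuming 1 KB in the gdb output is 1024 bytes."""
    return size_bytes / seconds / 1024


size = 107128  # "load size" from the logs above

print(round(size / 1024, 1))                        # 104.6
print(round(transfer_rate_kib_per_s(size, 23), 2))  # 4.55
```

About 4.5 KiB/s over 23 seconds is in line with the "4 KB/sec" that gdb prints for the v1.9.0-524-gd0408af5 build.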

Conclusion:

  1. Blue pill STM32F103C8 build is broken, it still hangs after load.
  2. Native build is OK, but very slow.

It looks like the issue is not completely solved. How can I help make this rp2040-flash-reliability work on Blue pill STM32F103C8?

dragonmux commented 1 year ago

The first thing that comes to mind that would be helpful is if you could use v1.9.0 and check whether it still works on that version or not, using only a single toolchain (12.2.Rel1 would be perfect).

When you say "hangs after load", we assume load completes to spitting out the transfer rate stat? Is it definitely after load or during?

Regarding the speed, that's entirely as expected as to create reliability we had to switch from using the boot ROM's Flash routines to driving the SPI controller directly from the firmware, which bottlenecks us on how fast we're able to do 32-bit ADIv5 memory reads and writes of the SPI peripheral data register which accepts just one byte at a time.

Given what we're doing here, we suspect your Bluepill's firmware is crashing on d0408af so a build of the firmware done with ENABLE_DEBUG=1 (you'll have to disable a few targets the test isn't using to make space for the build to fit Flash) may shed some light via the secondary USB serial port device. You'll need to mon debug en prior to running load to get debug output

gresolio commented 1 year ago

By "hangs after load", I mean immediately after issuing the load command in gdb. The load command does not run to completion, so it is really "during", or to be more precise, at the start:

(gdb) load
Loading section .boot2, size 0x100 lma 0x10000000

// no more output here, it hangs until I disconnect usb cable.

Agree, reliability is more important. But hopefully speed improvements are technically possible to implement later.

Regarding the latest v1.9 (3c8b2b7abd76578b4ad3588799fc664bdc42b2de): it doesn't work, and there have been no vital changes in the v1.9 branch since my previous test two weeks ago (I used 1fd6cb120c4ceb95e7b71c3e84595babd24c8615 at the time). I have a few good old blue pill boards with genuine STM32F103C8 chips (at least I haven't been able to determine that they are fake yet).

// v1.9.0-8-g3c8b2b7a + Arm GNU Toolchain 12.2.Rel1 --> hangs
mingw32-make PROBE_HOST=stlink BLUEPILL=1 ENABLE_RTT=1

Thank you for the hint with ENABLE_DEBUG=1, I'll try that and report results later.

dragonmux commented 1 year ago

> Agree, reliability is more important. But hopefully speed improvements are technically possible to implement later.

Yeah, we can look at building a Flash stub, moving the work of running the SPI peripheral back onto the RP2040 in a more controlled way that makes the firmware much less at the mercy of the core state. It's still a step more complicated than having the firmware do the work purely via ADIv5 memory access, which is why we didn't start there.