Closed ptillemans closed 1 year ago
Couple of questions to help us determine if this is a regression, etc:
1) For Bluepill, you should be building with make PROBE_HOST=stlink BLUEPILL=1, as this then gives the correct pinouts and such. Could you confirm that firmware built this way still hangs?
2) Remove a couple of targets you're not currently using from src/Makefile, then build with make PROBE_HOST=stlink BLUEPILL=1 ENABLE_DEBUG=1 and re-run your test, adding mon debug en before your SWD scan and monitoring the other serial port (/dev/ttyACM1 in your setup, probably), as this will give us a debug log of what's going on.
If you could make the debug log available here by Gist or similar, or attaching a file with the output, that will allow us to dive in and see what could be causing this. We have RP2040's locally to test against and try to reproduce this.
Edit: Also, a small point for clarification: when you say "master branch", do you mean main? Asking just so we're absolutely clear on what's being referred to by that.
I have been trying to remove targets but I keep getting linker errors. So I cannot yet enable debugging.
I am using this script to build:
#!/bin/sh
make -j9 clean
make PROBE_HOST=swlink BLUEPILL=1 -j9
dfu-util -v -R -d 0483:df11 -s 0x08002000 -D src/blackmagic.bin
and I have the following in .gdbinit for the pico-examples blinky example
target ext /dev/ttyACM0
mon s
attach 1
load
my tests so far:
mon s
I am sorry, it seems the issue has magically disappeared. Something in my setup, PC (or most likely between chair and computer) makes the results flaky. (I did reproduce it earlier today, then scripted everything to remove operator variance and now it does not anymore)
Yesterday I enabled RTT with ENABLE_RTT=1. In that case I get the observed behavior. I probably got confused by inconsistent settings for ENABLE_RTT.
If I change build.sh to:
#!/bin/sh
make -j9 clean
make PROBE_HOST=swlink BLUEPILL=1 ENABLE_RTT=1 -j9
dfu-util -v -R -d 0483:df11 -s 0x08002000 -D src/blackmagic.bin
git checkout .
results in
mon s
which is what I originally observed.
v1.9.0-rc1 failing on mon s is a known issue that we've fixed in #1360, so that's expected. v1.8.2 predates RTT support, as you observed.
Thank you for providing the extra information; we might now be able to figure out why RTT has broken RP2040 support when enabled. We'd been wondering why we couldn't reproduce it (with RTT off, as we don't typically enable it), so that narrows it down to an interaction between the two.
So, we've been trying to reproduce this with the information provided and.. funny problem - it works flawlessly on native from what we can tell. Transfer speed seems a little down over no-RTT builds, but all works exactly how we'd expect both with RTT off and RTT on but built into the firmware in both cases.
For now we'll bump this issue to v1.10 as reproducing this might take a lot more work.
I am waiting for the arrival of the official BMP hardware so I have something to compare against, as I have gotten intrigued by this issue. If I could get the bluepill to work, I would have a nice solution: just solder the programmer in place in my project rather than fiddle with custom cables et al. As they are cheap as chips, I can just leave them in the project box while it is active, which saves me a ton of setup time in my workspace.
But it will be nice to have a benchmark official hardware part to compare to, because it is hard to get confidence in these cheap parts, my soldering skills, my PC setup, cabling, software versions... There are so many variables.
In any case, awesome work; it really works well in my experiments so far.
I can also confirm, flashing pi pico hangs after load command. Sometimes I get "Error erasing flash with vFlashErase packet", but most of the time it just hangs.
Compile options, MINGW64 on Windows:
mingw32-make PROBE_HOST=stlink ENABLE_RTT=1
Upload script:
target extended-remote $serial_interface
file $target_elf_path
mon swdp_scan
att 1
load
kill
q
p.s. I ordered BMP23 Native Hardware, I'll write the results later.
Thank you for partially bisecting when this was introduced! We haven't yet been able to repro this but we can look at the history on rp.c and see what changes impacting the Flash routines were made between 37efd257 and v1.9 and see if there's anything obvious with that information.
Thanks for the great open source project :) 37efd257 is the most stable commit in my use case with RP2040 and RTT feature.
Triggered by https://github.com/earlephilhower/arduino-pico/issues/1364, I flashed the binary given there, which identifies only as Black Magic Probe (ST-Link/v2) v1.9.0-dirty, Hardware Version 0, onto my bluepill. Instead of hanging every time, it sometimes flashes correctly, after which it hangs in most cases. The reset button of the bluepill then has to be pressed to allow a new try. The GDB version used is 8.2.50.20190202-git from pico-quick-toolchain.
Using batch script
C:\Users\Max\.platformio\packages\toolchain-rp2040-earlephilhower\bin\arm-none-eabi-gdb -nx --batch -ex "target extended-remote \\.\COM15" -ex "monitor swdp_scan" -ex "attach 1" -ex load -ex compare-sections -ex "kill" pico_blinky.elf
with pico_blinky.zip.
I'll try with the very latest commit.
Oh that's awesome. I really just had to download the CI binaries for the latest commit e490465713d67543ab6cf85919de81b24a90e5f6 at https://github.com/blackmagic-debug/blackmagic/actions/runs/4650453107 and reflash the blackmagic-stlink.bin via dfu-util, and now it works very reliably without a single hangup, tested 15 times.
Edit: Meh, sadly it's still a bit hit and miss with that commit. Suddenly, after doing some more runs and resets, the failure rate is like 33% - 50%. If it errors, it's either in the flash write or erase for the .ota section in my firmware.
No. Att Driver
1 Raspberry RP2040 M0+
2 Raspberry RP2040 M0+
3 Raspberry RP2040 Rescue (Attach to reset!)
sleep_until (t=<optimized out>) at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c:397
397 /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c: No such file or directory.
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .ota, size 0x25e8 lma 0x10000100
Error writing data to flash
Section .boot2, range 0x10000000 -- 0x10000100: matched.
Section .ota, range 0x10000100 -- 0x100026e8: MIS-MATCHED!
Though the next run goes through without any problems then.
Available Targets:
No. Att Driver
1 Raspberry RP2040 M0+
2 Raspberry RP2040 M0+
3 Raspberry RP2040 Rescue (Attach to reset!)
0xfffffffe in ?? ()
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .ota, size 0x25e8 lma 0x10000100
Loading section .partition, size 0x918 lma 0x100026e8
Loading section .text, size 0xc860 lma 0x10003000
Loading section .rodata, size 0x9e4 lma 0x1000f860
Loading section .data, size 0x11a4 lma 0x10010244
Start address 0x100030d4, load size 70632
Transfer rate: 53 KB/sec, 917 bytes/write.
It may have something to do with the execution being in nowhere (0xfffffffe, an ISR?) for the next flash run? If the chip is at 0x00001bd0 in ?? () or at the above address, flashing always works. Otherwise it does not.
0xfffffffe is a trap address the core goes to when it tries executing something invalid and the hard fault handler is not valid (typically 0xffffffff, which is the same address just with the Thumb bit set).
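The relationship between the two addresses mentioned above is a one-line bit operation. The sketch below is only an illustration (the helper name is made up for this example): it clears bit 0, the Thumb bit, from a branch target to get the value GDB would show as the program counter.

```python
THUMB_BIT = 0x1  # bit 0 of a branch target selects Thumb state on Cortex-M


def pc_from_branch_target(target: int) -> int:
    """Return the 32-bit PC value for a branch target by clearing the Thumb bit."""
    return target & ~THUMB_BIT & 0xFFFFFFFF


if __name__ == "__main__":
    # An invalid handler address of 0xffffffff appears in GDB with bit 0 cleared.
    print(hex(pc_from_branch_target(0xFFFFFFFF)))
```

The same masking explains why a start address reported by GDB is always even, while the corresponding vector table entry has bit 0 set.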
Thank you for further testing this as we've not been able to reproduce this issue yet but you've given us a template for how we might go about doing so
One last addendum: in my script it's critical whether -ex detach is added before -ex kill or not. If I detach from the target after a successful flash, it seems to keep it from resetting / running the firmware and is infinitely reflashable. Without it, the board executes the firmware (blinks on-board LED) and the next flash fails. So it's a perfect fail -> okay -> fail -> okay sequence, and once the chip has crashed in either 0xfffffffe or 0x00001bd0, it becomes flashable. Maybe because nothing else is accessing the flash anymore?
So it seems to me that the flashing problem has something to do with one or both Cortex-M0+ not being halted before the flash or the firmware execution bringing it into a weird state.. But that's just my two cents.
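To make the two variants of the flashing script easy to compare, here is a minimal sketch that builds the GDB batch command line quoted earlier in the thread, with the detach step as a switch. The function name and parameters are hypothetical; the GDB options and commands are the ones from the batch script above.

```python
from typing import List


def build_gdb_flash_args(gdb: str, port: str, elf: str,
                         detach_before_kill: bool = True) -> List[str]:
    """Build the argument list for a batch-mode flash run.

    detach_before_kill=True adds '-ex detach' ahead of '-ex kill', the
    variant reported above as reflashable every time.
    """
    args = [
        gdb, "-nx", "--batch",
        "-ex", f"target extended-remote {port}",
        "-ex", "monitor swdp_scan",
        "-ex", "attach 1",
        "-ex", "load",
        "-ex", "compare-sections",
    ]
    if detach_before_kill:
        args += ["-ex", "detach"]
    args += ["-ex", "kill", elf]
    return args


if __name__ == "__main__":
    # Print both variants so they can be diffed or pasted into a shell.
    for detach in (True, False):
        print(" ".join(build_gdb_flash_args(
            "arm-none-eabi-gdb", r"\\.\COM15", "pico_blinky.elf", detach)))
```

The returned list can be handed to subprocess.run() unchanged; running each variant in a loop is one way to reproduce the fail -> okay -> fail pattern described above.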
Having spent some time thinking on how to fix this and given the behaviour described on main, we've decided that the best course of action is to replace the ROM calls that were being made with direct SPI Flash access code - see the branch fix/rp2040-flash-reliability.
This new code does need thorough testing but should be functional at this point while also reducing Flash usage of the firmware. Please give it a go and let us know how you get on. It likely needs work to make it run faster, and can probably benefit from some refactoring into the Flash mode entry/exit routines to further offload and simplify things, but we'd like to know it works reliably first.
I tried branch fix/rp2040-flash-reliability, but it fails (at least for me) to enumerate targets.
Maybe I did the wrong compilation by make PROBE_HOST=swlink BLUEPILL=1?
Did the same wireup work for you before? My Bluepill <--> Pico connections are, 5V <-> VBUS, GND <->GND, PA5 <-> SWCLK, PB14 <-> SWDIO.
I tried branch fix/rp2040-flash-reliability, but it fails (at least for me) to enumerate targets.
Please provide an example of the output you're getting
Maybe I did the wrong compilation by make PROBE_HOST=swlink BLUEPILL=1?
The stlink platform has BLUEPILL=1, swlink does not.
Fixed an oops in the fix/rp2040-flash-reliability branch that should improve things (forgot to update a check, which meant the code would always fail rp_read_rom_func_table()).
Sorry for the confusion; maybe I mixed up my files because I compile with Linux (WSL Ubuntu under Windows), like this (according to my bash history):
make PROBE_HOST=stlink
without the BLUEPILL=1 option.
My pinout (in regards to SWCLK and SWDIO) I took from here
That works great for me. But my knowledge may be outdated.
To make my confusion complete there are hints for BLUEPILL in stlink and swlink as well:
Please help me get back on the right horse: which version should I compile to use with the bluepill (and the above fixing branch), and where can I find the correct pinout that is valid for it?
Thanks @dragonmux for the new commits. I compiled the firmware with make PROBE_HOST=stlink BLUEPILL=1 and updated using dfu-util. Commit is a55e8a54cb260b35beae7fd049a227f87a09a761.
Unfortunately it's still not working. In the first flash, when it works, flashing is slower than in the previous BMP version by at least a factor of two (or did I compile the firmware wrongly, without optimization?). In the next run, the transfer seems to go through almost instantly but then fails the section compares, so it doesn't look like it did any flashing at all.
sleep_until (t=<optimized out>) at /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c:397
397 /home/earle/Arduino/hardware/pico/rp2040/pico-sdk/src/common/pico_time/time.c: No such file or directory.
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .ota, size 0x25e8 lma 0x10000100
Loading section .partition, size 0x918 lma 0x100026e8
Loading section .text, size 0xc860 lma 0x10003000
Loading section .rodata, size 0x9e4 lma 0x1000f860
Loading section .data, size 0x11a4 lma 0x10010244
Start address 0x100030d4, load size 70632
Transfer rate: 621 KB/sec, 941 bytes/write.
Section .boot2, range 0x10000000 -- 0x10000100: MIS-MATCHED!
Section .ota, range 0x10000100 -- 0x100026e8: MIS-MATCHED!
Section .partition, range 0x100026e8 -- 0x10003000: MIS-MATCHED!
Section .text, range 0x10003000 -- 0x1000f860: MIS-MATCHED!
Section .rodata, range 0x1000f860 -- 0x10010244: MIS-MATCHED!
Section .data, range 0x10010244 -- 0x100113e8: MIS-MATCHED!
warning: One or more sections of the target image does not match
the loaded file
For the third run then, GDB hangs up, likely because the BMP does not respond anymore. I then have to press the reset button to unbrick it. Also the Bluepill's PC13 LED is lit up then at this stage and won't turn off again.
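When repeating runs like the ones above many times, it can help to check the compare-sections output mechanically instead of by eye. The following is a minimal sketch, assuming the GDB output has been captured as text; the function name is hypothetical and the line format is taken from the logs in this thread.

```python
import re
from typing import List

# Matches lines such as:
#   Section .ota, range 0x10000100 -- 0x100026e8: MIS-MATCHED!
SECTION_RE = re.compile(
    r"^Section (?P<name>\S+), range (?P<start>0x[0-9a-fA-F]+) -- "
    r"(?P<end>0x[0-9a-fA-F]+): (?P<status>matched|MIS-MATCHED!)"
)


def mismatched_sections(gdb_output: str) -> List[str]:
    """Return the names of sections that compare-sections reported as mismatched."""
    bad = []
    for line in gdb_output.splitlines():
        match = SECTION_RE.match(line.strip())
        if match and match.group("status") != "matched":
            bad.append(match.group("name"))
    return bad
```

An empty list from a run that printed at least one Section line means every section verified; a run that hangs produces no Section lines at all and needs to be detected separately, for example with a timeout.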
Right, good to know - we'll put some more work into the branch then as so far we've only done dry-runs of the code changes and not tested on real hardware (needed to complete the main rewrite before we could test it).
Regarding it being slower: that's actually expected, because BMD driving the SPI peripheral will always be a bit slower than using the ROM routines. What it's bought us is that we now don't rely on any part of the state of the core, which was the motivation for the change.
@Riffer the stlink platform hasn't changed in regards to this beyond the addition of BLUEPILL=1 to adjust for some pinout differences vs ST-Link v2's - use make PROBE_HOST=stlink BLUEPILL=1 (sorry for only just answering the question).
We've done some more work on the branch and have now tested it. While we are able to recreate the crash of the target when kill is run after load, we are now always able to Flash the device regardless of this state. It's slower at ~1kiB/s, but it has been reliable for us, and this includes running after load and breaking back into execution with Ctrl + C.
Please let us know how you fare with the current version of fix/rp2040-flash-reliability (4b9855421).
We will look into the kill crash issue separately, as we suspect that's not unique to RP2040 but is instead a problem with the implementation for Cortex-M cores generally.
Behavior with the latest commit 1d001bc2258b4b2780d316f8755e32d986eaf34e:
First flash works -> second flash fails immediately with all sections mismatched, same as above -> third flash hangs after displaying Loading section .boot2, hard reset needed.
Still testing with the flashing batch script and elf file per above.
How curious.. we've been unable to reproduce that on our setup with either ORBTrace (a CMSIS-DAP adaptor) or BMP (native hardware) - even after inducing the kill crash, always been able to immediately reattach and Flash again and it just worked (we tried for a solid half hour to make it crash like described)
You're shown as using the ST-Link v2 firmware binary above. Could you confirm which kind/generation of ST-Link it is (real, clone; v1, v2, v2.1), in case this is an issue in the platform causing additional grief?
Oops, I looked at my Bluepill more closely. I thought it was the one that had the genuine STM32F103C8/B chip on it, but instead I had grabbed my Bluepill with a GigaDevice GD32F303CC on it. Though it's said that it's "by pure chance binary-compatible with STM32", let me grab my other Bluepill and reflash..
The F303 is a bit different to the F103, so things like timings in the bitbanging routines could be off.. potentially.. (more research required). It also has some Flash bugs, but those shouldn't be a problem here. As you're using bluepill boards, you'll need a build done as make PROBE_HOST=stlink BLUEPILL=1. Give us a moment and we can push one up in an archive for you.
I switched to a genuine STM32F103C8 Bluepill but the issue is exactly the same. And while I did also compile it with make PROBE_HOST=stlink BLUEPILL=1, I'll flash your binary now.
Very interesting, your binary indeed works. No more issues.
I compiled my binary on Ubuntu 22 with:
$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:10.3-2021.07-4) 10.3.1 20210621 (release)
see binary attached: blackmagic.zip
Let me do a regression test with the previous firmware to see if it comes back.
As that's not the official ARM GCC, it would be interesting to see if you still get the issue if using either the old GNU-RM Arm toolchain or the current Arm GNU toolchain - if you do, then this is an issue with how Ubuntu's GCC was compiled (and it wouldn't be the first time!)
I did install the ARM toolchain via a simple sudo apt install gcc-arm-none-eabi, so it's from the official repos :(
sudo apt list | grep gcc-arm-none-eabi
Listing... Done
gcc-arm-none-eabi/kinetic,now 15:10.3-2021.07-4 amd64 [installed]
And yes going back to my self-compiled binary the problem immediately returns with Okay -> Fail -> Hangup. So it's a problem on my side.
I also noticed that they show up as slightly different device names in the Windows device manager:
The -dirty is expected as our working copy isn't entirely clean. We have outstanding modifications to .clang-tidy, .gitignore, and a couple of BMDA's files which are causing that; nothing that changes the built firmware binary meaningfully.
I ran sudo apt remove gcc-arm-none-eabi and installed arm-gnu-toolchain-12.2.MPACBTI-Rel1-x86_64-arm-none-eabi.tar.xz
max@virtualbox:~/temp/blackmagic$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (Arm GNU Toolchain 12.2.MPACBTI-Rel1 (Build arm-12-mpacbti.34)) 12.2.1 20230214
Did a make clean, recompiled, reflashed.
And now it's working too. Tested 6 runs, the old firmware would have failed long before that. So it was definitely the compiler.
Thanks for all the assist!
That's excellent news, thank you - had been going a bit well.. mad.. here, trying to reproduce what was wrong. Hoping this solves Riffer's issues too by giving them a binary to just download and use, and likewise that ptillemans is able to get back to us confirming their original issue is solved.
If you could make a new issue, maxgerhardt, with that kill behaviour, we'll look into that separately and try to figure out what's going wrong there.
I just flashed the compiled version from above.
After several attempts (I first thought it hung): upload works, but it is ridiculously slow.
Transfer rate: 3 KB/sec, 963 bytes/write.
Here is the relevant part log from vscode/platformio:
CURRENT: upload_protocol = blackmagic
MethodWrapper(["upload"], [".pio\build\rpipico\firmware.elf"])
Using manually specified: \.\COM16
arm-none-eabi-gdb -nx --batch -ex "target extended-remote \.\COM16" -ex "monitor swdp_scan" -ex "attach 1" -ex load -ex compare-sections -ex kill .pio\build\rpipico\firmware.elf
C:\Users\kposa.platformio\packages\toolchain-rp2040-earlephilhower\bin\arm-none-eabi-gdb.exe: warning: Couldn't determine a path for the index cache directory.
Target voltage: 2.90V
Available Targets:
No. Att Driver
1 Raspberry RP2040 M0+
2 Raspberry RP2040 M0+
3 Raspberry RP2040 Rescue (Attach to reset!)
sleep_until (t=
3kiB/s is not bad honestly, considering the RP2040 only allows us to send a single byte to program to it at a time, as a 32-bit write, and then requires a 32-bit read-back of the same register. Slow but correct beats fast but wrong.
We can look at improving the performance with something like a Flash stub now we have something that works.
More often than I should I push upload just to check what I wrote, so the slow speed will be a pain for the moment, and I will be happy to test future versions.
Short addendum: debugging works, and even uploading again directly during a debugging session worked, too.
I received native HW V2.3b with GD32F103C8T6 and did some tests. The rp2040-flash-reliability is already merged into main branch, so I tried the latest main: d0408af59711a9951d0d113f9c6adb3a892bac22
Arm GNU Toolchain tested:
arm-none-eabi-gcc (Arm GNU Toolchain 11.3.Rel1) 11.3.1 20220712
arm-none-eabi-gcc (Arm GNU Toolchain 12.2.Rel1 (Build arm-12.24)) 12.2.1 20221205
Compile options:
// native
mingw32-make clean
mingw32-make PROBE_HOST=native ENABLE_RTT=1
// blue pill
mingw32-make clean
mingw32-make PROBE_HOST=stlink BLUEPILL=1 ENABLE_RTT=1
Results:
| | native HW V2.3b GD32F103C8T6 | blue pill STM32F103C8 |
|---|---|---|
| v1.9.0-524-gd0408af5 + Arm GNU Toolchain 12.2.Rel1 | works, 4 KB/sec | hangs after load |
| v1.9.0-524-gd0408af5 + Arm GNU Toolchain 11.3.Rel1 | works, 4 KB/sec | hangs after load |
| v1.8.0-266-g37efd257 + Arm GNU Toolchain 11.3.Rel1 | works, 67 KB/sec | works, 54 KB/sec |
It takes 23 seconds to upload 107128 bytes (104 KB):
// v1.9.0-524-gd0408af5
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .text, size 0xc240 lma 0x10000100
Loading section .rodata, size 0xd418 lma 0x1000c340
Loading section .binary_info, size 0x1c lma 0x10019758
Loading section .data, size 0xb04 lma 0x10019774
Start address 0x100001e8, load size 107128
Transfer rate: 4 KB/sec, 948 bytes/write.
Previously, it was 3 seconds:
// v1.8.0-266-g37efd257
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .text, size 0xc240 lma 0x10000100
Loading section .rodata, size 0xd418 lma 0x1000c340
Loading section .binary_info, size 0x1c lma 0x10019758
Loading section .data, size 0xb04 lma 0x10019774
Start address 0x100001e8, load size 107128
Transfer rate: 67 KB/sec, 948 bytes/write.
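As a sanity check on the two logs above, the section sizes can be summed and compared with the load size GDB reports, and the printed transfer rates turned into rough durations. This is a small sketch with made-up helper names; it assumes GDB's "KB" means 1024 bytes and that the printed rate is rounded, so the times are only estimates.

```python
import re
from typing import List, Tuple

LOAD_RE = re.compile(
    r"Loading section (\S+), size (0x[0-9a-fA-F]+) lma (0x[0-9a-fA-F]+)"
)

# Section lines copied from the logs above (identical for both firmware versions).
LOG = """\
Loading section .boot2, size 0x100 lma 0x10000000
Loading section .text, size 0xc240 lma 0x10000100
Loading section .rodata, size 0xd418 lma 0x1000c340
Loading section .binary_info, size 0x1c lma 0x10019758
Loading section .data, size 0xb04 lma 0x10019774
"""


def loaded_sections(gdb_output: str) -> List[Tuple[str, int, int]]:
    """Return (name, size, load address) for every 'Loading section' line."""
    return [(name, int(size, 16), int(lma, 16))
            for name, size, lma in LOAD_RE.findall(gdb_output)]


def total_load_size(gdb_output: str) -> int:
    """Sum the sizes of all loaded sections, in bytes."""
    return sum(size for _, size, _ in loaded_sections(gdb_output))


def seconds_at_rate(num_bytes: int, kb_per_sec: float,
                    bytes_per_kb: int = 1024) -> float:
    """Estimate transfer time from a GDB-style KB/sec figure (assumes 1 KB = 1024 bytes)."""
    return num_bytes / (kb_per_sec * bytes_per_kb)


if __name__ == "__main__":
    size = total_load_size(LOG)
    print(size, "bytes")
    for rate in (4, 67):
        print(f"{rate} KB/sec -> about {seconds_at_rate(size, rate):.1f} s")
```

The summed sizes should agree with the "load size" line, which is a quick way to confirm that a log was not truncated before comparing rates between firmware versions.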
Conclusion:
It looks like the issue is not completely solved. How can I help make this rp2040-flash-reliability work on Blue pill STM32F103C8?
The first thing that comes to mind that would be helpful is if you could use v1.9.0 and check whether it still worked on that version or not, and use only a single toolchain (12.2.Rel1 would be perfect).
When you say "hangs after load", we assume load completes to spitting out the transfer rate stat? Is it definitely after load or during?
Regarding the speed, that's entirely as expected as to create reliability we had to switch from using the boot ROM's Flash routines to driving the SPI controller directly from the firmware, which bottlenecks us on how fast we're able to do 32-bit ADIv5 memory reads and writes of the SPI peripheral data register which accepts just one byte at a time.
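To put the bottleneck described above into numbers, the observed rate can be turned into a time budget per byte. This is back-of-the-envelope arithmetic only: it assumes 1 KB = 1024 bytes and that each programmed byte costs exactly two ADIv5 accesses (one 32-bit write and one 32-bit read-back), ignoring erase, command, and status-polling overhead. The function names are made up for the example.

```python
def microseconds_per_byte(kb_per_sec: float, bytes_per_kb: int = 1024) -> float:
    """Average time spent per programmed byte at the given transfer rate."""
    return 1_000_000 / (kb_per_sec * bytes_per_kb)


def microseconds_per_access(kb_per_sec: float, accesses_per_byte: int = 2) -> float:
    """Average time per ADIv5 access, assuming a fixed number of accesses per byte."""
    return microseconds_per_byte(kb_per_sec) / accesses_per_byte


if __name__ == "__main__":
    rate = 4  # KB/sec, as reported for the native hardware in this thread
    print(f"{microseconds_per_byte(rate):.0f} us per byte")
    print(f"{microseconds_per_access(rate):.0f} us per 32-bit access")
```

Under these assumptions the per-access figure is an upper bound on the round-trip cost of a single memory access through the probe, which is the quantity a Flash stub running on the target would avoid paying for every byte.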
Given what we're doing here, we suspect your Bluepill's firmware is crashing on d0408af, so a build of the firmware done with ENABLE_DEBUG=1 (you'll have to disable a few targets the test isn't using to make space for the build to fit Flash) may shed some light via the secondary USB serial port device. You'll need to run mon debug en prior to running load to get debug output.
Saying "hangs after load", I mean immediately after issuing load command in gdb. The load command does not complete to the end, so it looks like "during", or to be more precise at the start:
(gdb) load
Loading section .boot2, size 0x100 lma 0x10000000
// no more output here, it hangs until I disconnect usb cable.
Agree, reliability is more important. But hopefully speed improvements are technically possible to implement later.
Regarding the latest v1.9 3c8b2b7abd76578b4ad3588799fc664bdc42b2de: it doesn't work, and there have been no vital changes in the v1.9 branch since my previous test two weeks ago (I used 1fd6cb120c4ceb95e7b71c3e84595babd24c8615 at the time). I have a few good old blue pill boards with genuine STM32F103C8 (at least I haven't been able to find out that they are fake yet).
// v1.9.0-8-g3c8b2b7a + Arm GNU Toolchain 12.2.Rel1 --> hangs
mingw32-make PROBE_HOST=stlink BLUEPILL=1 ENABLE_RTT=1
Thank you for the hint with ENABLE_DEBUG=1, I'll try that and report results later.
Agree, reliability is more important. But hopefully speed improvements are technically possible to implement later.
Yeah, we can look at building a Flash stub, moving the work of running the SPI peripheral back onto the RP2040 in a more controlled way that makes the firmware much less at the mercy of the core state. It's still a step more complicated than having the firmware do the work purely via ADIv5 memory access, which is why we didn't start there.
When running the master branch, loading the program after attaching to the first target (mon s shows 3 targets, so basic comms work) hangs on:
then it hangs....
When I revert to the version with tag v1.8.2 it succeeds just fine:
This is on a bluepill compiled with
(I ordered a real BMP to compare but in the meantime I can proceed with using the v1.8.2 version)