LPC17xx detection fails / unreliable

mrehkopf commented 1 year ago

Connecting a BMP to an LPC1754 or LPC1756 board via SWD or JTAG yields the following gdb output when probing:

(gdb) mon swd
Please report unknown device with Designer 0x8000 Part ID 0x0
Available Targets:
No. Att Driver
*** 1   Unknown ARM Cortex-M Designer 0x8000 Part ID 0x0 M3

As requested on Discord, I compiled the latest git state (302c8b5), then compiled a debug-enabled PC-hosted blackmagic client using PROBE_HOST=hosted ENABLE_DEBUG=1 and ran it with blackmagic -tv 5. This is the log output: blackmagic-hosted-LPC1756.log

It might be worth mentioning that on any subsequent run the probe seems to get locked up spinning on a debug register: blackmagic-hosted-LPC1756-infinite.log

Aaand at the last minute I found out that I can recover from the lockup by resetting the target via the SRST pin. To my surprise, after the reset it is detected properly: blackmagic-hosted-LPC1756-after-reset.log

dragonmux commented 1 year ago

Please give fix/lpc17xx-detection-regression a try. It probably won't fix the issue, but data gathered via BMDA using that branch should be easier to follow.

mrehkopf commented 1 year ago

Thanks! Here are the logs. I did the same procedure as before; this time the target doesn't get detected after resetting it.

BMDA-lpc17xx-detection-regression.log BMDA-lpc17xx-detection-regression-lockup.log BMDA-lpc17xx-detection-regression-after-reset.log

dragonmux commented 1 year ago

Our bad - rather than subtracting an offset from the top of RAM for the IAP stack, we accidentally placed the stack before the start of RAM, so the routine being called wasn't even running: it crashed instantly on a bad stack access. The branch is now updated to fix that. If you could re-try, it would be fantastic (we unfortunately don't have parts to test against locally).
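To make the stack mix-up concrete, here is a minimal sketch of the two address calculations. The helper names, the 16 KiB RAM size at 0x10000000 and the 32-byte offset used when exercising it are illustrative assumptions, not values taken from BMP's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: ram_base/ram_size describe the target SRAM,
 * offset is the space to keep free below the top of RAM. */

/* Buggy variant: the offset is subtracted from the base, so SP lands before RAM. */
static uint32_t iap_sp_buggy(uint32_t ram_base, uint32_t ram_size, uint32_t offset)
{
	(void)ram_size;
	return ram_base - offset;
}

/* Fixed variant: the offset is subtracted from the top of RAM. */
static uint32_t iap_sp_fixed(uint32_t ram_base, uint32_t ram_size, uint32_t offset)
{
	assert(offset < ram_size);
	return ram_base + ram_size - offset;
}

/* Cortex-M stacks are full-descending: the first push writes to SP - 4,
 * so a usable initial SP satisfies base < SP <= base + size. */
static bool sp_in_ram(uint32_t sp, uint32_t ram_base, uint32_t ram_size)
{
	return sp > ram_base && sp - ram_base <= ram_size;
}
```

With base 0x10000000, size 0x4000 and offset 32, the buggy variant yields 0x0FFFFFE0 (outside RAM, so the very first push faults), while the fixed one yields 0x10003FE0.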

At least the failed run proved that we should now be able to read the raw return status and result values from the dump, so it's not all bad.

mrehkopf commented 1 year ago

That doesn't seem to help with detection; the added commit only seems to add timeout handling? I can see that on the 2nd run it now stops spinning on polling 0xe000edf0 (DHCSR), so the timeout appears to be working :) The target still freezes on the first run, so I think it's still crashing. Logs: BMDA-lpc17xx-detection-regression_81c798a.log BMDA-lpc17xx-detection-regression_81c798a_2ndrun.log BMDA-lpc17xx-detection-regression_81c798a_after_reset.log
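The timeout mentioned above amounts to bounding the DHCSR poll. A minimal sketch follows: the read32 callback and the scripted fake target are hypothetical stand-ins for BMP's real memory accessors, and real probe code would more likely use a time-based timeout than a poll count. S_HALT is bit 17 of DHCSR in the ARMv7-M architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CORTEXM_DHCSR        0xe000edf0U /* Debug Halting Control and Status Register */
#define CORTEXM_DHCSR_S_HALT (1U << 17U) /* core is in Debug state */

/* Hypothetical accessor: reads a 32-bit word from the target. */
typedef uint32_t (*read32_fn)(void *ctx, uint32_t addr);

/* Poll DHCSR until S_HALT is set or max_polls reads have been made.
 * Bounding the loop keeps the probe from spinning forever when the target
 * never halts, e.g. because it crashed instead of reaching the breakpoint. */
static bool wait_for_halt(read32_fn read32, void *ctx, uint32_t max_polls)
{
	for (uint32_t i = 0; i < max_polls; ++i) {
		if (read32(ctx, CORTEXM_DHCSR) & CORTEXM_DHCSR_S_HALT)
			return true;
	}
	return false;
}

/* Fake target for demonstration: returns a scripted sequence of DHCSR values,
 * repeating the last one once the script runs out. */
struct fake_target {
	const uint32_t *dhcsr_values;
	uint32_t count;
	uint32_t reads;
};

static uint32_t fake_read32(void *ctx, uint32_t addr)
{
	struct fake_target *t = ctx;
	assert(addr == CORTEXM_DHCSR);
	const uint32_t idx = t->reads < t->count ? t->reads : t->count - 1U;
	t->reads++;
	return t->dhcsr_values[idx];
}
```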

dragonmux commented 1 year ago

The fix we applied with the force push should have fixed the IAP stack oops we'd made, so the results of calling the IAP part ID routine should now at least be valid. We'll give the logs a look tonight after we've had some sleep.

mrehkopf commented 1 year ago

(for the record, I cloned the repo a second time to rule out any git woes but the build artifacts were binary identical and the outcome is the same)

dragonmux commented 1 year ago

The outcome you saw was expected based on the prior runs in the issue. If you look at the logs and find line 1393 of the initial detection log using 81c798a, you'll see BMP writing an IAP invocation frame to the device, setting up the registers for the IAP call, and then invoking the call. Later, on line 1575, you'll see it reading the result back. The result is an "OK" (first 4 bytes) followed by a whole lot of 0s, which is why reading the part ID is failing: there is no part ID in the result.
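For reference, NXP's UM10360 defines the Read Part ID IAP command as command code 54, returning a status code in result word 0 (CMD_SUCCESS is 0) and the part ID in result word 1. That is why an "OK" followed by zeros decodes as success with a part ID of 0. The parser below is a hypothetical sketch for illustration, not BMP's code; the part ID in the usage note is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IAP_CMD_READ_PART_ID   54U /* command code per UM10360 */
#define IAP_STATUS_CMD_SUCCESS 0U

/* Decode a little-endian 32-bit word from bytes read back from the target. */
static uint32_t read_le32(const uint8_t *buf)
{
	return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8U) |
		((uint32_t)buf[2] << 16U) | ((uint32_t)buf[3] << 24U);
}

/* Interpret the result words of a Read Part ID call: word 0 is the status,
 * word 1 the part ID. Returns false if the call failed or the part ID is 0,
 * which is the symptom seen in this issue. */
static bool iap_parse_part_id(const uint8_t *result, size_t len, uint32_t *part_id)
{
	assert(result != NULL && part_id != NULL);
	if (len < 8U)
		return false;
	if (read_le32(result) != IAP_STATUS_CMD_SUCCESS)
		return false;
	*part_id = read_le32(result + 4U);
	return *part_id != 0U;
}
```

An all-zero result buffer passes the status check but is still rejected, while a buffer holding status 0 and, say, the placeholder ID 0x12345678 parses successfully.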

That is what we meant by patching in better diagnostics for the issue, which the fix branch currently does as a side effect of cleaning up and fixing the existing code, eliminating issues like UB stack reads from the set of possible causes.

We've got a dev board for this on order and will dive into it properly once it arrives. Until then we can't do much, as getting an OK from the IAP with no valid data is something BMP can't do anything about. We'll update the issue once we have a board in hand and have been able to sit down and do further diagnostics.

mrehkopf commented 1 year ago

Couldn't help but dive in a bit, and found that during the IAP call the LPC1756 hits a HardFault: VCATCH and HALTED are set in DFSR, FORCED is set in HFSR, MMARVALID and DACCVIOL are set in CFSR, and MMFAR contains 0x00000430. R0 also contains 0x00000430.

cortexm_halt_poll() ends up in cortexm_fault_unwind(). After stack frame recovery, the last PC turns out to be 0x1fff1552, so presumably somewhere inside the IAP code. I checked LR first, but it was properly set to 0x10000001 right up until the IAP call, so the core should hit the breakpoint instruction upon returning.
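The unwinding step relies on the fixed layout of the basic exception stack frame: R0-R3, R12, LR, the return address (PC) and xPSR, stacked in that order, so the faulting PC is the word at SP + 0x18. This is a hypothetical sketch, not BMP's cortexm_fault_unwind(); it assumes the eight words were already read from whichever stack was active when the fault was taken, and ignores the extended FP frame (not applicable to the FPU-less Cortex-M3). The LR value of 0x10000001 is simply the breakpoint address with the Thumb bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets in the basic exception stack frame pushed by Cortex-M hardware. */
enum {
	FRAME_R0 = 0, FRAME_R1, FRAME_R2, FRAME_R3,
	FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
	FRAME_WORDS
};

struct stacked_frame {
	uint32_t r0, lr, pc, xpsr;
};

/* Pick out the interesting registers from the words read at the stacked SP. */
static struct stacked_frame decode_frame(const uint32_t words[FRAME_WORDS])
{
	struct stacked_frame f = {
		.r0 = words[FRAME_R0],
		.lr = words[FRAME_LR],
		.pc = words[FRAME_PC],
		.xpsr = words[FRAME_XPSR],
	};
	return f;
}

/* LR value that returns to a BKPT at bp_addr: halfword aligned, Thumb bit set. */
static uint32_t thumb_return_addr(uint32_t bp_addr)
{
	assert((bp_addr & 1U) == 0U);
	return bp_addr | 1U;
}
```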

Resetting the MCU manually while halted resulted in proper operation afterwards; it hits the breakpoint properly and the IAP response can be read successfully from 0x10000004.

At this point I'm wondering whether my firmware could have anything to do with it, since I'm usually attaching to a running target... My firmware alters the VTOR register and also sets up an MPU region that causes access violations for addresses 0x00000000-0x0000bfff, so an access to 0x00000430 would indeed trigger a fault.

However, I'm wondering what business the IAP has accessing 0x00000430, and why VCATCH would be set in the first place. I tried disabling IRQs (by writing 0xffffffff to all ICER registers, then all ICPR registers) and resetting VTOR to 0x00000000, to no avail.
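The interrupt clean-up tried above looks roughly like the sketch below. ICER0 (0xE000E180), ICPR0 (0xE000E280) and VTOR (0xE000ED08) are the architectural ARMv7-M addresses; the write32 callback, the recording fake and the register count parameter are hypothetical. On real hardware the number of implemented ICER/ICPR words could be derived from ICTR.INTLINESNUM. This step cannot help when the fault comes from the MPU rather than from an interrupt.

```c
#include <assert.h>
#include <stdint.h>

#define NVIC_ICER0 0xe000e180U /* Interrupt Clear-Enable registers, 4 bytes apart */
#define NVIC_ICPR0 0xe000e280U /* Interrupt Clear-Pending registers */
#define SCB_VTOR   0xe000ed08U /* Vector Table Offset Register */

/* Hypothetical accessor: writes a 32-bit word to the target. */
typedef void (*write32_fn)(void *ctx, uint32_t addr, uint32_t value);

/* Disable all external interrupts, then clear their pending state, then
 * point VTOR back at address 0. nregs is the number of ICER/ICPR words. */
static void quiesce_irqs(write32_fn write32, void *ctx, uint32_t nregs)
{
	for (uint32_t i = 0; i < nregs; ++i)
		write32(ctx, NVIC_ICER0 + 4U * i, 0xffffffffU);
	for (uint32_t i = 0; i < nregs; ++i)
		write32(ctx, NVIC_ICPR0 + 4U * i, 0xffffffffU);
	write32(ctx, SCB_VTOR, 0U);
}

/* Fake target for demonstration: records every write in order. */
struct write_log {
	uint32_t addr[16];
	uint32_t value[16];
	uint32_t n;
};

static void log_write32(void *ctx, uint32_t addr, uint32_t value)
{
	struct write_log *wl = ctx;
	assert(wl->n < 16U);
	wl->addr[wl->n] = addr;
	wl->value[wl->n] = value;
	wl->n++;
}
```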

I will try disabling the MPU region next and see what happens.

mrehkopf commented 1 year ago

Indeed, disabling the MPU via target_mem_write32(target, 0xe000ed94, 0); just before running the IAP seems to resolve the issue /o\ Apparently the IAP routine maps ROM to an area starting at address 0 (via the MEMMAP register) and accesses that area freely. :/ Maybe backing up, clearing, and restoring the MPU CTRL register around the IAP call would be an acceptable workaround?
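The workaround proposed above, backing up, clearing and restoring MPU_CTRL (0xE000ED94) around the IAP call, might look like this. The callbacks and the fake target are hypothetical; the fake IAP simply fails whenever the MPU is enabled, modelling the fault seen in this issue.

```c
#include <assert.h>
#include <stdint.h>

#define MPU_CTRL        0xe000ed94U /* ARMv7-M MPU Control Register */
#define MPU_CTRL_ENABLE (1U << 0U)

/* Hypothetical target accessors and IAP trampoline. */
typedef uint32_t (*read32_fn)(void *ctx, uint32_t addr);
typedef void (*write32_fn)(void *ctx, uint32_t addr, uint32_t value);
typedef int (*iap_call_fn)(void *ctx);

/* Run one IAP call with the MPU disabled, then put MPU_CTRL back exactly as
 * the firmware left it, whatever the IAP result. */
static int iap_call_without_mpu(read32_fn read32, write32_fn write32, iap_call_fn call, void *ctx)
{
	const uint32_t saved = read32(ctx, MPU_CTRL);
	write32(ctx, MPU_CTRL, 0U);
	const int result = call(ctx);
	write32(ctx, MPU_CTRL, saved);
	return result;
}

/* Fake target for demonstration: only models MPU_CTRL. */
struct fake_target {
	uint32_t mpu_ctrl;
};

static uint32_t fake_read32(void *ctx, uint32_t addr)
{
	assert(addr == MPU_CTRL);
	return ((struct fake_target *)ctx)->mpu_ctrl;
}

static void fake_write32(void *ctx, uint32_t addr, uint32_t value)
{
	assert(addr == MPU_CTRL);
	((struct fake_target *)ctx)->mpu_ctrl = value;
}

/* Models the bug: the ROM touches low addresses the MPU blocks. */
static int fake_iap(void *ctx)
{
	return (((struct fake_target *)ctx)->mpu_ctrl & MPU_CTRL_ENABLE) ? -1 : 0;
}
```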

dragonmux commented 1 year ago

Nicely sleuthed! The BMP code for this target is worryingly rudimentary. It should probably take care to map and unmap the ROM routines to the scratch space and toggle the MPU, most likely by adding enter/exit Flash mode routines and allocating some scratch space in BMP to hold the chip state so it can be correctly saved and restored.
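The enter/exit approach described above keeps the disturbed chip state in probe-side scratch space, so several IAP calls can run within one Flash session. A minimal sketch under assumptions: the struct and function names are hypothetical, and MEMMAP is assumed to live at 0x400FC040 as documented for LPC17xx in UM10360 (worth double-checking). Restoring MEMMAP on exit undoes any remapping the ROM performed.

```c
#include <assert.h>
#include <stdint.h>

#define MPU_CTRL     0xe000ed94U /* ARMv7-M MPU Control Register */
#define LPC17_MEMMAP 0x400fc040U /* assumed LPC17xx memory mapping control register */

typedef uint32_t (*read32_fn)(void *ctx, uint32_t addr);
typedef void (*write32_fn)(void *ctx, uint32_t addr, uint32_t value);

/* Hypothetical probe-side scratch space for the chip state we disturb. */
struct lpc17_flash_state {
	uint32_t mpu_ctrl;
	uint32_t memmap;
};

static void lpc17_enter_flash_mode(read32_fn rd, write32_fn wr, void *ctx, struct lpc17_flash_state *st)
{
	st->mpu_ctrl = rd(ctx, MPU_CTRL);
	st->memmap = rd(ctx, LPC17_MEMMAP);
	wr(ctx, MPU_CTRL, 0U); /* let the IAP ROM access the low address range */
}

static void lpc17_exit_flash_mode(write32_fn wr, void *ctx, const struct lpc17_flash_state *st)
{
	wr(ctx, LPC17_MEMMAP, st->memmap); /* undo any remapping done by the ROM */
	wr(ctx, MPU_CTRL, st->mpu_ctrl);
}

/* Two-register model of the target, for demonstration only. */
struct fake_regs {
	uint32_t mpu_ctrl, memmap;
};

static uint32_t *fake_reg(void *ctx, uint32_t addr)
{
	struct fake_regs *r = ctx;
	if (addr == MPU_CTRL)
		return &r->mpu_ctrl;
	assert(addr == LPC17_MEMMAP);
	return &r->memmap;
}

static uint32_t fake_read32(void *ctx, uint32_t addr) { return *fake_reg(ctx, addr); }
static void fake_write32(void *ctx, uint32_t addr, uint32_t value) { *fake_reg(ctx, addr) = value; }
```

Any number of IAP calls can be issued between the enter and exit calls; the firmware's MPU and memory map are only restored once the session ends.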

dragonmux commented 1 year ago

Updated the branch with Flash entry/exit routines which disable the MPU and restore its state on either side of any IAP calls. Please give it a whirl, and if it works properly now, we'll PR it. Thank you for your patience with this issue.

mrehkopf commented 1 year ago

Awesome, it's working! The target is detected properly and I can attach. It seems like I can't do much more though: trying to load firmware.elf or step just seems to lock up. Probably going to play around a bit more ^^ But I think this particular issue is resolved. Thank you for the swift response! This level of support is outstanding and I appreciate it a lot. :3

dragonmux commented 1 year ago

When you figure out the other lockup issue, please open a new issue on the tracker with steps to reproduce, etc., and we'll look into fixing it for v1.10.

blackmagic-debug / blackmagic

LPC17xx detection fails / unreliable #1366