blackmagic-debug / blackmagic

In application debugger for ARM Cortex microcontrollers.
GNU General Public License v3.0
3.23k stars 768 forks source link

SWD regression on scan targets(SW-DP scan failed!) #1001

Closed zvova7890 closed 2 years ago

zvova7890 commented 2 years ago

Recently I have updated my bmp(on stm32f1) board to latest revision and found bad news for me. My project board not recognized anymore. It has ~40cm SWD wire in not best noise environment. I assume that the problem is related to these nuances.

Target voltage: 3.22V
SW-DP scan failed!
Failed

I'm started bisecting commits and after long time magical behavior I found broken commits. It is 61efe26348b8d4d3ae01c3b2ac68680a4ef68d89 and 0c6390307115b48e57deeecf0be8f5eca89604ac

61efe26348b8d4d3ae01c3b2ac68680a4ef68d89 - commit(as I'm understand) reduced count of try to detect target. 0c6390307115b48e57deeecf0be8f5eca89604ac - commit has reduced timeout from 2000 -> 20.

Partially revert noticed commits restore my board recognition, and works fine.

diff --git a/src/target/adiv5_jtagdp.c b/src/target/adiv5_jtagdp.c
index cf8591b..0ea4e3d 100644
--- a/src/target/adiv5_jtagdp.c
+++ b/src/target/adiv5_jtagdp.c
@@ -85,7 +85,7 @@ uint32_t fw_adiv5_jtagdp_low_access(ADIv5_DP_t *dp, uint8_t RnW,

    jtag_dev_write_ir(&jtag_proc, dp->dp_jd_index, APnDP ? IR_APACC : IR_DPACC);

-   platform_timeout_set(&timeout, 20);
+   platform_timeout_set(&timeout, 2000);
    do {
        jtag_dev_shift_dr(&jtag_proc, dp->dp_jd_index, (uint8_t*)&response,
                          (uint8_t*)&request, 35);
diff --git a/src/target/adiv5_swdp.c b/src/target/adiv5_swdp.c
index 7797b8f..4f8682b 100644
--- a/src/target/adiv5_swdp.c
+++ b/src/target/adiv5_swdp.c
@@ -227,13 +227,15 @@ uint32_t firmware_swdp_low_access(ADIv5_DP_t *dp, uint8_t RnW,

    if((addr & ADIV5_APnDP) && dp->fault) return 0;

-   platform_timeout_set(&timeout, 20);
+   platform_timeout_set(&timeout, 2000);
    do {
        dp->seq_out(request, 8);
        ack = dp->seq_in(3);
        if (ack == SWDP_ACK_FAULT) {
-           dp->fault = 1;
-           return 0;
+           /* On fault, abort() and repeat the command once.*/
+           dp->error(dp);
+           dp->seq_out(request, 8);
+           ack = dp->seq_in(3);
        }
    } while (ack == SWDP_ACK_WAIT && !platform_timeout_is_expired(&timeout));
dragonmux commented 2 years ago

Thank you for reporting the issue!

We have a couple of questions to help us better understand the problem space and so how best to fix it: First off, what is the target micro your project board uses? And second, what happens if instead of 20 or 2000, you use a timeout value of something like 100? Is that sufficient to get things working? (would like to avoid re-introducing another bug in the process of fixing yours)

zvova7890 commented 2 years ago

Hi! Thanks for attention!

First off, what is the target micro your project board uses?

STM32F405

And second, what happens if instead of 20 or 2000, you use a timeout value of something like 100

Target detects, but not stable and after detect works incorrectly in 80% of cases

Target voltage: 3.20V
Available Targets:
No. Att Driver
 1      STM32F40x M4
>> long time takes here <<
0x00000000 in ?? ()  <-- incorrect current code position

would like to avoid re-introducing another bug in the process of fixing yours

This commit introduced for speedup some processes of communication, as I'm understand, but it is works only with short wires. This new part of code will make one try of detect, instead of couple iterations

    platform_timeout_set(&timeout, 20);
    do {
        dp->seq_out(request, 8);
        ack = dp->seq_in(3);
        if (ack == SWDP_ACK_FAULT) {
            dp->fault = 1;
            return 0;
        }
    } while (ack == SWDP_ACK_WAIT && !platform_timeout_is_expired(&timeout));
dragonmux commented 2 years ago

hmm.. does a value of 250 work better then? (We'd prefer not putting it back to 2000 as Uwe most likely had a good reason for another target why this was a bad idea, so if we can get your target reliably working at a lower value, this would be preferable as a fix)

As for the retry bit - yeah, that seems daft - as long as there is a way to always break out of the loop, having some amount of retrying happen there makes sense. One thing we might do is add a retry counter and change the do-while loop into a for loop so we can guarantee escape after an amount of time, but provide robustness for situations like yours.

zvova7890 commented 2 years ago

250 better, I see no problems.

dragonmux commented 2 years ago

We've created a fix (see the linked PR) - please try it and let us know if you have any further problems.

We attempted to revert the change to SWD fault handling but that actually broke the target we were using for test. This suggests some co-dependant changes since so hopefully increasing the timeouts is enough for your use case

zvova7890 commented 2 years ago

Increase timeout is useless in this case because do-while loop is not working. First SWDP_ACK_FAULT cause immediately return. I can suggest this approach:

    platform_timeout_set(&timeout, 250);
    do {
        dp->seq_out(request, 8);
        ack = dp->seq_in(3);
        if (ack == SWDP_ACK_FAULT) {
            /* On fault, abort() and repeat the command once.*/
            dp->error(dp);
        }
    } while ((ack == SWDP_ACK_WAIT || ack == SWDP_ACK_FAULT) &&
            !platform_timeout_is_expired(&timeout));

    if (ack == SWDP_ACK_WAIT) {
        dp->abort(dp, ADIV5_DP_ABORT_DAPABORT);
        dp->fault = 1;
        return 0;
    }

    if(ack == SWDP_ACK_FAULT) {
        dp->fault = 1;
        return 0;
    }

Also I remember why 2s timeout is good approach. It can be useful when chip "stuck" at start on some code and unreachable for SWD, but we have small chance "catch" state that before fatal code. So, we run gdb and push reset button simultaneously with hope that BMP catch controller. I understand this is crazy thing, but sometimes help :) Also I can't place in my mind any idea in what case bigger timeout can be worse for SWD.

UweBonnes commented 2 years ago

Is WFI() use for some longer period? Please try something like #973 for the F1

dragonmux commented 2 years ago

WFI hasn't been mentioned here, Uwe.. and the user already isolated the problematic code in BMP.

@zvova7890 - we though of a not dissimilar approach but anything like that ultimately stuffs the ability for BMP to correctly detect TM4C parts via SWD. This needs further consideration so we don't break TM4C but we also unbreak your target.

dragonmux commented 2 years ago

Ok, having poked about further.. it seems that the borkage we saw was from introduction to a retry limiting counter. Without that and just the slightly higher timeout value, TM4C detection still works. @zvova7890 could you please test the PR now and see if it works right so we can move to get the fix merged?

To answer why 2s timeout is bad: This is too high for a target like the Tiva-C which has to see several SWD failures due to various probe routines before we see a success, resulting in BMP not being able to do SWD scan with such devices.

We know a couple of people with RP2040's and we'll ask them to test that the PR linked doesn't break access to their devices if they can - the reason for this being this change partially reverts a change made to support these devices, but which clearly breaks things in your situation.

zvova7890 commented 2 years ago

Seems good for me now

zvova7890 commented 2 years ago

Is WFI() use for some longer period? Please try something like #973 for the F1

Yes, but I'm talking about case when bmp attaching to the device that already in WFI. Anyway, I'm tried #973 for case when bmp already attached before WFI, seems patch works for my F4 target and gdb continue working with code that uses WFI.

Also, about WFI patch, I'm using gdb to check is my controller go to sleep(if can't attach or break executing - my code works) :) I didn't search proper way to do that, it is first what come to my mind. Maybe I'm not alone with that, and control WFI by bmp must be optional somehow?