Debug Description: Programmatic indication of an error happening while __errorcontrol is set to ignore errors

jreineckearm commented 2 weeks ago

The CMSIS debug description comes with a mechanism to instruct the debugger to ignore access errors. It uses a bit in the __errorcontrol debug access variable to activate/deactivate that mode. See https://open-cmsis-pack.github.io/Open-CMSIS-Pack-Spec/main/html/debug_description.html#DebugVars

Historically, this mechanism was designed for write accesses. For example, depending on the HW design, a write to AIRCR.SYSRESETREQ could cause a temporary debug connection loss which ended in a protocol error because the SWDIO and SWDCLK lines weren't driven as expected.

Since then, we've seen this mechanism being used more and more for polling the device state while secured bootloaders run. During such a phase, debug may be turned off to protected sensitive assets. Depending on the polled device state register (could be a single bit), the lack of a well-defined return value for read functions in case of an error, or even better a state variable to indicate success of the last target access becomes more and more of an issue. This gap needs to be closed.

Given that a return value (instead of a data value) may cause new corner cases where the mechanism fails (one device requires a single bit check for 0 where the other requires a 1), a proper success indicator is preferred. There are a couple approaches to the latter:

Reuse currently reserved bits in __errorcontrol. This may however become a fiddly with bitwise operations.
Introduce a debug access variable __errorstate or __lasterror indicating success of the last called debug access function.
- This would allow more detailed error codes.

Happy to work on a proposal for this and open a docs PR for discussion.

MarianSavchuk commented 1 week ago

It seems that I am one of those who asked about this feature. I have a few thoughts on this topic:

Firstly, I would like to mention that there is already a such variable in place, namely the __Result variable. If Debug access functions set its value to -1 in case of an error, this would be enough to handle erroneous results properly.

Secondly, I have observed that during the execution of debug access sequences, the DHCSR register is periodically polled. This sometimes leads to unexpected results. For instance, a system reset is performed and then DebugPortSetup is executed. Under certain circumstances, an error message appears when executing one of the DAP_SWJ_Sequencefunctions, which is a bit misleading. Thus, it would be great to have a variable that could control (and prevent) such polling of the DHCSR register.

Best regards.

jreineckearm commented 1 week ago

Thanks, @MarianSavchuk , for sharing these thoughts!

Firstly, I would like to mention that there is already a such variable in place, namely the __Result variable. If Debug access functions set its value to -1 in case of an error, this would be enough to handle erroneous results properly.

Thanks for highlighting the __Result variable. It was originally introduced to allow a sequence to report errors back to the debugger that calls sequences. But it might become our way out of the dilemma of introducing new variables and by that break compatibility with existing debugger implementations that do not expect to see usage of new variables "undefined" at the beginning of a sequence.

The more I think about it, the more I like the idea. One way to repurpose it without breaking existing sequences could be to explicitly request writing target access errors to __Result through a new control bit in __errorcontrol. A sequence can then make a conscious decision whether to make use of it. And existing sequences wouldn't break because of escalating (from their point of view) unexpected target access errors in __Result.

It imposes a little extra effort on sequences making use of the feature. In the sense of being responsible for clearing __Result if errors were expected in such scenarios, or can be safely handled. But I think this is acceptable.

Also, there is the loophole of older debugger implementations not honoring the new control bit. We might need a read-only capability bit in __errorcontrol or connection indicating availability of the new feature.

What do you think, @MarianSavchuk ?

I'll digest the above a little further and will turn it into a less verbose proposal. Two things I'd like to spend some more thought on are:

Should functions that return error codes as return values already today also populate __Result if the new bit is enabled. My initial thought would be: no, a sequence could map such codes to __Result if needed.
Should setting the new control bit without setting the skip-errors bit immediately abort the sequence execution as before. Or should it enable a "more relaxed" execution where a sequence can recover from debug function errors. This may have some charm...but it probably is a clearer definition if skip-errors has to be explicitly set to achieve that.

Thanks again for injecting that thought! I think this brings us much closer to a solution. :-)

Thus, it would be great to have a variable that could control (and prevent) such polling of the DHCSR register.

I need to have a look at this in our implementation in uVision. I agree that particularly around reset recovery and recovery from connection losses, debug sequences shouldn't be interrupted by periodic DHCSR reads. This sounds rather like a defect in the uVision implementation (we need to see if we can reproduce this easily) and out of scope of this CMSIS debug description specification issue. Having said that, we should add some recommendations to the spec about what a debugger should avoid in certain scenarios to avoid unpredictable/unexpected behavior. I'll separate this into a new issue here on Github.

jreineckearm commented 1 week ago

See #327 for the second part of above thoughts.

Open-CMSIS-Pack / Open-CMSIS-Pack-Spec

Debug Description: Programmatic indication of an error happening while __errorcontrol is set to ignore errors #325