Closed gneworld closed 3 months ago
@gneworld it was probably caused by some function returning -1 in LR, which would explain the 0xFFFFFFFF value.
Hi @anchao could you please help? I think you added support for it, right?
@gneworld
try it
Perhaps the above method is not a good one. The reason for this problem is that the idle thread needs to clear lr before calling nx_start; otherwise the unwind backtrace is likely to go wrong, because the value of lr is uncertain before __start.
@anjiahao1 but why does -O0 work correctly, while -O3 does not?
@acassis @anjiahao1 is more familiar with the details of the unwind table backtrace than I am. @anjiahao1 @gneworld Aren't you guys on the same floor? Why not confirm the issue offline?
The root cause is that the values of the general-purpose registers are usually not fixed at reset, so lr needs to be set to 0 before calling nx_start, which requires modifying a lot of arch/chip code.
@anchao I think it is fine if they can sort it out in the same room, but please do report the details here so that more people can see what the issue was, how the root cause was discovered, and why "that new commit" is the right solution :-)
- This seems as expected. You need to confirm with the vendor how RAR (Reset All Registers) behaves when lockstep is disabled; that behavior is fixed at the design phase.
- Zephyr does something similar, so I think initializing the registers is necessary: "arch: arm: Rewrite Cortex-R reset vector function", zephyrproject-rtos/zephyr#20473
In fact, this is a Cortex-M7 MCU, which implements the ARMv7E-M architecture, and it does not have LOCKSTEP or RAR configurations. So I think it might not be a DCLS problem.
DCLS is configurable on Cortex-M7. I just suspect that what they are facing is an issue on the lock-step core.
Double-checked, and it's confirmed that LOCKSTEP and RAR are both enabled. I'm also wondering why unwind_find_entry(frame->pc) does not return NULL when frame->pc == 0xFFFFFFFC. It's obvious that 0xFFFFFFFC exceeds __exidx_end.
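One possible explanation, offered as an assumption rather than a reading of the NuttX source: __exidx_start and __exidx_end delimit the index table itself (the .ARM.exidx section), not the range of code addresses the table describes. If the lookup simply picks the last entry whose function start is not above pc, then any pc past the last function still matches the final entry. The sketch below simulates that on the host; `demo_idx`, `demo_find_entry`, `demo_find_entry_checked`, and the addresses are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified index entry: only the function start address. */
struct demo_idx
{
  uint32_t fn_start;
};

/* Hypothetical code range and table, sorted by start address. */
#define DEMO_TEXT_START 0x08000000u
#define DEMO_TEXT_END   0x08001000u

const struct demo_idx demo_table[] =
{
  { 0x08000100u }, { 0x08000200u }, { 0x08000300u }
};

/* Return the last entry whose start address is <= pc.
 * There is no upper bound, so a bogus pc matches the final entry.
 */
const struct demo_idx *demo_find_entry(const struct demo_idx *tab,
                                       size_t n, uint32_t pc)
{
  size_t lo = 0;
  size_t hi = n;

  assert(tab != NULL && n > 0);

  if (pc < tab[0].fn_start)
    {
      return NULL;
    }

  /* Invariant: tab[lo].fn_start <= pc, and hi is an exclusive bound. */
  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (tab[mid].fn_start <= pc)
        {
          lo = mid;
        }
      else
        {
          hi = mid;
        }
    }

  return &tab[lo];
}

/* Same lookup, but reject any pc outside [text_start, text_end) first. */
const struct demo_idx *demo_find_entry_checked(const struct demo_idx *tab,
                                               size_t n,
                                               uint32_t text_start,
                                               uint32_t text_end,
                                               uint32_t pc)
{
  if (pc < text_start || pc >= text_end)
    {
      return NULL;
    }

  return demo_find_entry(tab, n, pc);
}
```

With this table, the unchecked lookup returns the last entry for pc == 0xFFFFFFFC, while the checked variant returns NULL. Whether the real unwind_find_entry behaves this way would need to be confirmed against the source.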
The reason 0xFFFFFFFF is on the stack is that the compiler treats __start as a normal function and pushes LR onto the stack. When the core boots from reset, it sets LR to 0xFFFFFFFF (see B1.5.5 of the Arm®v7-M Architecture Reference Manual).
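Given that reset value, one way a backtrace walker could defend itself is to treat 0xFFFFFFFF (and 0) as "end of call chain" instead of unwinding through it. The following is a host-side sketch of that idea only; it is not the NuttX unwinder and not necessarily what the eventual fix does. `demo_is_terminal_lr`, `demo_collect_backtrace`, and the sample addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LR value after reset on ARMv7-M. */
#define DEMO_RESET_LR 0xFFFFFFFFu

/* Hypothetical saved return addresses, innermost frame first.
 * The third value is the reset LR pushed by the entry function.
 */
const uint32_t demo_saved_lr[] =
{
  0x08000125u, 0x08000341u, DEMO_RESET_LR, 0x12345678u
};

uint32_t demo_out[8];

/* Nonzero when a return address cannot belong to real code. */
int demo_is_terminal_lr(uint32_t lr)
{
  return lr == 0u || lr == DEMO_RESET_LR;
}

/* Copy return addresses into out[] until a terminal value or a limit
 * is reached. Returns the number of addresses stored.
 */
size_t demo_collect_backtrace(const uint32_t *saved_lr, size_t nframes,
                              uint32_t *out, size_t max)
{
  size_t count = 0;

  assert(saved_lr != NULL && out != NULL);

  for (size_t i = 0; i < nframes && count < max; i++)
    {
      if (demo_is_terminal_lr(saved_lr[i]))
        {
          break; /* Do not try to unwind past the reset value. */
        }

      out[count++] = saved_lr[i] & ~1u; /* Drop the Thumb bit. */
    }

  return count;
}
```

Calling `demo_collect_backtrace(demo_saved_lr, 4, demo_out, 8)` stores the first two addresses and stops at the reset value, so the garbage entry after it is never examined.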
I'm not sure whether the naked attribute could avoid this issue; it should ensure that the unwind extab does not contain any push content:
diff --git a/include/nuttx/init.h b/include/nuttx/init.h
index af3dce335f..98b9ba68f8 100644
--- a/include/nuttx/init.h
+++ b/include/nuttx/init.h
@@ -98,7 +98,7 @@ EXTERN uint8_t g_nx_initstate; /* See enum nx_initstate_e */
/* OS entry point called by boot logic */
-void nx_start(void);
+void nx_start(void) noreturn_function naked_function;
#undef EXTERN
#ifdef __cplusplus
@anchao Unfortunately, this change cannot solve the problem.
Hello all, I created a PR to fix it: https://github.com/apache/nuttx/pull/12787
During the backtrace process, the system freezes and shows the anomaly in the image above: lr is 0xFFFFFFFF and pc is 0xFFFFFFFC.
When I add this conditional check, the problem disappears. So is this a real issue, and what should I do to avoid it?
Now let's simplify the problem with the hello app. I find that when built with -O3, the hello app freezes during backtrace, but when built with -O0 it works well (with a wrong pc value). So does -funwind-tables conflict with -O3 in some cases?