Closed fjmolinas closed 7 months ago
Random bystander here: it is possible you're hitting the same STM32 WFI bug that I've been hitting. I've got a workaround you might try.
Short version: put an `isb` after your `wfi`.
Long version: https://cliffle.com/blog/stm32-wfi-bug/
Hope that helps!
Description
This issue documents a recurring problem that has been seen on `stm32l152re` platforms.
The history so far:
- When #7385 was introduced, the sleep path (i.e. calling `__WFI()`) changed from `irq_enable` to `irq_restore`, and that broke `stm32l152re`.
- #8518 fixed the issue by introducing a `__NOP()` after `__WFI()`. This held until #11159, where the call to `cortexm_sleep` was changed; from then on `__NOP()` no longer fixed the issue but instead somehow triggered it.
- #11159 re-introduced the issue by changing the way `pm_set_lowest()` was called, since `pm_set()` was now implemented for `STM32L1`. A single `__NOP()` did not fix the issue anymore.
- In #11820 it was discovered that the changes in #7385 did not break the code in general, but only when `DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP` were enabled. By default openocd sets these bits after an `examine-end` event. This is done by default for all stm32 boards.
- In #11919, since the problem was the branch, `irq_restore` was replaced with `__set_PRIMASK(state);` in 5d96127, which inlines the function call, avoiding the jump and the whole issue altogether.
- #13999 inlines the implementation of `irq_restore`, so the fix in #11919 will be removed.
The faulty scenario
From the debugging output, at some point after wake-up the `pc` gets corrupted and unreachable instructions get executed. This only happens when `DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP` are set. An example of the debugging output is here:
Hints to cause
Looking around in the `stm32` and `cortex-m3` errata and datasheets, I found a mention of a similar issue with `stm32f4` in this ERRATA section 2.1.3. That erratum hints at issues that happen when the WFE/WFI are placed at 4-byte alignment, and at problems with the prefetch buffer. Although this is a different CPU (`cortex-m4`), it made me snoop around the prefetch buffer and made me think a similar issue might be happening on `cortex-m3`.
In the case of `cortex-m3` the prefetch buffer can fetch two 32-bit instructions or four 16-bit instructions, but only in sequential code execution. In our code `PRFTEN` and `ACC64` are enabled, so we are reading 64 bits at a time. In the stm32l1xxx reference manual it is stated:
This led me to believe that, for some reason, when the branch instruction is present the core executes a corrupted prefetch buffer entry, or in other words an instruction that isn't present. For some reason this only happens when HCLK and FCLK stay enabled in sleep mode. This might have something to do with different wake-up times, since the clock is always enabled for the core? I wasn't able to find many details of what happens on wake-up, and what could be different when HCLK stays enabled.
Steps to reproduce the issue
This issue does not currently show up in master unless a single `__NOP` is added after https://github.com/fjmolinas/RIOT/blob/e7a1b40cde17dc5f407c9b3884a2603ab656ac7e/cpu/cortexm_common/include/cpu.h#L172. This may change depending on the current master, since for a while a single `__NOP()` fixed the issue.
Expected results
No crash, ever.
Actual results
Has crashed in the past.
Possible FIXES
If the issue shows up again, adding 3 `__NOP` after the `__WFI()` could fix it.
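
As a rough illustration of what such padding could look like, here is a minimal sketch. It is not the RIOT implementation: `sketch_sleep_padded`, `sketch_wakeups` and the `SKETCH_*` macros are hypothetical names, interrupt masking (PRIMASK handling) is left out, and on anything other than an M-profile Arm target the instructions compile to no-ops so the snippet still builds. Defining `SKETCH_USE_ISB` swaps the three `nop`s for the `isb` suggested in the comment above. Whether either variant actually avoids the crash on a given master is untested here.

```c
#include <stdint.h>

/* Use the real instructions on an M-profile Arm target; on any other
 * target fall back to no-ops so the sketch still compiles. */
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
#  define SKETCH_DSB()  __asm__ volatile ("dsb sy" ::: "memory")
#  define SKETCH_WFI()  __asm__ volatile ("wfi" ::: "memory")
#  define SKETCH_ISB()  __asm__ volatile ("isb sy" ::: "memory")
#  define SKETCH_NOP()  __asm__ volatile ("nop")
#else
#  define SKETCH_DSB()  ((void)0)
#  define SKETCH_WFI()  ((void)0)
#  define SKETCH_ISB()  ((void)0)
#  define SKETCH_NOP()  ((void)0)
#endif

/* Hypothetical counter, incremented once per return from sleep. */
static volatile uint32_t sketch_wakeups;

/* Hypothetical sleep helper: wait for an interrupt, then execute padding
 * instructions before any other code runs. */
void sketch_sleep_padded(void)
{
    SKETCH_DSB();   /* let outstanding memory accesses complete first */
    SKETCH_WFI();   /* sleep until an interrupt wakes the core */
#ifdef SKETCH_USE_ISB
    SKETCH_ISB();   /* variant from the comment: isb after wfi */
#else
    /* Written out one by one (no loop), so the padding itself adds no
     * branch directly after the wfi. */
    SKETCH_NOP();
    SKETCH_NOP();
    SKETCH_NOP();
#endif
    sketch_wakeups++;
}
```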