RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1
4.96k stars 1.99k forks source link

stm32/l152: ztimer Crash after initial flashing #21010

Open crasbe opened 1 week ago

crasbe commented 1 week ago

Description

When programming the tests/periph/adc test program to the NUCLEO-L152RE board with make flash, the program crashes on the first call of ztimer_set. After a power cycle, the program runs without any issues. Likewise, when the program is flashed using the builtin mass storage driver of the ST-Link, the program will start without any issues. Programming the chip with the STM32CubeProgrammer leads to the same result as OpenOCD.

Steps to reproduce the issue

The program I used for testing is tests/periph/adc, maybe it can be reproduced with others well. Optionally you can apply the change described in https://github.com/RIOT-OS/RIOT/issues/20780#issuecomment-2482672894, but the unmodified master version works just as well.

First the ISR stacksize has to be increased, otherwise the printf in the Hard Fault handler won't print out anything. Just add CFLAGS += -DISR_STACKSIZE=1024 to tests/periph/adc/Makefile.

BOARD=nucleo-l152re make flash term Usually the output won't be complete, so hit the reset button:

2024-11-19 15:58:25,032 # main(): This is RIOT! (Version: 2025.01-devel-119-gc636f)
2024-11-19 15:58:25,032 # 
2024-11-19 15:58:25,035 # RIOT ADC peripheral driver test
2024-11-19 15:58:25,035 # 
2024-11-19 15:58:25,041 # This test will sample all available ADC lines once every 100ms with
2024-11-19 15:58:25,046 # a 10-bit resolution and print the sampled results to STDIO
2024-11-19 15:58:25,046 # 
2024-11-19 15:58:25,047 # 
2024-11-19 15:58:25,050 # Successfully initialized ADC_LINE(0)
2024-11-19 15:58:25,053 # Successfully initialized ADC_LINE(1)
2024-11-19 15:58:25,056 # Successfully initialized ADC_LINE(2)
2024-11-19 15:58:25,060 # Successfully initialized ADC_LINE(3)
2024-11-19 15:58:25,063 # Successfully initialized ADC_LINE(4)
2024-11-19 15:58:25,066 # Successfully initialized ADC_LINE(5)
2024-11-19 15:58:25,069 # ADC_LINE(0): 18 25 28 30 -1 -1
2024-11-19 15:58:25,072 # ADC_LINE(1): 24 28 30 31 -1 -1
2024-11-19 15:58:25,075 # ADC_LINE(2): 63 63 63 63 -1 -1
2024-11-19 15:58:25,078 # ADC_LINE(3): 25 27 29 30 -1 -1
2024-11-19 15:58:25,081 # ADC_LINE(4): 23 27 30 31 -1 -1
2024-11-19 15:58:25,084 # ADC_LINE(5): 25 29 31 31 -1 -1
2024-11-19 15:58:25,184 # 
2024-11-19 15:58:25,186 # Context before hardfault:
2024-11-19 15:58:25,188 #    r0: 0x00000002
2024-11-19 15:58:25,189 #    r1: 0x20000aa8
2024-11-19 15:58:25,191 #    r2: 0x00000001
2024-11-19 15:58:25,193 #    r3: 0x00000000
2024-11-19 15:58:25,194 #   r12: 0x20000408
2024-11-19 15:58:25,196 #    lr: 0x0800106b
2024-11-19 15:58:25,197 #    pc: 0x0878306a
2024-11-19 15:58:25,199 #   psr: 0x6100000e
2024-11-19 15:58:25,199 # 
2024-11-19 15:58:25,200 # FSR/FAR:
2024-11-19 15:58:25,202 #  CFSR: 0x00000100
2024-11-19 15:58:25,203 #  HFSR: 0x40000000
2024-11-19 15:58:25,205 #  DFSR: 0x00000008
2024-11-19 15:58:25,206 #  AFSR: 0x00000000
2024-11-19 15:58:25,207 # Misc
2024-11-19 15:58:25,209 # EXC_RET: 0xfffffff1
2024-11-19 15:58:25,212 # Hard fault occurred in ISR number -2
2024-11-19 15:58:25,216 # Attempting to reconstruct state for debugging...
2024-11-19 15:58:25,217 # In GDB:
2024-11-19 15:58:25,219 #   set $pc=0x878306a
2024-11-19 15:58:25,220 #   frame 0
2024-11-19 15:58:25,220 #   bt
2024-11-19 15:58:25,222 # *** RIOT kernel panic:
2024-11-19 15:58:25,224 # HARD FAULT HANDLER
2024-11-19 15:58:25,224 # 
2024-11-19 15:58:25,225 # *** halted.
2024-11-19 15:58:25,225 # 
2024-11-19 15:58:25,227 # Inside isr -13

With the ztimer ENABLE_DEBUG flag enabled in sys/ztimer/core.c, the output becomes the following:

2024-11-19 15:47:21,679 # ztimer_set(): 0x20000408: set 0x20000bd0 at 93 offset 2147483000
2024-11-19 15:47:21,686 # clock 0x20000408: _ztimer_update_head_offset(): diff=5924 old head 0
2024-11-19 15:47:21,690 # _add_entry_to_list() 0x20000bd0 offset 2147483000
2024-11-19 15:47:21,695 # main(): This is RIOT! (Version: 2025.01-devel-119-gc636f)
2024-11-19 15:47:21,695 # 
2024-11-19 15:47:21,698 # RIOT ADC peripheral driver test
2024-11-19 15:47:21,698 # 
2024-11-19 15:47:21,704 # This test will sample all available ADC lines once every 100ms with
2024-11-19 15:47:21,709 # a 10-bit resolution and print the sampled results to STDIO
2024-11-19 15:47:21,709 # 
2024-11-19 15:47:21,709 # 
2024-11-19 15:47:21,713 # Successfully initialized ADC_LINE(0)
2024-11-19 15:47:21,716 # Successfully initialized ADC_LINE(1)
2024-11-19 15:47:21,719 # Successfully initialized ADC_LINE(2)
2024-11-19 15:47:21,723 # Successfully initialized ADC_LINE(3)
2024-11-19 15:47:21,726 # Successfully initialized ADC_LINE(4)
2024-11-19 15:47:21,729 # Successfully initialized ADC_LINE(5)
2024-11-19 15:47:21,732 # ADC_LINE(0): 17 25 28 30 -1 -1
2024-11-19 15:47:21,735 # ADC_LINE(1): 24 28 30 31 -1 -1
2024-11-19 15:47:21,738 # ADC_LINE(2): 63 63 63 63 -1 -1
2024-11-19 15:47:21,741 # ADC_LINE(3): 25 27 29 30 -1 -1
2024-11-19 15:47:21,744 # ADC_LINE(4): 22 27 29 31 -1 -1
2024-11-19 15:47:21,747 # ADC_LINE(5): 25 29 30 31 -1 -1
2024-11-19 15:47:21,752 # ztimer_set(): 0x20000ba8: set 0x20000a10 at 73 offset 100
2024-11-19 15:47:21,757 # ztimer_now() checkpoint=0 lower_last=0 lower_now=78 diff=78
2024-11-19 15:47:21,760 # ztimer_now() returning 78
2024-11-19 15:47:21,766 # clock 0x20000ba8: _ztimer_update_head_offset(): diff=78 old head 0
2024-11-19 15:47:21,769 # _add_entry_to_list() 0x20000a10 offset 100
2024-11-19 15:47:21,775 # ztimer_set(): 0x20000408: set 0x20000bd0 at 96262 offset 100000
2024-11-19 15:47:21,782 # clock 0x20000408: _ztimer_update_head_offset(): diff=96075 old head 0x20000bd0
2024-11-19 15:47:21,791 # ztimer 0x20000408: _ztimer_update_head_offset(): now=101999 new head 0x20000bd0 offset 2147386925
2024-11-19 15:47:21,793 # _del_entry_from_list()
2024-11-19 15:47:21,797 # _add_entry_to_list() 0x20000bd0 offset 100000
2024-11-19 15:47:21,897 # 
2024-11-19 15:47:21,899 # Context before hardfault:
2024-11-19 15:47:21,901 #    r0: 0x00000002
2024-11-19 15:47:21,902 #    r1: 0x20000aa8
2024-11-19 15:47:21,904 #    r2: 0x00000001
2024-11-19 15:47:21,906 #    r3: 0x00000000
2024-11-19 15:47:21,907 #   r12: 0x0000002e
2024-11-19 15:47:21,909 #    lr: 0x0800106b
2024-11-19 15:47:21,911 #    pc: 0x0878306a
2024-11-19 15:47:21,912 #   psr: 0x6100000e
2024-11-19 15:47:21,912 # 
2024-11-19 15:47:21,913 # FSR/FAR:
2024-11-19 15:47:21,915 #  CFSR: 0x00000100
2024-11-19 15:47:21,916 #  HFSR: 0x40000000
2024-11-19 15:47:21,918 #  DFSR: 0x00000008
2024-11-19 15:47:21,920 #  AFSR: 0x00000000
2024-11-19 15:47:21,920 # Misc
2024-11-19 15:47:21,922 # EXC_RET: 0xfffffff1
2024-11-19 15:47:21,925 # Hard fault occurred in ISR number -2
2024-11-19 15:47:21,929 # Attempting to reconstruct state for debugging...
2024-11-19 15:47:21,930 # In GDB:
2024-11-19 15:47:21,932 #   set $pc=0x878306a
2024-11-19 15:47:21,933 #   frame 0
2024-11-19 15:47:21,933 #   bt
2024-11-19 15:47:21,935 # *** RIOT kernel panic:
2024-11-19 15:47:21,937 # HARD FAULT HANDLER
2024-11-19 15:47:21,937 # 
2024-11-19 15:47:21,938 # *** halted.
2024-11-19 15:47:21,938 # 
2024-11-19 15:47:21,940 # Inside isr -13

After a power cycle, this is the resulting output:

2024-11-19 15:52:09,369 # _add_entry_to_ztimer_set(): 0x20000408: set 0x20000bd0 at 93 offset 2147483000
2024-11-19 15:52:09,375 # clock 0x20000408: _ztimer_update_head_offset(): diff=5924 old head 0
2024-11-19 15:52:09,379 # _add_entry_to_list() 0x20000bd0 offset 2147483000
2024-11-19 15:52:09,384 # main(): This is RIOT! (Version: 2025.01-devel-119-gc636f)
2024-11-19 15:52:09,385 # 
2024-11-19 15:52:09,388 # RIOT ADC peripheral driver test
2024-11-19 15:52:09,388 # 
2024-11-19 15:52:09,394 # This test will sample all available ADC lines once every 100ms with
2024-11-19 15:52:09,399 # a 10-bit resolution and print the sampled results to STDIO
2024-11-19 15:52:09,399 # 
2024-11-19 15:52:09,399 # 
2024-11-19 15:52:09,402 # Successfully initialized ADC_LINE(0)
2024-11-19 15:52:09,406 # Successfully initialized ADC_LINE(1)
2024-11-19 15:52:09,409 # Successfully initialized ADC_LINE(2)
2024-11-19 15:52:09,412 # Successfully initialized ADC_LINE(3)
2024-11-19 15:52:09,415 # Successfully initialized ADC_LINE(4)
2024-11-19 15:52:09,418 # Successfully initialized ADC_LINE(5)
2024-11-19 15:52:09,421 # ADC_LINE(0): 17 25 28 30 -1 -1
2024-11-19 15:52:09,424 # ADC_LINE(1): 24 28 30 31 -1 -1
2024-11-19 15:52:09,428 # ADC_LINE(2): 63 63 63 63 -1 -1
2024-11-19 15:52:09,430 # ADC_LINE(3): 26 28 29 30 -1 -1
2024-11-19 15:52:09,433 # ADC_LINE(4): 22 27 29 31 -1 -1
2024-11-19 15:52:09,436 # ADC_LINE(5): 27 29 30 31 -1 -1
2024-11-19 15:52:09,442 # ztimer_set(): 0x20000ba8: set 0x20000a10 at 73 offset 100
2024-11-19 15:52:09,446 # ztimer_now() checkpoint=0 lower_last=0 lower_now=78 diff=78
2024-11-19 15:52:09,449 # ztimer_now() returning 78
2024-11-19 15:52:09,456 # clock 0x20000ba8: _ztimer_update_head_offset(): diff=78 old head 0
2024-11-19 15:52:09,459 # _add_entry_to_list() 0x20000a10 offset 100
2024-11-19 15:52:09,465 # ztimer_set(): 0x20000408: set 0x20000bd0 at 96262 offset 100000
2024-11-19 15:52:09,472 # clock 0x20000408: _ztimer_update_head_offset(): diff=96075 old head 0x20000bd0
2024-11-19 15:52:09,480 # ztimer 0x20000408: _ztimer_update_head_offset(): now=101999 new head 0x20000bd0 offset 2147386925
2024-11-19 15:52:09,482 # _del_entry_from_list()
2024-11-19 15:52:09,487 # _add_entry_to_list() 0x20000bd0 offset 100000
2024-11-19 15:52:09,590 # ztimer_handler(): 0x20000408 now=223913
2024-11-19 15:52:09,594 # 0x20000408:101999(101999)->0x20000bd0:100000(201999)
2024-11-19 15:52:09,597 # ztimer_handler(): no checkpointing
2024-11-19 15:52:09,603 # ztimer_handler(): trigger 0x20000bd0->0 at 235427
2024-11-19 15:52:09,606 # ztimer_handler(): 0x20000ba8 now=239
2024-11-19 15:52:09,609 # 0x20000ba8:78(78)->0x20000a10:100(178)
2024-11-19 15:52:09,614 # ztimer_now() checkpoint=78 lower_last=78 lower_now=246 diff=168
2024-11-19 15:52:09,617 # ztimer_now() returning 246
2024-11-19 15:52:09,621 # ztimer_handler(): 0x20000ba8 diff=-68
2024-11-19 15:52:09,625 # ztimer_handler(): trigger 0x20000a10->0 at 258
2024-11-19 15:52:09,630 # ztimer_now() checkpoint=246 lower_last=246 lower_now=262 diff=16
2024-11-19 15:52:09,633 # ztimer_now() returning 262
2024-11-19 15:52:09,639 # clock 0x20000ba8: _ztimer_update_head_offset(): diff=84 old head 0
2024-11-19 15:52:09,645 # ztimer_set(): 0x20000408: set 0x20000bd0 at 276824 offset 2147483000
2024-11-19 15:52:09,652 # clock 0x20000408: _ztimer_update_head_offset(): diff=81002 old head 0
2024-11-19 15:52:09,656 # _add_entry_to_list() 0x20000bd0 offset 2147483000
2024-11-19 15:52:09,658 # 0x20000ba8:262(262)!
2024-11-19 15:52:09,661 # ztimer_handler(): 0x20000ba8 done.
2024-11-19 15:52:09,668 # clock 0x20000408: _ztimer_update_head_offset(): diff=15783 old head 0x20000bd0
2024-11-19 15:52:09,676 # ztimer 0x20000408: _ztimer_update_head_offset(): now=298784 new head 0x20000bd0 offset 2147467217
2024-11-19 15:52:09,682 # 0x20000408:298784(298784)->0x20000bd0:2147467217(2147766001)
2024-11-19 15:52:09,685 # ztimer_handler(): 0x20000408 done.
2024-11-19 15:52:09,688 # ADC_LINE(0): 25 28 30 31 -1 -1
2024-11-19 15:52:09,692 # ADC_LINE(1): 29 30 31 32 -1 -1
2024-11-19 15:52:09,694 # ADC_LINE(2): 63 63 63 63 -1 -1
2024-11-19 15:52:09,698 # ADC_LINE(3): 32 32 32 32 -1 -1
2024-11-19 15:52:09,700 # ADC_LINE(4): 28 30 31 31 -1 -1
2024-11-19 15:52:09,703 # ADC_LINE(5): 25 29 30 31 -1 -1
...

It looks like the program crashes either in _add_entry_to_list() or in the ztimer_handler() function.

Maybe someone a bit more familiar with ztimer can have a look at this issue. My theory is that some register is not properly reset. I don't know which hardware timer ztimer uses on the STM32L152ZE, but there is a remark in the reference manual [1] on page 165, section 6.3.14:

Note: The LSEON, LSEBYP, RTCSEL and RTCEN bits in the RCC control and status register
(RCC_CSR) are in the RTC domain. As these bits are write protected after reset, the DBP
bit in the Power control register (PWR_CR) has to be set to be able to modify them. Refer to
Section RTC and RTC backup registers for further information. These bits are only reset
after a RTC domain reset (see RTC and backup registers reset). Any internal or external
reset does not have any effect on them.

Does ztimer use the RTC or relies on the LSE-registers? Trying to access the write protected RTC registers would surely lead to a Hard Fault?

Expected results

Regardless of the flashing method, the program should not crash when using ztimer.

Actual results

The program crashes.

Versions

Operating System Environment
----------------------------
         Operating System: "Linux Mint" "21.3 (Virginia)"
                   Kernel: Linux 5.15.0-125-generic x86_64 x86_64
             System shell: /usr/bin/dash (probably dash)
             make's shell: /usr/bin/dash (probably dash)

Installed compiler toolchains
-----------------------------
               native gcc: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
        arm-none-eabi-gcc: arm-none-eabi-gcc (15:10.3-2021.07-4) 10.3.1 20210621 (release)
                  avr-gcc: avr-gcc (GCC) 5.4.0
           msp430-elf-gcc: missing
       riscv-none-elf-gcc: missing
  riscv64-unknown-elf-gcc: missing
      riscv32-esp-elf-gcc: missing
     xtensa-esp32-elf-gcc: missing
   xtensa-esp32s2-elf-gcc: missing
   xtensa-esp32s3-elf-gcc: missing
   xtensa-esp8266-elf-gcc: missing
                    clang: missing

Installed compiler libs
-----------------------
     arm-none-eabi-newlib: "3.3.0"
        msp430-elf-newlib: missing
    riscv-none-elf-newlib: missing
riscv64-unknown-elf-newlib: missing
   riscv32-esp-elf-newlib: missing
  xtensa-esp32-elf-newlib: missing
xtensa-esp32s2-elf-newlib: missing
xtensa-esp32s3-elf-newlib: missing
xtensa-esp8266-elf-newlib: missing
                 avr-libc: "2.0.0" ("20150208")

Installed development tools
---------------------------
                   ccache: missing
                    cmake: missing
                 cppcheck: missing
                  doxygen: 1.9.1
                      git: git version 2.34.1
                     make: GNU Make 4.3
                  openocd: xPack Open On-Chip Debugger 0.12.0+dev-01685-gb9224c0c0-dirty (2024-08-02-19:50)
                   python: missing
                  python2: missing
                  python3: Python 3.10.12
                   flake8: 4.0.1 (mccabe: 0.6.1, pycodestyle: 2.8.0, pyflakes: 2.4.0) CPython 3.10.12 on
               coccinelle: spatch version 1.1.1 compiled with OCaml version 4.13.1

Note: I updated OpenOCD to see if that was the cause of the issue, but the same happens with OpenOCD 0.11 and 0.12.

[1] https://www.st.com/resource/en/reference_manual/cd00240193-stm32l100xx-stm32l151xx-stm32l152xx-and-stm32l162xx-advanced-arm-based-32-bit-mcus-stmicroelectronics.pdf

crasbe commented 1 week ago

Okay, so it appears that the Hard Fault happens in/before ztimer_handler() is called. The _add_entry_to_list function is executed completely and then the Hard Fault happens.

What's odd is that ISR -2 would be the PendSV_ISR, which is used for the scheduler. That means that the HardFault happened inside of the isr_pendsv function, which is in cpu/cortexm_common/thread_arch.c.

The STM32CubeProgrammer has a new feature called "Fault Analyzer" and it says the fault happened at address 0x8000A93. Bild_2024-11-19_234207049

With GDB I decompiled the binary and tadaaaa:

(gdb) x/16ib 0x8000A80
   0x8000a80 <isr_svc+4>:       ldr     r3, [pc, #4]    ; (0x8000a88 <isr_svc+12>)
   0x8000a82 <isr_svc+6>:       str     r2, [r3, #4]
   0x8000a84 <isr_svc+8>:       bx      lr
   0x8000a86 <isr_svc+10>:      nop
   0x8000a88 <isr_svc+12>:      stc     0, cr14, [r0, #-0]
   0x8000a8c <sched_arch_idle>: push    {r3, lr}
   0x8000a8e <sched_arch_idle+2>:       bl      0x8001608 <pm_set_lowest>
   0x8000a92 <sched_arch_idle+6>:       cpsie   i
   0x8000a94 <sched_arch_idle+8>:       isb     sy
   0x8000a98 <sched_arch_idle+12>:      cpsid   i
   0x8000a9a <sched_arch_idle+14>:      pop     {r3, pc}
   0x8000a9c <hard_fault_default>:      ldr     r3, [pc, #52]   ; (0x8000ad4 <hard_fault_default+56>)
   0x8000a9e <hard_fault_default+2>:    ldr     r2, [pc, #56]   ; (0x8000ad8 <hard_fault_default+60>)
   0x8000aa0 <hard_fault_default+4>:    ldr     r1, [pc, #56]   ; (0x8000adc <hard_fault_default+64>)
   0x8000aa2 <hard_fault_default+6>:    mov     r0, sp
   0x8000aa4 <hard_fault_default+8>:    cmp     r0, r2

The instruction where the Hard Fault happens is cpsie i.

The fault that happened is "IBUSERR", which means the instruction prefetch failed. Looking at the RIOT trace, this makes sense, since 0x08783282 is an illegal address. However I don't know how that's related to the cpsie i instruction (or anything around it).

ztimer_set(): After ztimer_update
ztimer_set(): After ztimer_update
sched_set_status: removing thread 1 from runqueue hu.

Context before hardfault:
   r0: 0x00000002
   r1: 0x200004a4
   r2: 0x00000001
   r3: 0x00000000
  r12: 0x00000036
   lr: 0x08001283
   pc: 0x08783282
  psr: 0x6100000e

FSR/FAR:
 CFSR: 0x00000100
 HFSR: 0x40000000
 DFSR: 0x00000008
 AFSR: 0x00000000
Misc
EXC_RET: 0xfffffff1
Hard fault occurred in ISR number -2
Attempting to reconstruct state for debugging...
In GDB:
  set $pc=0x8783282
  frame 0
  bt
*** RIOT kernel panic:
HARD FAULT HANDLER

*** halted.

Inside isr -13

This is pretty much at the end of my knowledge (actually beyond that 😅 ).