RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1

pkg/semtech-loramac: _lock_acquire called in interrupt context causes kernel panic on esp32-ttgo-t-beam and possibly other esp boards #20799

Status: Open. FlapKap opened this issue 1 month ago.

FlapKap commented 1 month ago

Description

Trying to perform an OTAA join with the tests/pkg/semtech-loramac test application results in a kernel panic caused by a failed assertion in the ESP-specific system call code (cpu/esp_common/syscalls.c).

The relevant snippet from the "Actual results" section below:

2024-07-29 17:00:25,931 # cpu/esp_common/syscalls.c:195 => *** RIOT kernel panic:
2024-07-29 17:00:25,931 # FAILED ASSERTION.

The same error also occurs with RIOT releases 2024.04 and 2024.01.
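The title attributes the panic to _lock_acquire being called in interrupt context. As a rough, hosted model of the rule that the assertion at syscalls.c:195 appears to enforce (an assumption based on the title, not on the RIOT source), the sketch below uses a hypothetical in_isr flag in place of a real check such as RIOT's irq_is_in(). It contrasts doing lock-protected work directly inside an ISR with the usual pattern of only flagging the event in the ISR and doing the locked work later in a thread. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an "are we in interrupt context?" query such as irq_is_in(). */
static bool in_isr = false;
static bool lock_held = false;
static int locked_work_done = 0;
static volatile bool event_pending = false;

/* Models a lock that must never be taken from interrupt context. */
static int demo_lock_acquire(void)
{
    if (in_isr) {
        return -1; /* the real assertion presumably fires here, causing the panic */
    }
    lock_held = true;
    return 0;
}

static void demo_lock_release(void)
{
    lock_held = false;
}

/* Problematic: tries to do lock-protected work directly in the ISR. */
static int demo_isr_direct(void)
{
    int rc;
    in_isr = true;
    rc = demo_lock_acquire();
    if (rc == 0) {
        locked_work_done++;
        demo_lock_release();
    }
    in_isr = false;
    return rc;
}

/* Deferred pattern: the ISR only records that an event happened... */
static void demo_isr_deferred(void)
{
    in_isr = true;
    event_pending = true;
    in_isr = false;
}

/* ...and a thread later does the locked work outside interrupt context.
 * Returns 1 if an event was handled, 0 if none was pending, -1 on error. */
static int demo_thread_poll(void)
{
    if (!event_pending) {
        return 0;
    }
    event_pending = false;
    if (demo_lock_acquire() < 0) {
        return -1;
    }
    locked_work_done++;
    demo_lock_release();
    return 1;
}
```

In RIOT, the "flag the event" step would typically be a message, thread flag or event queue post rather than a bare volatile bool.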

Steps to reproduce the issue

Assuming the ESP build environment has been set up with

. dist/tools/esptools/export.sh all

build, flash, and open a terminal to the test with

BOARD=esp32-ttgo-t-beam make -C tests/pkg/semtech-loramac/ all flash term

Then run loramac join otaa in the shell.

Expected results

The join attempt should simply fail, since no keys have been provided. However, the result below is the same even when correct keys are provided.

Actual results

Welcome to pyterm!
Type '/exit' to exit.
2024-07-29 17:00:21,286 # Pro cpu up.
2024-07-29 17:00:21,286 # Single core mode
2024-07-29 17:00:21,287 # 
2024-07-29 17:00:21,287 # main(): This is RIOT! (Version: 2024.10-devel-17-gd2fa0)
2024-07-29 17:00:21,287 # All up, running the shell now
> loramac join otaa
2024-07-29 17:00:22,878 # loramac join otaa
2024-07-29 17:00:25,931 # cpu/esp_common/syscalls.c:195 => *** RIOT kernel panic:
2024-07-29 17:00:25,931 # FAILED ASSERTION.
2024-07-29 17:00:25,931 # 
2024-07-29 17:00:25,942 #       pid | name                 | state    Q | pri | stack  ( used) ( free) | base addr  | current     
2024-07-29 17:00:25,953 #         1 | esp_timer            | sleeping _ |   2 |   3640 (  456) ( 3184) | 0x3ffb4590 | 0x3ffb5200 
2024-07-29 17:00:25,954 #         2 | idle                 | running  Q |  15 |   2048 (  452) ( 1596) | 0x3ffb18f4 | 0x3ffb1f30 
2024-07-29 17:00:25,965 #         3 | main                 | bl rx    _ |   7 |   3584 (  996) ( 2588) | 0x3ffb20f4 | 0x3ffb2b70 
2024-07-29 17:00:25,976 #         4 | recv_thread          | bl rx    _ |   6 |   2048 ( 1032) ( 1016) | 0x3ffb33e8 | 0x3ffb39f0 
2024-07-29 17:00:25,987 #         5 | recv thread          | bl rx    _ |   6 |   2048 ( 2044) (    4) | 0x3ffb0e60 | 0x3ffb1450 
2024-07-29 17:00:25,987 #           | SUM                  |            |     |  13368 ( 4980) ( 8388)
2024-07-29 17:00:25,987 # 
2024-07-29 17:00:25,987 # *** halted.
2024-07-29 17:00:25,987 # 
2024-07-29 17:00:26,009 # cpu/esp_common/syscalls.c:195 => cpu/esp_common/syscalls.c:195 => cpu/esp_common/syscalls.c:195 => cpu/esp_common/syscalls.c:195 => EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,009 # processes:
2024-07-29 17:00:26,020 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,020 # processes:
2024-07-29 17:00:26,031 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,031 # processes:
2024-07-29 17:00:26,032 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,032 # processes:
2024-07-29 17:00:26,042 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,043 # processes:
2024-07-29 17:00:26,053 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,054 # processes:
2024-07-29 17:00:26,054 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,064 # processes:
2024-07-29 17:00:26,065 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,065 # processes:
2024-07-29 17:00:26,076 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,076 # processes:
2024-07-29 17:00:26,078 # EXCEPTION!! exccause=2 (InstructionFetchErrorCause) @80001180 excvaddr=3ffb2f20
2024-07-29 17:00:26,078 # processes:
2024-07-29 17:00:29,929 # ets Jul 29 2019 12:21:46
2024-07-29 17:00:29,929 # 
2024-07-29 17:00:29,930 # rst:0x7 (TG0WDT_SYS_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
2024-07-29 17:00:29,930 # configsip: 0, SPIWP:0xee
2024-07-29 17:00:29,940 # clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
2024-07-29 17:00:29,941 # mode:DOUT, clock div:2
2024-07-29 17:00:29,941 # load:0x3fff0030,len:1428
2024-07-29 17:00:29,947 # load:0x40078000,len:13020
2024-07-29 17:00:29,947 # load:0x40080400,len:4
2024-07-29 17:00:29,947 # load:0x40080404,len:2960
2024-07-29 17:00:29,947 # entry 0x40080410
2024-07-29 17:00:30,010 # Pro cpu up.
2024-07-29 17:00:30,010 # Single core mode
2024-07-29 17:00:30,065 # 
2024-07-29 17:00:30,111 # main(): This is RIOT! (Version: 2024.10-devel-17-gd2fa0)
2024-07-29 17:00:30,111 # All up, running the shell now
> 

Versions

Output of make print-versions:


Operating System Environment
----------------------------
         Operating System: "Ubuntu" "22.04.4 LTS (Jammy Jellyfish)"
                   Kernel: Linux 5.15.146.1-microsoft-standard-WSL2 x86_64 x86_64
             System shell: /usr/bin/dash (probably dash)
             make's shell: /usr/bin/dash (probably dash)

Installed compiler toolchains
-----------------------------
               native gcc: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
        arm-none-eabi-gcc: arm-none-eabi-gcc (Arm GNU Toolchain 13.2.rel1 (Build arm-13.7)) 13.2.1 20231009
                  avr-gcc: missing
           msp430-elf-gcc: missing
       riscv-none-elf-gcc: missing
  riscv64-unknown-elf-gcc: missing
      riscv32-esp-elf-gcc: riscv32-esp-elf-gcc (crosstool-NG esp-12.2.0_20230208) 12.2.0
     xtensa-esp32-elf-gcc: xtensa-esp32-elf-gcc (crosstool-NG esp-12.2.0_20230208) 12.2.0
   xtensa-esp32s2-elf-gcc: xtensa-esp32s2-elf-gcc (crosstool-NG esp-12.2.0_20230208) 12.2.0
   xtensa-esp32s3-elf-gcc: xtensa-esp32s3-elf-gcc (crosstool-NG esp-12.2.0_20230208) 12.2.0
   xtensa-esp8266-elf-gcc: missing
                    clang: Ubuntu clang version 14.0.0-1ubuntu1.1

Installed compiler libs
-----------------------
     arm-none-eabi-newlib: "4.3.0"
        msp430-elf-newlib: missing
    riscv-none-elf-newlib: missing
riscv64-unknown-elf-newlib: missing
   riscv32-esp-elf-newlib: "4.1.0"
  xtensa-esp32-elf-newlib: "4.1.0"
xtensa-esp32s2-elf-newlib: "4.1.0"
xtensa-esp32s3-elf-newlib: "4.1.0"
xtensa-esp8266-elf-newlib: missing
                 avr-libc: missing (missing)

Installed development tools
---------------------------
                   ccache: ccache version 4.5.1
                    cmake: cmake version 3.22.1
                 cppcheck: Cppcheck 2.7
                  doxygen: 1.9.1
                      git: git version 2.34.1
                     make: GNU Make 4.3
                  openocd: Open On-Chip Debugger v0.12.0-esp32-20230313 (2023-03-13-09:07)
                   python: missing
                  python2: missing
                  python3: Python 3.10.12
                   flake8: 4.0.1 (mccabe: 0.6.1, pycodestyle: 2.8.0, pyflakes: 2.4.0) CPython 3.10.12 on
               coccinelle: spatch version 1.1.1 compiled with OCaml version 4.13.1
FlapKap commented 1 month ago

On further investigation, the probable cause is that the radio fails to initialize and this failure is never propagated. Enabling several debug flags reveals the following:

Type '/exit' to exit.
2024-07-30 15:07:02,360 # Pro cpu up.
2024-07-30 15:07:02,360 # Single core mode
2024-07-30 15:07:02,361 # 
2024-07-30 15:07:02,361 # [semtech-loramac] initializing loramac
2024-07-30 15:07:02,361 # [sx127x] netdev: initializing driver...
2024-07-30 15:07:02,361 # [sx127x] SPI_0 initialized with success
2024-07-30 15:07:02,362 # [sx127x] sx1276 test failed, invalid version number: 0
2024-07-30 15:07:02,362 # [sx127x] error: no valid device found
2024-07-30 15:07:02,362 # [sx127x] netdev: initialization failed
2024-07-30 15:07:02,362 # [semtech-loramac] radio: failed to initialize radio
2024-07-30 15:07:02,362 # [semtech-loramac] radio: initialization successful
2024-07-30 15:07:02,363 # [semtech-loramac] set dr 0
2024-07-30 15:07:02,363 # [semtech-loramac] set adr 0
2024-07-30 15:07:02,363 # [semtech-loramac] set public network 1
2024-07-30 15:07:02,363 # [semtech-loramac] set class 0
2024-07-30 15:07:02,363 # main(): This is RIOT! (Version: 2024.10-devel-25-gfe3a4)
2024-07-30 15:07:02,363 # All up, running the shell now

So this might indeed just be a problem with my board. Still, the error stops propagating at SX127XInit() in pkg/semtech-loramac/contrib/semtech_loramac_radio.c:45, which logs the failure but then carries on and also reports success:

void SX127XInit(RadioEvents_t *events)
{
    (void) events;
    assert(loramac_netdev_ptr);
    if (loramac_netdev_ptr->driver->init(loramac_netdev_ptr) < 0) {
        DEBUG("[semtech-loramac] radio: failed to initialize radio\n");
    }

    DEBUG("[semtech-loramac] radio: initialization successful\n");
}

However, since this function is used as part of the Radio_s struct, which declares this callback as returning void, I don't know how to properly propagate the error further.
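One possible workaround that keeps the void Radio_s callback signature is sketched below: return early after a failed driver init instead of falling through to the "initialization successful" message, and record the outcome in a flag that the caller can query afterwards. This is a hosted sketch, not a patch. mock_netdev_t, mock_driver_t, the mock init functions, radio_init_ok and semtech_loramac_radio_init_ok() are hypothetical stand-ins (the real code uses RIOT's netdev types), RadioEvents_t is a dummy placeholder, and puts() replaces DEBUG() so the control flow is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal mock of a netdev-style device/driver pair (the real types live in RIOT). */
typedef struct mock_netdev mock_netdev_t;
typedef struct {
    int (*init)(mock_netdev_t *dev);
} mock_driver_t;
struct mock_netdev {
    const mock_driver_t *driver;
};

/* Dummy placeholder for the LoRaMAC RadioEvents_t type (unused, as in SX127XInit). */
typedef struct { int unused; } RadioEvents_t;

static mock_netdev_t *loramac_netdev_ptr;
static bool radio_init_ok; /* hypothetical: result of the last radio init */

/* Keeps the void signature required by Radio_s, but stops on failure
 * and records whether the driver initialized successfully. */
void SX127XInit(RadioEvents_t *events)
{
    (void)events;
    assert(loramac_netdev_ptr);
    radio_init_ok = false;
    if (loramac_netdev_ptr->driver->init(loramac_netdev_ptr) < 0) {
        puts("[semtech-loramac] radio: failed to initialize radio");
        return; /* do not fall through to the success message */
    }
    radio_init_ok = true;
    puts("[semtech-loramac] radio: initialization successful");
}

/* Hypothetical accessor the caller can check once the radio has been initialized. */
bool semtech_loramac_radio_init_ok(void)
{
    return radio_init_ok;
}

/* Mock drivers: one that fails like the sx127x in the log above, one that succeeds. */
static int mock_init_fail(mock_netdev_t *dev) { (void)dev; return -1; }
static int mock_init_ok(mock_netdev_t *dev) { (void)dev; return 0; }
static const mock_driver_t failing_driver = { mock_init_fail };
static const mock_driver_t working_driver = { mock_init_ok };
```

With something like this, the RIOT-side initialization code could check semtech_loramac_radio_init_ok() after the radio has been initialized via Radio.Init and return an error instead of continuing with an unusable radio. A module-level flag leaves the Radio_s interface untouched, at the cost of a little extra static state.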