espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.6k stars 7.27k forks source link

OpenThread Border Router example assert failed: sys_timeout /IDF/components/lwip/lwip/src/core/timeouts.c (IDFGH-10760) #11976

Closed DonaldFraser closed 8 months ago

DonaldFraser commented 1 year ago

Answers checklist.

IDF version.

v5.1

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

CMD

Development Kit.

ESP32-S2-DevKitM-1 v1.0

Power Supply used.

USB

What is the expected behavior?

Building and running the Espressif Thread Border Router example https://github.com/espressif/esp-idf/tree/master/examples/openthread/ot_br, I expect the Thread Border Router to run on the ESP-32 S2 devkit as described in the example with no assert failure causing. I have built and run an RCP device on an Espressif ESP32-H2 devkit and this RCP is connected to the ESP32 S2 running this thread border router example by UART as per the example page.

What is the actual behavior?

ESP32-S2 idf.py monitor indicates an assertion failure, causing a reboot of the ESP32-S2 thread border router assert failed: sys_timeout /IDF/components/lwip/lwip/src/core/timeouts.c:308 (Timeout time too long, max is LWIP_UINT32_MAX/4 msecs) This pattern repeats on reboot, leading to continual reboots of the ESP32-S2 running the thread border router example Looks similar to https://github.com/espressif/esp-idf/issues/10029, but with openthread, not BLE

Steps to reproduce.

  1. Follow the steps in Build,Flash, and Run section of https://github.com/espressif/esp-idf/tree/master/examples/openthread/ot_br Using a ESP-32 S2 devkit as the thread border router, connected by UART, as shown on the same page to an ESP-H2 devkit already flashed with the RCP client

Debug Logs.

Debug Log

Rebooting...
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0x3 (RTC_SW_SYS_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x4002558d
0x4002558d: esp_restart_noos_dig at D:/ESP/esp-idf-v5.1/components/esp_system/port/esp_system_chip.c:57 (discriminator 1)

SPIWP:0xee
mode:DIO, clock div:1
load:0x3ffe6108,len:0x17c8
load:0x4004b000,len:0x4
load:0x4004b004,len:0xab4
load:0x4004f000,len:0x3160
entry 0x4004b1c4
I (26) boot: ESP-IDF v5.1-dirty 2nd stage bootloader
I (26) boot: compile time Jul 27 2023 18:34:38
I (26) boot: chip revision: v0.0
I (30) boot.esp32s2: SPI Speed      : 80MHz
I (35) boot.esp32s2: SPI Mode       : DIO
I (39) boot.esp32s2: SPI Flash Size : 2MB
I (44) boot: Enabling RNG early entropy source...
I (50) boot: Partition Table:
I (53) boot: ## Label            Usage          Type ST Offset   Length
I (60) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (68) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (75) boot:  2 factory          factory app      00 00 00010000 00190000
I (83) boot:  3 ot_storage       Unknown data     01 3a 001a0000 00002000
I (90) boot: End of partition table
I (94) esp_image: segment 0: paddr=00010020 vaddr=3f000020 size=3c3a0h (246688) map
I (152) esp_image: segment 1: paddr=0004c3c8 vaddr=3ffc9af0 size=03914h ( 14612) load
I (156) esp_image: segment 2: paddr=0004fce4 vaddr=40024000 size=00334h (   820) load
I (158) esp_image: segment 3: paddr=00050020 vaddr=40080020 size=105c7ch (1072252) map
I (380) esp_image: segment 4: paddr=00155ca4 vaddr=40024334 size=157b4h ( 87988) load
I (414) boot: Loaded app from partition at offset 0x10000
I (414) boot: Disabling RNG early entropy source...
I (426) cpu_start: Unicore app
I (426) cache: Instruction cache        : size 8KB, 4Ways, cache line size 32Byte
I (428) cache: Data cache               : size 8KB, 4Ways, cache line size 32Byte
I (435) cpu_start: Pro cpu up.
I (450) cpu_start: Pro cpu start user code
I (450) cpu_start: cpu freq: 160000000 Hz
I (450) cpu_start: Application information:
I (455) cpu_start: Project name:     esp_ot_br
I (460) cpu_start: App version:      v5.1-dirty
I (465) cpu_start: Compile time:     Jul 27 2023 20:05:34
I (471) cpu_start: ELF file SHA256:  195e8c95fa07a484...
I (477) cpu_start: ESP-IDF:          v5.1-dirty
I (483) cpu_start: Min chip rev:     v0.0
I (487) cpu_start: Max chip rev:     v1.99
I (492) cpu_start: Chip rev:         v0.0
I (497) heap_init: Initializing. RAM available for dynamic allocation:
I (504) heap_init: At 3FFDCAA8 len 0001F558 (125 KiB): DRAM
I (510) heap_init: At 3FFFC000 len 00003A10 (14 KiB): DRAM
I (517) heap_init: At 3FF9E000 len 00002000 (8 KiB): RTCRAM
I (524) spi_flash: detected chip: generic
I (527) spi_flash: flash io: dio
W (531) spi_flash: Detected size(4096k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (549) app_start: Starting scheduler on CPU0
I (550) main_task: Started on CPU0
I (550) main_task: Calling app_main()
I (560) example_connect: Start example_connect.
I (580) wifi:wifi driver task: 3ffe5474, prio:23, stack:6656, core=0
I (580) wifi:wifi firmware version: b2f1f86
I (580) wifi:wifi certification version: v7.0
I (580) wifi:config NVS flash: enabled
I (580) wifi:config nano formating: disabled
I (590) wifi:Init data frame dynamic rx buffer num: 32
I (590) wifi:Init management frame dynamic rx buffer num: 32
I (600) wifi:Init management short buffer num: 32
I (600) wifi:Init dynamic tx buffer num: 32
I (610) wifi:Init static rx buffer size: 1600
I (610) wifi:Init static rx buffer num: 10
I (610) wifi:Init dynamic rx buffer num: 32
I (620) wifi_init: rx ba win: 6
I (620) wifi_init: tcpip mbox: 32
I (630) wifi_init: udp mbox: 6
I (630) wifi_init: tcp mbox: 6
I (630) wifi_init: tcp tx win: 5744
I (640) wifi_init: tcp rx win: 5744
I (640) wifi_init: tcp mss: 1440
I (650) wifi_init: WiFi IRAM OP enabled
I (650) wifi_init: WiFi RX IRAM OP enabled
I (660) phy_init: phy_version 2300,d67cf06,Feb 10 2022,10:03:07
I (700) wifi:mode : sta (84:f7:03:d8:87:3c)
I (700) wifi:enable tsf
I (700) example_connect: Connecting to BTWholeHome-TM9...
I (710) example_connect: Waiting for IP(s)
I (3120) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (3810) wifi:state: init -> auth (b0)
I (3810) wifi:state: auth -> assoc (0)
I (3840) wifi:state: assoc -> run (10)
I (3950) wifi:connected with BTWholeHome-TM9, aid = 7, channel 1, BW20, bssid = d4:86:60:d4:c4:89
I (3950) wifi:security: WPA2-PSK, phy: bgn, rssi: -53
I (3950) wifi:pm start, type: 1

I (4020) wifi:AP's beacon interval = 102400 us, DTIM period = 3
I (4950) esp_netif_handlers: example_netif_sta ip: 192.168.1.182, mask: 255.255.255.0, gw: 192.168.1.254
I (4950) example_connect: Got IPv4 event: Interface "example_netif_sta" address: 192.168.1.182
I (5560) example_connect: Got IPv6 event: Interface "example_netif_sta" address: fe80:0000:0000:0000:86f7:03ff:fed8:873c, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (5560) example_common: Connected to example_netif_sta
I (5570) example_common: - IPv4 address: 192.168.1.182,
I (5570) example_common: - IPv6 address: fe80:0000:0000:0000:86f7:03ff:fed8:8I (5590) OPENTHREAD: spinel UART interface initialization completed
I (5590) main_task: Returned from app_main()
I(5600) OPENTHREAD:[I] Platform------: RCP reset: RESET_POWER_ON
I(5670) OPENTHREAD:[I] ChildSupervsn-: Timeout: 0 -> 190
I(5700) OPENTHREAD:[I] Settings------: Read NetworkInfo {rloc:0x5800, extaddr:5622a9a75874d4bb, role:leader, mode:0x0f, version:4, keyseq:0x0, ...
I(5710) OPENTHREAD:[I] Settings------: ... pid:0x7d28d8b7, mlecntr:0x2c7df9, maccntr:0x2c4bd8, mliid:57d757ec939fcbce}
I (5730) OPENTHREAD: OpenThread attached to netif
> I(5740) OPENTHREAD:[N] BorderRouter--: BR ULA prefix: fd0f:fe99:65f2::/48 (loaded)
I(5740) OPENTHREAD:[N] BorderRouter--: Local on-link prefix: fd5e:edfe:a96e:75ef::/64
I (5760) esp_ot_br: Already has network, skip configuring OpenThread network.
I (5770) OPENTHREAD: Platform UDP bound to port 49153
I(5770) OPENTHREAD:[N] Mle-----------: Role disabled -> detached
I (5800) OPENTHREAD: Platform UDP bound to port 61631
I (5800) OPENTHREAD: netif up
I (5800) OPENTHREAD: NAT64 ready
I (5860) OPENTHREAD: Received PIO
I (5860) OPENTHREAD: Received PIO
I (5860) OPENTHREAD: Received RIO
I (5870) OPENTHREAD: prefix FDAA:BBCC:DDEE:: lifetime 4294967295

I (5880) OPENTHREAD: Received RIO
I (5880) OPENTHREAD: prefix 2A00:23C5:53B8:E01:: lifetime 315360000

assert failed: sys_timeout /IDF/components/lwip/lwip/src/core/timeouts.c:308 (Timeout time too long, max is LWIP_UINT32_MAX/4 msecs)

Backtrace: 0x4002561a:0x3ffe1880 0x4002d521:0x3ffe18a0 0x400348e5:0x3ffe18c0 0x400a6d18:0x3ffe19e0 0x4014b38d:0x3ffe1a00 0x4014a2e3:0x3ffe1a30 0x400a1539:0x3ffe1af0 0x400abf19:0x3ffe1b10 0x400afb55:0x3ffe1b50 0x4009f691:0x3ffe1b70 0x4009f714:0x3ffe1b90 0x4002f969:0x3ffe1bc0
0x4002561a: panic_abort at D:/ESP/esp-idf-v5.1/components/esp_system/panic.c:452

0x4002d521: esp_system_abort at D:/ESP/esp-idf-v5.1/components/esp_system/port/esp_system_chip.c:84

0x400348e5: __assert_func at D:/ESP/esp-idf-v5.1/components/newlib/assert.c:81

0x400a6d18: sys_timeout at D:/ESP/esp-idf-v5.1/components/lwip/lwip/src/core/timeouts.c:308 (discriminator 1)

0x4014b38d: esp_openthread_route_table_add_route_entry at ??:?

0x4014a2e3: icmp6_raw_recv_handler at esp_openthread_infra_if.c.obj:?

0x400a1539: raw_input at D:/ESP/esp-idf-v5.1/components/lwip/lwip/src/core/raw.c:178

0x400abf19: ip6_input at D:/ESP/esp-idf-v5.1/components/lwip/lwip/src/core/ipv6/ip6.c:1055

0x400afb55: ethernet_input at D:/ESP/esp-idf-v5.1/components/lwip/lwip/src/netif/ethernet.c:229

0x4009f691: tcpip_thread_handle_msg at D:/ESP/esp-idf-v5.1/components/lwip/lwip/src/api/tcpip.c:174

0x4009f714: tcpip_thread at D:/ESP/esp-idf-v5.1/components/lwip/lwip/src/api/tcpip.c:148

0x4002f969: vPortTaskWrapper at D:/ESP/esp-idf-v5.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

ELF file SHA256: 195e8c95fa07a484

Rebooting...
ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0x3 (RTC_SW_SYS_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x4002558d
0x4002558d: esp_restart_noos_dig at D:/ESP/esp-idf-v5.1/components/esp_system/port/esp_system_chip.c:57 (discriminator 1)

More Information.

No response

zwx1995esp commented 1 year ago

@DonaldFraser Hi, this RIO prefix 2A00:23C5:53B8:E01::, lifetime 315360000 is sent by one of your device? Notice, this timelift is not a reasonable time, due to the unit of this variable is second defined by RFC 4861. And time 315360000 is almost 36000 years, seems not a reasonable lifetime.

Also I will discuss about this with LWIP team, if the user set the timeout time too long, should there be an assert or just print some error logs and set the default max timeout instead.

DonaldFraser commented 1 year ago

Hi, I don't have any experience of RIO, only what I have read at https://openthread.io/reference/group/api-border-routing. I think you are suggesting that the thread border router example on my ESP32-S2 device is receiving an RIO network message externally - so probably from my home wifi router in this case, as that's the SSID I used in menuconfig for this example. Maybe I can configure my router to send RIO messages with a much shorter timeout, although it's a basic home router I don't know if I can do this. Maybe I can configure my router with IPv4 instead of IPv6. I will try experimenting with router settings.

Donald

DonaldFraser commented 1 year ago

I was unable to find any router settings on my home wifi router that allowed the thread border router to work. However I did have an old 4G wifi router lying around. I have set that up now, used idf.py menuconfig to change the openthread thread border router example config to use the SSID and password for my 4G router wifi network. The thread border router example running on my esp32-s2 with esp32-h2 RCP, is working with that wifi network, which is great, I can start to experiment and add some more ESP-32H2 devkit Thread end devices now. However it would be good to be able to get this working with my real home wifi network in future. I guess this issue is still relevant in the real world where RIO messages coming from wifi routers that people may have in their homes could specify very long lifetimes. As zwx1995esp commented, maybe logging an error and using the default max timeout would be more robust for real world wifi networks. I am able to proceed experimenting, using my other wifi router now, but would be good to have a solution for the future.

chshu commented 8 months ago

@DonaldFraser We have enchaned the implementation to check the lifetime value from the received RIO. So the issue should not happen anymore with the latest SDK. You can try it with IDF v5.1.2 release.