espressif / esp-zigbee-sdk

Espressif Zigbee SDK

Zigbee stack assertion failed common/zb_bufpool_mult.c:1183 (TZ-729) #304

Closed: nomis closed this issue 5 months ago

nomis commented 8 months ago

IDF version.

v5.3-dev-2320-ge4f167df25

esp-zigbee-lib version.

1.2.1

esp-zboss-lib version.

1.2.1

Espressif SoC revision.

ESP32-C6

What is the expected behavior?

Does not crash

What is the actual behavior?

Zigbee stack assertion failed common/zb_bufpool_mult.c:1183

Steps to reproduce.

Unknown

More Information.

Source: https://github.com/nomis/candle-dribbler/tree/0.7.2
Binary: candle-dribbler.elf.gz
Core dump: core-dump-2024-04-02.txt
Log: log-2024-04-02.txt (all except the last two switch changes have been omitted)

Zigbee stack assertion failed common/zb_bufpool_mult.c:1183

abort() was called at PC 0x4202dde3 on core 0
Core  0 register dump:
MEPC    : 0x40800774  RA      : 0x40808c2c  SP      : 0x4082d690  GP      : 0x40811690
TP      : 0x4082d890  T0      : 0x37363534  T1      : 0x7271706f  T2      : 0x33323130
S0/FP   : 0x4082d6bc  S1      : 0x4082d6bc  A0      : 0x4082d6c8  A1      : 0x4082d6aa
A2      : 0x00000000  A3      : 0x4082d6f5  A4      : 0x00000001  A5      : 0x4081f000
A6      : 0x00000000  A7      : 0x76757473  S2      : 0x00000098  S3      : 0x0000000d
S4      : 0x00000008  S5      : 0x0000000d  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x6e6d6c6b  T4      : 0x6a696867  T5      : 0x66656463  T6      : 0x62613938
MSTATUS : 0x00001881  MTVEC   : 0x40800001  MCAUSE  : 0x00000007  MTVAL   : 0x00000000
MHARTID : 0x00000000

Stack memory:
4082d690: 0x00000000 0x00000000 0x4082d6a8 0x4080fcba 0x00000000 0x0000000d 0x00000030 0x40811dec
4082d6b0: 0x4082d6bc 0x40811e08 0x4082d6a8 0x32303234 0x33656464 0x00000000 0x726f6261 0x20292874
4082d6d0: 0x20736177 0x6c6c6163 0x61206465 0x43502074 0x34783020 0x64323032 0x20336564 0x63206e6f
4082d6f0: 0x2065726f 0x00000030 0x00000000 0x64f7aefa 0x0000002f 0x0000049f 0x4209ab06 0x4202dde6
4082d710: 0x1d25d205 0x00000008 0x40815304 0x4202cf24 0x4082d780 0x00000000 0x408153a3 0x4081b898
4082d730: 0x40815310 0x408153a4 0x4081d418 0x4202d0d4 0x00000000 0x00000095 0x4081d418 0x420604ce
4082d750: 0x00000000 0x0000004c 0x0000098f 0x00000000 0x00000000 0x4082d780 0x4082628c 0x4206142e
4082d770: 0x00000000 0x00000733 0x00000001 0x42060cda 0x00000010 0x00000000 0x00000000 0x00000000
4082d790: 0x00000000 0x00000000 0x00000004 0x420608f0 0x00000001 0x00000000 0x000cdcf8 0x42031eaa
4082d7b0: 0x00000000 0x00000000 0x022c02c2 0x42032b16 0x00000000 0x00000000 0x022c02c2 0x42032f4c
4082d7d0: 0x00000000 0xffffffff 0x40825f04 0x40809bc4 0x00000000 0x00000000 0x00000000 0x00000000
4082d7f0: 0x00000000 0x00000000 0x4082b850 0x42050042 0x00000040 0x00000000 0x4082632c 0x420217f6
4082d810: 0x00000000 0x00000000 0x00000000 0x42010456 0x00000000 0x00000000 0x00000000 0x42076d4a
4082d830: 0x00000000 0x00000000 0x00000000 0x4082b380 0x00000000 0x00000000 0x4082b850 0x42004e22
4082d850: 0x00000000 0x00000000 0x00000000 0x40809e1c 0x00000000 0x00000000 0x00000000 0x00000000
4082d870: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
4082d890: 0xa5a5a5a5 0x0000015c 0x4082d640 0x0356090d 0x40812e90 0x40812e90 0x4082d898 0x40812e88
4082d8b0: 0x00000014 0x40827504 0x40827504 0x4082d898 0x00000000 0x00000005 0x4082b894 0x6267697a
4082d8d0: 0x6d5f6565 0x006e6961 0x00000000 0x4082d890 0x00000005 0x00000000 0x00000005 0x00000001
4082d8f0: 0x00000000 0x00000000 0x25837c76 0x00000009 0x40820168 0x408201d0 0x40820238 0x00000000
4082d910: 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 0x42004166 0x00000000 0x00000000
4082d930: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
4082d950: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
4082d970: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
4082d990: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
4082d9b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
4082d9d0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
4082d9f0: 0x00000000 0x00000028 0x00000000 0x4081f684 0x0000000e 0x0000002b 0x00000000 0x4082da48
4082da10: 0x4082da68 0x4082da68 0x00000000 0x00000000 0x00000020 0x00000000 0x4082fe28 0x00000000
4082da30: 0x00000000 0x2a0bebd0 0x0001fec7 0x4082644c 0x4082643c 0x00000020 0x00000000 0x00000abe
4082da50: 0x0029f630 0x00000000 0x0000ff00 0x000000fa 0x0003d090 0x00000000 0x0000005c 0x00000000
4082da70: 0x4082da6c 0x00000000 0x00000000 0x00000000 0x4082da84 0xffffffff 0x4082da84 0x4082da84
===============================================================
==================== ESP32 CORE DUMP START ====================

Crashed task handle: 0x4082d898, name: 'zigbee_main', GDB name: 'process 1082316952'
Crashed task is not in the interrupt context
Panic reason: abort() was called at PC 0x4202dde3 on core 0

================== CURRENT THREAD REGISTERS ===================
ra             0x40808c2c   0x40808c2c <__ubsan_include>
sp             0x4082d690   0x4082d690
gp             0x40811690   0x40811690 <_impure_data+136>
tp             0x4082d890   0x4082d890
t0             0x37363534   926299444
t1             0x7271706f   1920036975
t2             0x33323130   858927408
fp             0x4082d6bc   0x4082d6bc
s1             0x4082d6bc   1082316476
a0             0x4082d6c8   1082316488
a1             0x4082d6aa   1082316458
a2             0x0  0
a3             0x4082d6f5   1082316533
a4             0x1  1
a5             0x4081f000   1082257408
a6             0x0  0
a7             0x76757473   1987409011
s2             0x98 152
s3             0xd  13
s4             0x8  8
s5             0xd  13
s6             0x0  0
s7             0x0  0
s8             0x0  0
s9             0x0  0
s10            0x0  0
s11            0x0  0
t3             0x6e6d6c6b   1852664939
t4             0x6a696867   1785292903
t5             0x66656463   1717920867
t6             0x62613938   1650538808
pc             0x40800774   0x40800774 <panic_abort+18>

==================== CURRENT THREAD STACK =====================
#0  panic_abort (details=details@entry=0x4082d6c8 "abort() was called at PC 0x4202dde3 on core 0") at /home/simon/src/esp-idf/components/esp_system/panic.c:469
#1  0x40808c2c in esp_system_abort (details=details@entry=0x4082d6c8 "abort() was called at PC 0x4202dde3 on core 0") at /home/simon/src/esp-idf/components/esp_system/port/esp_system_chip.c:92
#2  0x4080fcba in abort () at /home/simon/src/esp-idf/components/newlib/abort.c:38
#3  0x4202dde6 in ?? ()

======================== THREADS INFO =========================
  Id   Target Id          Frame 
* 1    process 1082316952 panic_abort (details=details@entry=0x4082d6c8 "abort() was called at PC 0x4202dde3 on core 0") at /home/simon/src/esp-idf/components/esp_system/panic.c:469
  2    process 1082276268 0x40806032 in esp_cpu_wait_for_intr () at /home/simon/src/esp-idf/components/esp_hw_support/cpu.c:150
  3    process 1082326024 0x40809fdc in vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:622
  4    process 1082278672 0x40809fdc in vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:622
  5    process 1082341940 vPortClearInterruptMaskFromISR (prev_int_level=1) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:496
  6    process 1082335352 0x40809fdc in vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:622
  7    process 1082330792 0x40809fe0 in vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:622
  8    process 1082262892 vPortClearInterruptMaskFromISR (prev_int_level=1) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:496

       TCB             NAME PRIO C/B  STACK USED/FREE
---------- ---------------- -------- ----------------
0x4082d898      zigbee_main      5/5         592/7596
0x408239ac             IDLE      0/0         208/1320
0x4082fc08      zigbee_task      5/5         528/7660
0x40824310          Tmr Svc      1/1         272/1764
0x40833a34          ui_uart      1/1         464/5664
0x40832078          ui_main      2/2         336/3756
0x40830ea8      device_main    19/19         336/3756
0x4082056c        esp_timer    22/22         240/3840

==================== THREAD 1 (TCB: 0x4082d898, name: 'zigbee_main') =====================
#0  panic_abort (details=details@entry=0x4082d6c8 "abort() was called at PC 0x4202dde3 on core 0") at /home/simon/src/esp-idf/components/esp_system/panic.c:469
#1  0x40808c2c in esp_system_abort (details=details@entry=0x4082d6c8 "abort() was called at PC 0x4202dde3 on core 0") at /home/simon/src/esp-idf/components/esp_system/port/esp_system_chip.c:92
#2  0x4080fcba in abort () at /home/simon/src/esp-idf/components/newlib/abort.c:38
#3  0x4202dde6 in ?? ()

==================== THREAD 2 (TCB: 0x408239ac, name: 'IDLE') =====================
#0  0x40806032 in esp_cpu_wait_for_intr () at /home/simon/src/esp-idf/components/esp_hw_support/cpu.c:150
#1  0x42069286 in esp_vApplicationIdleHook () at /home/simon/src/esp-idf/components/esp_system/freertos_hooks.c:58
#2  0x4080adb0 in prvIdleTask (pvParameters=<error reading variable: value has been optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4310
#3  0x40809e1c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:229

==================== THREAD 3 (TCB: 0x4082fc08, name: 'zigbee_task') =====================
#0  0x40809fdc in vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:622
#1  vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:604
#2  0x40809af4 in xQueueSemaphoreTake (xQueue=0x4082fad0, xTicksToWait=<optimized out>, xTicksToWait@entry=6001) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:1901
#3  0x42005140 in pthread_cond_timedwait (mut=0x40825bb8, to=<optimized out>, cv=<optimized out>) at /home/simon/src/esp-idf/components/pthread/pthread_cond_var.c:163
#4  0x42010f28 in __gthread_cond_timedwait (__abs_timeout=0x4082fb60, __mutex=<optimized out>, __cond=0x40825bbc) at /home/simon/.espressif/tools/riscv32-esp-elf/esp-13.2.0_20230928/riscv32-esp-elf/riscv32-esp-elf/include/c++/13.2.0/riscv32-esp-elf/rv32imac_zicsr_zifencei/ilp32/no-rtti/bits/gthr-default.h:872
#5  std::__condvar::wait_until (__abs_time=..., __m=..., this=0x40825bbc) at /home/simon/.espressif/tools/riscv32-esp-elf/esp-13.2.0_20230928/riscv32-esp-elf/riscv32-esp-elf/include/c++/13.2.0/bits/std_mutex.h:178
#6  std::condition_variable::__wait_until_impl<std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > > (__lock=..., __atime=<synthetic pointer>..., this=0x40825bbc) at /home/simon/.espressif/tools/riscv32-esp-elf/esp-13.2.0_20230928/riscv32-esp-elf/riscv32-esp-elf/include/c++/13.2.0/condition_variable:224
#7  std::condition_variable::wait_until<std::chrono::_V2::steady_clock, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > > (__atime=..., __lock=..., this=0x40825bbc) at /home/simon/.espressif/tools/riscv32-esp-elf/esp-13.2.0_20230928/riscv32-esp-elf/riscv32-esp-elf/include/c++/13.2.0/condition_variable:137
#8  nutt::ZigbeeDevice::run_tasks (this=0x40825bb8) at /home/simon/build/candle-dribbler-0.7.2/src/zigbee.cpp:157
#9  0x42076d4a in std::execute_native_thread_routine (__p=<optimized out>) at /builds/idf/crosstool-NG/.build/riscv32-esp-elf/src/gcc/libstdc++-v3/src/c++11/thread.cc:104
#10 0x42004e22 in pthread_task_func (arg=0x4082dbc0, arg@entry=<error reading variable: value has been optimized out>) at /home/simon/src/esp-idf/components/pthread/pthread.c:222
#11 0x40809e1c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:229

==================== THREAD 4 (TCB: 0x40824310, name: 'Tmr Svc') =====================
#0  0x40809fdc in vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:622
#1  vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:604
#2  0x4080a95a in prvProcessTimerOrBlockTask (xListWasEmpty=<optimized out>, xNextExpireTime=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/timers.c:739
#3  prvTimerTask (pvParameters=<error reading variable: value has been optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/timers.c:685
#4  0x40809e1c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:229

==================== THREAD 5 (TCB: 0x40833a34, name: 'ui_uart') =====================
#0  vPortClearInterruptMaskFromISR (prev_int_level=1) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:496
#1  vPortExitCritical () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:597
#2  0x408081de in prvReceiveGeneric (pxRingbuffer=0x40825688, pvItem1=pvItem1@entry=0x4083392c, pvItem2=pvItem2@entry=0x0, xItemSize1=xItemSize1@entry=0x4083394c, xItemSize2=xItemSize2@entry=0x0, xMaxSize=xMaxSize@entry=1, xTicksToWait=<optimized out>, xTicksToWait@entry=4294967295) at /home/simon/src/esp-idf/components/esp_ringbuf/ringbuf.c:876
#3  0x40808622 in xRingbufferReceiveUpTo (xRingbuffer=<optimized out>, pxItemSize=pxItemSize@entry=0x4083394c, xTicksToWait=xTicksToWait@entry=4294967295, xMaxSize=xMaxSize@entry=1) at /home/simon/src/esp-idf/components/esp_ringbuf/ringbuf.c:1178
#4  0x42008fdc in uart_read_bytes (uart_num=UART_NUM_0, buf=0x4083398c, length=1, ticks_to_wait=4294967295) at /home/simon/src/esp-idf/components/esp_driver_uart/src/uart.c:1404
#5  0x4200eb3e in __GNU_EH_FRAME_HDR ()
#6  0x42076d4a in std::execute_native_thread_routine (__p=<optimized out>) at /builds/idf/crosstool-NG/.build/riscv32-esp-elf/src/gcc/libstdc++-v3/src/c++11/thread.cc:104
#7  0x42004e22 in pthread_task_func (arg=0x408321ec, arg@entry=<error reading variable: value has been optimized out>) at /home/simon/src/esp-idf/components/pthread/pthread.c:222
#8  0x40809e1c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:229

==================== THREAD 6 (TCB: 0x40832078, name: 'ui_main') =====================
#0  0x40809fdc in vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:622
#1  vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:604
#2  0x40809af4 in xQueueSemaphoreTake (xQueue=0x40824e20, xTicksToWait=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:1901
#3  0x4201a088 in nutt::WakeupThread::run_loop (this=0x40824d9c) at /home/simon/build/candle-dribbler-0.7.2/src/thread.cpp:64
#4  0x42076d4a in std::execute_native_thread_routine (__p=<optimized out>) at /builds/idf/crosstool-NG/.build/riscv32-esp-elf/src/gcc/libstdc++-v3/src/c++11/thread.cc:104
#5  0x42004e22 in pthread_task_func (arg=0x40831030, arg@entry=<error reading variable: value has been optimized out>) at /home/simon/src/esp-idf/components/pthread/pthread.c:222
#6  0x40809e1c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:229

==================== THREAD 7 (TCB: 0x40830ea8, name: 'device_main') =====================
#0  0x40809fe0 in vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:622
#1  vPortYield () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:604
#2  0x40809af4 in xQueueSemaphoreTake (xQueue=0x40825b34, xTicksToWait=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:1901
#3  0x4201a088 in nutt::WakeupThread::run_loop (this=0x408259b0) at /home/simon/build/candle-dribbler-0.7.2/src/thread.cpp:64
#4  0x42076d4a in std::execute_native_thread_routine (__p=<optimized out>) at /builds/idf/crosstool-NG/.build/riscv32-esp-elf/src/gcc/libstdc++-v3/src/c++11/thread.cc:104
#5  0x42004e22 in pthread_task_func (arg=0x4082fe60, arg@entry=<error reading variable: value has been optimized out>) at /home/simon/src/esp-idf/components/pthread/pthread.c:222
#6  0x40809e1c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:229

==================== THREAD 8 (TCB: 0x4082056c, name: 'esp_timer') =====================
#0  vPortClearInterruptMaskFromISR (prev_int_level=1) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:496
#1  vPortExitCritical () at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:597
#2  0x4080c49e in ulTaskGenericNotifyTake (uxIndexToWait=uxIndexToWait@entry=0, xClearCountOnExit=xClearCountOnExit@entry=1, xTicksToWait=xTicksToWait@entry=4294967295) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:5722
#3  0x4206bf62 in timer_task (arg=<error reading variable: value has been optimized out>) at /home/simon/src/esp-idf/components/esp_timer/src/esp_timer.c:477
#4  0x40809e1c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/simon/src/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:229

======================= ALL MEMORY REGIONS ========================
Name   Address   Size   Attrs
.rtc.text 0x50000000 0x0 RW  
.rtc.force_fast 0x50000000 0x0 RW  
.rtc.force_slow 0x50000010 0x0 RW  
.iram0.text 0x40800000 0x10e88 R XA
.iram0.text_end 0x40810e88 0x0 RW  
.iram0.bss 0x40810e90 0x0 RW  
.dram0.data 0x40810e90 0x1d84 RW A
.flash.text 0x42000020 0x80908 R XA
.flash.appdesc 0x42090020 0x100 R  A
.flash.rodata 0x42090120 0x1afe0 RW A
.flash.tls 0x420ab100 0x0 RW  
.eh_frame 0x420ab100 0x10d8 R  A
.flash.rodata_noload 0x420ac1d8 0x0 RW  
.coredump.tasks.data 0x4082d898 0x15c RW 
.coredump.tasks.data 0x4082d5f0 0x2a0 RW 
.coredump.tasks.data 0x408239ac 0x15c RW 
.coredump.tasks.data 0x408238d0 0xd0 RW 
.coredump.tasks.data 0x4082fc08 0x15c RW 
.coredump.tasks.data 0x4082f9f0 0x210 RW 
.coredump.tasks.data 0x40824310 0x15c RW 
.coredump.tasks.data 0x408241f0 0x110 RW 
.coredump.tasks.data 0x40833a34 0x15c RW 
.coredump.tasks.data 0x40833850 0x1d0 RW 
.coredump.tasks.data 0x40832078 0x15c RW 
.coredump.tasks.data 0x40831f20 0x150 RW 
.coredump.tasks.data 0x40830ea8 0x15c RW 
.coredump.tasks.data 0x40830d50 0x150 RW 
.coredump.tasks.data 0x4082056c 0x15c RW 
.coredump.tasks.data 0x40821e10 0xf0 RW 

===================== ESP32 CORE DUMP END =====================
===============================================================
4202cf24: zb_bufpool_mult.c.obj:? <zb_get_buf_tail_ptr>
4202dde6: zb_init_common.c.obj:? <zb_assert>
42031eaa: zb_scheduler.c.obj:? <sched_is_cb_q_empty>
42032b16: zb_scheduler.c.obj:? <zb_sched_mac_transport_iteration>
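
For reference, addresses like these can be resolved against the matching ELF with stock binutils (a sketch; adjust the path to wherever candle-dribbler.elf was extracted):

riscv32-esp-elf-addr2line -pfiaC -e candle-dribbler.elf 0x4202dde6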
nomis commented 8 months ago

The last communication was about 3 minutes before the crash:

2024-04-02 17:28:53.050 DEBUG (MainThread) [zigpy.zcl] [0x78C0:11:0x0006] Attribute report received: on_off=<Bool.false: 0>

2024-04-02 17:28:53.157 DEBUG (MainThread) [zigpy.zcl] [0x78C0:61:0x000f] Attribute report received: present_value=<Bool.false: 0>

2024-04-02 17:28:56.381 DEBUG (MainThread) [zigpy.zcl] [0x78C0:1:0x000c] Attribute report received: present_value=6.453306198120117

2024-04-02 17:29:57.461 DEBUG (MainThread) [zigpy.zcl] [0x78C0:1:0x000c] Attribute report received: status_flags=<bitmap8.0: 0>

2024-04-02 17:31:42.396 DEBUG (MainThread) [zigpy.zcl] [0x78C0:34:0x0006] Sending request: Read_Attributes(attribute_ids=[0])
2024-04-02 17:31:42.484 DEBUG (MainThread) [zigpy.zcl] [0x78C0:34:0x0006] Decoded ZCL frame: OnOff:Read_Attributes_rsp(status_records=[ReadAttributeRecord(attrid=0x0000, status=<Status.SUCCESS: 0>, value=TypeValue(type=Bool, value=<Bool.false: 0>))])
xieqinan commented 8 months ago

@nomis ,

Is it possible to enable the trace log mode to debug this issue? Trace mode will generate more detailed logs for you.

nomis commented 8 months ago

You're going to need to elaborate on which trace mode and logging build options should be used.

When I tried this previously, all I got was so much output to the UART that no lower-priority task could run. I could work around that, but it implies there's going to be more output than my UART receiver can handle at 115200 bps.
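
For scale: at 115200 bps with 8N1 framing each byte takes 10 bits on the wire, so the link tops out at roughly 11,520 bytes/s; any trace mode that sustains more than that will either stall lower-priority tasks or lose log data.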

nomis commented 8 months ago

Could you not instrument the library to provide more detail on the current state when the error occurs?

wisemanny commented 8 months ago

@nomis, I have the same panic during a call to esp_zb_scheduler_alarm. As a workaround I rolled back to the previous version of the library (in idf_component.yml):

espressif/esp-zboss-lib: "<=1.2.1"
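
For anyone else applying this pin, it goes under the dependencies key of the component manifest; a minimal idf_component.yml sketch (any other dependencies omitted):

dependencies:
  espressif/esp-zboss-lib: "<=1.2.1"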

xieqinan commented 8 months ago

@nomis ,

Source: https://github.com/nomis/candle-dribbler/tree/0.7.2 Binary: core-dump-2024-04-02.txt Core dump: core-dump-2024-04-02.txt Log: log-2024-04-02.txt (all except the last two switch changes have been omitted)

Please also provide the .elf file.

nomis commented 8 months ago

@nomis ,

Source: https://github.com/nomis/candle-dribbler/tree/0.7.2 Binary: core-dump-2024-04-02.txt Core dump: core-dump-2024-04-02.txt Log: log-2024-04-02.txt (all except the last two switch changes have been omitted)

Please also provide the .elf file.

I've fixed the attachment in the description: candle-dribbler.elf.gz

xieqinan commented 7 months ago

@nomis ,

We've debugged some issues in detail based on the information you provided and made some fixes, but I cannot guarantee they address the root cause of this issue. The fixes will be released in the next version. Before that, could you replace this function with the code below and test again?

static void ieee802154_receive_done(uint8_t *data, esp_ieee802154_frame_info_t *frame_info)
{
    // If the RX done packet was written into the stub buffer, the receive buffer is full: drop it.
    if (s_rx_index == CONFIG_IEEE802154_RX_BUFFER_SIZE) {
        esp_rom_printf("receive buffer full, drop the current frame.\n");
    } else {
        // Otherwise, post it to the upper layer.
        frame_info->process = true;
        if (data[0] > 127) {
            esp_rom_printf("oversized frame: %d\n", data[0]);
        }
        esp_ieee802154_receive_done(data, frame_info);
    }
}
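
Presumably the point of the guard is that once s_rx_index has reached CONFIG_IEEE802154_RX_BUFFER_SIZE the frame only landed in the stub buffer, so it is dropped instead of being posted upward, and any frame whose length byte exceeds 127 (the 802.15.4 maximum PHY payload) is flagged before esp_ieee802154_receive_done() hands it to the stack.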
xieqinan commented 7 months ago

By the way, enabling the debug mode would be helpful for this issue. Could you also add a call to esp_zb_set_trace_level_mask(ESP_ZB_TRACE_LEVEL_CRITICAL, 0); before esp_zb_init(&zb_nwk_cfg);, reproduce the issue, and provide us with the result?
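
In context, a minimal placement sketch (assuming zb_nwk_cfg is the network config the application already builds):

    /* enable critical-level ZBOSS trace output before initialising the stack */
    esp_zb_set_trace_level_mask(ESP_ZB_TRACE_LEVEL_CRITICAL, 0);
    esp_zb_init(&zb_nwk_cfg);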

xieqinan commented 7 months ago

@nomis ,

Could you please test the issue again using the latest version of the esp-zigbee-sdk? We have made some updates that may be helpful in resolving it.

nomis commented 7 months ago

Ok, but you really need to stop making changes that lose the configuration on upgrade:

2024-04-12 12:17:24.561647 ZBOSS: common/zb_nvram.c:3189   ds_ver == 2
2024-04-12 12:17:24.561700 ZBOSS: common/zb_nvram.c:3072   ERROR: unexpected binding dataset length; size from nvram 576, expected size 576
2024-04-12 12:17:24.577870 ZBOSS: common/zb_nvram.c:3237   ERROR: could not read the APS binding table from NVRAM, page 1 pos 1328, ds_ver 2
nomis commented 7 months ago

Do you still want me to call esp_zb_set_trace_level_mask(ESP_ZB_TRACE_LEVEL_CRITICAL, 0); or should that be omitted? It looks like there's a bit more debug output without it on the latest version.

nomis commented 7 months ago

Source: https://github.com/nomis/candle-dribbler/tree/0.7.3 Binary: candle-dribbler.elf.gz

This has been crashing all day:

...
2024-04-12 12:39:18.087437 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.087485 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.109236 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.109278 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.109295 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.112293 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 12:39:21.650268 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 12:39:21.650318 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 12:39:37.938296 
2024-04-12 12:39:37.938358 assert failed: isr_handle_rx_abort esp_ieee802154_dev.c:487 (s_ieee802154_state == IEEE802154_STATE_RX)
2024-04-12 12:39:37.938374 Core  0 register dump:
2024-04-12 12:39:37.938388 MEPC    : 0x40800774  RA      : 0x40808bb2  SP      : 0x40813690  GP      : 0x40811680  
2024-04-12 12:39:37.960270 TP      : 0x408335f0  T0      : 0x37363534  T1      : 0x7271706f  T2      : 0x33323130  
2024-04-12 12:39:37.960322 S0/FP   : 0x00000001  S1      : 0x00000060  A0      : 0x408136e4  A1      : 0x40811d01  
2024-04-12 12:39:37.960338 A2      : 0x00000001  A3      : 0x00000029  A4      : 0x00000001  A5      : 0x4081a000  
2024-04-12 12:39:37.982319 A6      : 0x00000010  A7      : 0x76757473  S2      : 0x408136d8  S3      : 0x40813817  
2024-04-12 12:39:37.982373 S4      : 0x40811d00  S5      : 0x408136d8  S6      : 0x420dd2b4  S7      : 0x00001000  
2024-04-12 12:39:37.982403 S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000  
2024-04-12 12:39:38.004324 T3      : 0x6e6d6c6b  T4      : 0x6a696867  T5      : 0x66656463  T6      : 0x62613938  
2024-04-12 12:39:38.004375 MSTATUS : 0x00001881  MTVEC   : 0x40800001  MCAUSE  : 0x00000007  MTVAL   : 0x00000000  
2024-04-12 12:39:38.004392 MHARTID : 0x00000000  
2024-04-12 12:39:38.004406 
2024-04-12 12:39:38.004420 Stack memory:
2024-04-12 12:39:38.026270 40813690: 0x00000000 0x00000000 0x420e6758 0x4080fdf4 0x00000000 0x00000000 0x00000000 0x00373834
2024-04-12 12:39:38.026324 408136b0: 0x00000000 0x40811e0c 0x420e6758 0x40811e1c 0x420dcff7 0x40811e20 0x408136ac 0x40811e24
2024-04-12 12:39:38.026341 408136d0: 0x420dd2b4 0x40811d00 0x00000000 0x00000000 0x00000000 0x65737361 0x66207472 0x656c6961
2024-04-12 12:39:38.049332 408136f0: 0x69203a64 0x685f7273 0x6c646e61 0x78725f65 0x6f62615f 0x65207472 0x695f7073 0x38656565
2024-04-12 12:39:38.049388 40813710: 0x35313230 0x65645f34 0x3a632e76 0x20373834 0x695f7328 0x38656565 0x35313230 0x74735f34
2024-04-12 12:39:38.071357 40813730: 0x20657461 0x49203d3d 0x38454545 0x35313230 0x54535f34 0x5f455441 0x00295852 0x40803fb6
2024-04-12 12:39:38.071407 40813750: 0x00000000 0x00000000 0x40828c44 0x408041f8 0x00000020 0x00000000 0x00000000 0x00000001
2024-04-12 12:39:38.071422 40813770: 0x00000020 0x4081b000 0x00000500 0x00000020 0x00000001 0x00000000 0x4082acdc 0x420a0a18
2024-04-12 12:39:38.093351 40813790: 0x00000000 0x00000001 0x00000004 0x03000001 0x00000000 0x00000026 0x4081b000 0xfbd6ee9e
2024-04-12 12:39:38.093400 408137b0: 0x40818356 0x4081a000 0x00000000 0x00001000 0x0001e000 0x00000000 0x00010000 0x40817030
2024-04-12 12:39:38.115289 408137d0: 0x4081b000 0x600a3000 0x00000010 0x4080ec7e 0x0001e000 0x00000000 0x00010000 0x00000001
2024-04-12 12:39:38.115344 408137f0: 0x00001881 0x80000010 0x40811a14 0x408001e8 0x6000a000 0x408073d0 0x408073dc 0x00000000
2024-04-12 12:39:38.115361 40813810: 0x4081380c 0x00000000 0x00000000 0x00000000 0x40813824 0xffffffff 0x40813824 0x40813824
2024-04-12 12:39:38.138280 40813830: 0x00000000 0x40813838 0xffffffff 0x40813838 0x40813838 0x00000001 0x00000001 0x00000000
2024-04-12 12:39:38.138329 40813850: 0x0001ffff 0x00000000 0x00000000 0x00000004 0x00000000 0x00000000 0x00000000 0x40813868
2024-04-12 12:39:38.160457 40813870: 0x00000000 0x00000000 0x00000000 0x40813880 0xffffffff 0x40813880 0x40813880 0x00000000
2024-04-12 12:39:38.160507 40813890: 0x40813894 0xffffffff 0x40813894 0x40813894 0x00000001 0x00000001 0x00000000 0x0001ffff
2024-04-12 12:39:38.160522 408138b0: 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x6000a000 0x408073d0 0x408073dc
2024-04-12 12:39:38.182375 408138d0: 0x40820854 0x00000000 0x00000000 0x4081b240 0x4081b304 0x4081b3c8 0x40826a78 0x00000000
2024-04-12 12:39:38.182454 408138f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.182471 40813910: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.204406 40813930: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.204456 40813950: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.226337 40813970: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.226389 40813990: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.226405 408139b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.249296 408139d0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.249349 408139f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.271438 40813a10: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.271489 40813a30: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.271504 40813a50: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.293353 40813a70: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.293408 
2024-04-12 12:39:38.293427 
2024-04-12 12:39:38.293442 
2024-04-12 12:39:38.293456 ELF file SHA256: 02fdc2544

It wrote a core dump:

2024-04-12 12:39:39.333419 D (8066) esp_core_dump_elf: ================ Processing coredump info ================
2024-04-12 12:39:39.333433 D (8074) esp_core_dump_elf: Application SHA256='02fdc2544', length=9.
2024-04-12 12:39:39.355372 D (8080) esp_core_dump_elf: Add note size=104, start_off=0x2d54
2024-04-12 12:39:39.355424 D (8086) esp_core_dump_elf: Add note size=36, start_off=0x2d54
2024-04-12 12:39:39.355440 D (8091) esp_core_dump_elf: Core dump note data needs 1 bytes padding
2024-04-12 12:39:39.355454 D (8097) esp_core_dump_elf: Add note size=136, start_off=0x2d54
2024-04-12 12:39:39.377540 D (8103) esp_core_dump_elf: Add NOTE segment, size=276, start_off=0x2d54
2024-04-12 12:39:39.377592 D (8109) esp_core_dump_elf: =========== Data written size = 11904 bytes ==========
2024-04-12 12:39:39.377607 D (8116) esp_core_dump_crc: Total length of hashed data: 11904
2024-04-12 12:39:39.400379 I (8122) esp_core_dump_flash: Write end offset 0x2e84, check sum length 4
2024-04-12 12:39:39.400431 I (8129) esp_core_dump_common: Core dump used 888 bytes on stack. 476 bytes left free.
2024-04-12 12:39:39.400448 I (8136) esp_core_dump_common: Restoring stack @ 0x40813540
2024-04-12 12:39:39.400463 I (8141) esp_core_dump_flash: Core dump has been saved to flash.

But then it's apparently invalid, so I can't read it back:

2024-04-12 12:39:39.489545 I (65) esp_core_dump_flash: Init core dump to flash
2024-04-12 12:39:39.489597 I (66) esp_core_dump_flash: Found partition 'coredump' @ 40000 262144 bytes
2024-04-12 12:39:39.489614 D (77) esp_core_dump_crc: Total length of hashed data: 11904
2024-04-12 12:39:39.489627 E (77) esp_core_dump_flash: Core dump data check failed:
2024-04-12 12:39:39.493498 Calculated checksum='2c9a2b2f'
2024-04-12 12:39:39.493542 Image checksum='00f3106d'

It then crashes with an invalid read, repeatedly:

2024-04-12 13:02:40.964725 W (900553) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=12 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 13:02:41.018235 W (900603) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=22 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 13:02:41.034033 W (900623) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-12 13:02:41.745172 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.745221 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.767280 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.767334 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.767370 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.770177 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:43.538896 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:43.538948 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:44.271842 W (903853) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=32 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 13:02:44.428478 W (904023) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-12 13:02:44.898276 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:44.898328 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:47.767787 W (907353) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=62 cluster_id=15 cluster_role=1 attr_id=85 manuf_code=0
2024-04-12 13:02:47.829205 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 13:02:47.829260 
2024-04-12 13:02:47.829277 Core  0 register dump:
2024-04-12 13:02:47.829291 MEPC    : 0x4206b988  RA      : 0x4206b988  SP      : 0x408334a0  GP      : 0x40811680  
2024-04-12 13:02:47.852163 TP      : 0x408335f0  T0      : 0x40022494  T1      : 0xffffffe0  T2      : 0x0000003f  
2024-04-12 13:02:47.852209 S0/FP   : 0x4082cc88  S1      : 0x0000003c  A0      : 0x00000000  A1      : 0x00000055  
2024-04-12 13:02:47.852225 A2      : 0x00000000  A3      : 0x00000002  A4      : 0x00000000  A5      : 0x00000001  
2024-04-12 13:02:47.874300 A6      : 0x00000001  A7      : 0x00000000  S2      : 0x00000001  S3      : 0x4082cc50  
2024-04-12 13:02:47.874350 S4      : 0x00000000  S5      : 0x00000017  S6      : 0x00000000  S7      : 0x00000001  
2024-04-12 13:02:47.874366 S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000  
2024-04-12 13:02:47.896392 T3      : 0x00000000  T4      : 0x00000000  T5      : 0x40826f5c  T6      : 0x00000001  
2024-04-12 13:02:47.896445 MSTATUS : 0x00001881  MTVEC   : 0x40800001  MCAUSE  : 0x00000005  MTVAL   : 0x00000003  
2024-04-12 13:02:47.896461 MHARTID : 0x00000000  
2024-04-12 13:02:47.896476 
2024-04-12 13:02:47.896489 Stack memory:
2024-04-12 13:02:47.918160 408334a0: 0x0000ffff 0x420b5000 0x4082cc50 0x4082217f 0x0000004f 0x4082cc50 0x4082cbb2 0x42061de8
2024-04-12 13:02:47.918229 408334c0: 0x0000ffff 0x420b5000 0x4082cc50 0x42013cde 0x00000055 0x00000000 0x00000042 0x00000000
2024-04-12 13:02:47.918253 408334e0: 0x00000000 0x00000001 0x00000000 0x0000000f 0x00000000 0x00000000 0x4082cc50 0x4206b1d2
2024-04-12 13:02:47.940312 40833500: 0xc67078c0 0x0000004a 0x013c6ae7 0x00000000 0x00000001 0x00000000 0x00000000 0x4203e866
2024-04-12 13:02:47.940362 40833520: 0x00000000 0x420a08d2 0x00000000 0x00000000 0x4206afb4 0xff00000d 0x40826894 0x40809b4a
2024-04-12 13:02:47.963296 40833540: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x40827be4 0x42070fb6
2024-04-12 13:02:47.963348 40833560: 0x00000000 0x00000000 0x00000000 0x4201e1e8 0x00000000 0x00000000 0x00000000 0x42010456
2024-04-12 13:02:47.963364 40833580: 0x00000000 0x00000000 0x00000000 0x420a44b0 0x00000000 0x00000000 0x00000000 0x40827064
2024-04-12 13:02:47.985194 408335a0: 0x00000000 0x00000000 0x40827be4 0x42004e22 0x00000000 0x00000000 0x00000000 0x40809da2
2024-04-12 13:02:47.985251 408335c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.007195 408335e0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x0000015c 0x408331d0
2024-04-12 13:02:48.007244 40833600: 0x00016269 0x40812e40 0x40812e40 0x408335fc 0x40812e38 0x00000014 0x4082730c 0x4082730c
2024-04-12 13:02:48.007261 40833620: 0x408335fc 0x00000000 0x00000005 0x408315f8 0x6267697a 0x6d5f6565 0x006e6961 0x00000000
2024-04-12 13:02:48.029283 40833640: 0x408335f0 0x00000005 0x00000000 0x00000005 0x00000001 0x00000000 0x00000000 0x00d813bd
2024-04-12 13:02:48.029333 40833660: 0x00000009 0x4081b5b8 0x4081b620 0x4081b688 0x00000000 0x00000000 0x00000001 0x00000000
2024-04-12 13:02:48.029348 40833680: 0x00000000 0x00000000 0x42004166 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.052173 408336a0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.052225 408336c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.074213 408336e0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.074265 40833700: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.074285 40833720: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.096284 40833740: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00002000 0xa5a5a5a5
2024-04-12 13:02:48.096345 40833760: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.118248 40833780: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.118318 408337a0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.118335 408337c0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.140273 408337e0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.140325 40833800: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.140340 40833820: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.163280 40833840: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.163333 40833860: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.179461 40833880: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.179513 
2024-04-12 13:02:48.179529 
2024-04-12 13:02:48.179542 
2024-04-12 13:02:48.179556 ELF file SHA256: 02fdc2544
...
2024-04-12 13:17:49.090457 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 13:32:50.357563 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 13:47:51.626877 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 14:02:52.894122 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 14:17:54.167366 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 14:32:55.443358 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 14:47:56.634586 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 15:02:57.910721 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 15:17:59.194935 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 15:33:00.484131 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 15:48:01.769565 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 16:03:03.084696 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 16:18:04.382838 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 16:33:05.680234 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 16:48:06.984327 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 17:03:08.292488 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 17:18:09.612808 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 17:33:10.926938 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 17:48:12.242206 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 18:03:16.579405 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 18:18:20.806673 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.

This is an extremely annoying problem that you keep creating in new versions of your library: it can't handle stored data in an older format or of a different size, so it initialises the zboss library with invalid data. You need to stop doing this.

I have a workaround in place to handle zb_zcl_get_attr_desc_manuf_a() returning NULL, which you can see in the logs, but it's not enough to fix whatever this new bug is.

2024-04-12 18:33:02.436735 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.436788 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.459748 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.459800 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.459817 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.462389 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:33:21.737462 W (900562) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=12 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 18:33:21.786043 W (900602) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=22 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 18:33:21.841192 W (900662) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=32 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 18:33:21.897626 W (900722) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=62 cluster_id=15 cluster_role=1 attr_id=85 manuf_code=0
2024-04-12 18:33:21.958793 Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 18:33:21.958846 
2024-04-12 18:33:21.958863 Core  0 register dump:
2024-04-12 18:33:21.958877 MEPC    : 0x4206b988  RA      : 0x4206b988  SP      : 0x408334a0  GP      : 0x40811680  
2024-04-12 18:33:21.981837 TP      : 0x408335f0  T0      : 0x40022494  T1      : 0xffffffe0  T2      : 0x0000003f  
2024-04-12 18:33:21.981895 S0/FP   : 0x4082cc88  S1      : 0x0000003c  A0      : 0x00000000  A1      : 0x00000055  
2024-04-12 18:33:21.981911 A2      : 0x00000000  A3      : 0x00000002  A4      : 0x00000000  A5      : 0x00000001  
2024-04-12 18:33:22.003749 A6      : 0x00000001  A7      : 0x00000000  S2      : 0x00000001  S3      : 0x4082cc50  
2024-04-12 18:33:22.003801 S4      : 0x00000000  S5      : 0x00000017  S6      : 0x00000000  S7      : 0x00000001  
2024-04-12 18:33:22.003817 S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000  
2024-04-12 18:33:22.025862 T3      : 0x00000000  T4      : 0x00000000  T5      : 0x40826f5c  T6      : 0x00000001  
2024-04-12 18:33:22.025914 MSTATUS : 0x00001881  MTVEC   : 0x40800001  MCAUSE  : 0x00000005  MTVAL   : 0x00000003  
2024-04-12 18:33:22.025931 MHARTID : 0x00000000  
2024-04-12 18:33:22.025945 
2024-04-12 18:33:22.025959 Stack memory:
2024-04-12 18:33:22.047888 408334a0: 0x0000ffff 0x420b5000 0x4082cc50 0x4082217f 0x0000004f 0x4082cc50 0x4082cbb2 0x42061de8
2024-04-12 18:33:22.047941 408334c0: 0x0000ffff 0x420b5000 0x4082cc50 0x42013cde 0x00000055 0x00000000 0x00000042 0x00000000
2024-04-12 18:33:22.047957 408334e0: 0x00000000 0x00000001 0x00000000 0x0000000f 0x00000000 0x00000000 0x4082cc50 0x4206b1d2
2024-04-12 18:33:22.070885 40833500: 0xcb8978c0 0x0000004f 0x00293ec7 0x00000000 0x00000001 0x00000000 0x00000000 0x4203e866
2024-04-12 18:33:22.070938 40833520: 0x00000000 0x420a08d2 0x00000000 0x00000000 0x4206afb4 0xff000008 0x40826894 0x40809b4a
2024-04-12 18:33:22.092676 40833540: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x40827be4 0x42070fb6
2024-04-12 18:33:22.092735 40833560: 0x00000000 0x00000000 0x00000000 0x4201e1e8 0x00000000 0x00000000 0x00000000 0x42010456
2024-04-12 18:33:22.092752 40833580: 0x00000000 0x00000000 0x00000000 0x420a44b0 0x00000000 0x00000000 0x00000000 0x40827064
2024-04-12 18:33:22.114792 408335a0: 0x00000000 0x00000000 0x40827be4 0x42004e22 0x00000000 0x00000000 0x00000000 0x40809da2
2024-04-12 18:33:22.114840 408335c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.136857 408335e0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x0000015c 0x40833430
2024-04-12 18:33:22.136903 40833600: 0x00015fd0 0x40812e40 0x40812e40 0x408335fc 0x40812e38 0x00000014 0x4082730c 0x4082730c
2024-04-12 18:33:22.136918 40833620: 0x408335fc 0x00000000 0x00000005 0x408315f8 0x6267697a 0x6d5f6565 0x006e6961 0x00000000
2024-04-12 18:33:22.158778 40833640: 0x408335f0 0x00000005 0x00000000 0x00000005 0x00000001 0x00000000 0x00000000 0x00d3e58a
2024-04-12 18:33:22.158831 40833660: 0x00000009 0x4081b5b8 0x4081b620 0x4081b688 0x00000000 0x00000000 0x00000001 0x00000000
2024-04-12 18:33:22.158848 40833680: 0x00000000 0x00000000 0x42004166 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.181732 408336a0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.181788 408336c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.203884 408336e0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.203936 40833700: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.203952 40833720: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.225861 40833740: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00002000 0xa5a5a5a5
2024-04-12 18:33:22.225908 40833760: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.247812 40833780: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.247864 408337a0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.247880 408337c0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.270849 408337e0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.270902 40833800: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.270918 40833820: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.292870 40833840: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.292922 40833860: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.314789 40833880: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.314840 
2024-04-12 18:33:22.314856 
2024-04-12 18:33:22.314869 
2024-04-12 18:33:22.314882 ELF file SHA256: 02fdc2544

Core dump: core-dump-2024-04-12.txt

================== CURRENT THREAD REGISTERS ===================
ra             0x4206b988   0x4206b988
sp             0x408334a0   0x408334a0
gp             0x40811680   0x40811680 <_impure_data+120>
tp             0x408335f0   0x408335f0
t0             0x40022494   1073882260
t1             0xffffffe0   -32
t2             0x3f 63
fp             0x4082cc88   0x4082cc88
s1             0x3c 60
a0             0x0  0
a1             0x55 85
a2             0x0  0
a3             0x2  2
a4             0x0  0
a5             0x1  1
a6             0x1  1
a7             0x0  0
s2             0x1  1
s3             0x4082cc50   1082313808
s4             0x0  0
s5             0x17 23
s6             0x0  0
s7             0x1  1
s8             0x0  0
s9             0x0  0
s10            0x0  0
s11            0x0  0
t3             0x0  0
t4             0x0  0
t5             0x40826f5c   1082290012
t6             0x1  1
pc             0x4206b988   0x4206b988

==================== CURRENT THREAD STACK =====================
#0  0x4206b988 in ?? ()

This is in zb_zcl_get_next_reporting_info(), which has just called zb_zcl_get_attr_desc_manuf_a() with a manuf_code of 0 (a4) and received NULL back (a0), so it crashes when it dereferences it:

4206b970:       03645703                lhu     a4,54(s0)
4206b974:       00645683                lhu     a3,6(s0)
4206b978:       00444603                lbu     a2,4(s0)
4206b97c:       00245583                lhu     a1,2(s0)
4206b980:       00144503                lbu     a0,1(s0)
4206b984:       cd9f40ef                jal     4206065c <zb_zcl_get_attr_desc_manuf_a>
4206b988:       00354783                lbu     a5,3(a0)
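
A NULL check at the call site would avoid the crash. A sketch only: the argument names follow the registers above (and the log line from my workaround), while the surrounding loop and struct details live inside the closed-source ZBOSS code, so they are illustrative:

zb_zcl_attr_t *attr_desc = zb_zcl_get_attr_desc_manuf_a(ep, cluster_id, cluster_role,
                                                        attr_id, manuf_code);
if (attr_desc == NULL) {
    /* hypothetical guard: skip this reporting entry instead of dereferencing NULL */
    continue;
}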
nomis commented 7 months ago

The first error happened again while trying to rejoin the network after erasing the flash:

2024-04-12 18:53:53.957550 D (541) nutt.ZigbeeDevice: Zigbee stack initialized
2024-04-12 18:53:53.957605 I (541) nutt.ZigbeeDevice: Device address: 40:4c:ca:ff:fe:43:0b:d0/0xffff (network not configured)
2024-04-12 18:53:53.957621 I (541) nutt.ZigbeeDevice: Waiting for pairing button press (ZDO Skip Start Up)
2024-04-12 18:53:53.979518 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:53:53.979570 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:53:53.979586 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:53:54.001622 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:53:54.001676 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:53:54.005711 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:53:54.005759 D (591) nutt: Free stack: 1208
2024-04-12 18:53:56.763092 D (3361) nutt.ZigbeeDevice: Connecting (join network)
2024-04-12 18:54:00.004792 I (6601) nutt.ZigbeeDevice: Joined network successfully (6f:06:fc:4a:5b:92:dd:9c/0x8813@26 as 0x78c0)
2024-04-12 18:54:00.702054 ZBOSS: nwk/nwk_nlme.c:794   OOM state is detected. ACHTUNG: Out of memory!
2024-04-12 18:54:00.702109 ZBOSS: common/zb_bufpool_mult.c:1607   OOM! bufs allocated {38 i-o 8} max 80
2024-04-12 18:54:00.702126 ZBOSS: nwk/nwk_nlme.c:836   ACHTUNG: Sending OOM status
2024-04-12 18:54:00.702141 ZBOSS: nwk/nwk_nlme.c:857   OOM state ends
2024-04-12 18:54:01.689753 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:01.689805 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:54:04.799229 ZBOSS: nwk/zb_nwk_ed_aging.c:227   ED aging is not supported
2024-04-12 18:54:09.430295 W (16031) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-12 18:54:44.769226 I (51371) nutt.ZigbeeDevice: No bindings
2024-04-12 18:54:50.233597 
2024-04-12 18:54:50.233651 assert failed: isr_handle_rx_abort esp_ieee802154_dev.c:487 (s_ieee802154_state == IEEE802154_STATE_RX)
2024-04-12 18:54:50.233667 Core  0 register dump:
2024-04-12 18:54:50.233680 MEPC    : 0x40800774  RA      : 0x40808bb2  SP      : 0x40813690  GP      : 0x40811680  
2024-04-12 18:54:50.255602 TP      : 0x40833570  T0      : 0x37363534  T1      : 0x7271706f  T2      : 0x33323130  
2024-04-12 18:54:50.255651 S0/FP   : 0x00000001  S1      : 0x00000060  A0      : 0x408136e4  A1      : 0x40811d01  
2024-04-12 18:54:50.255667 A2      : 0x00000001  A3      : 0x00000029  A4      : 0x00000001  A5      : 0x4081a000  
2024-04-12 18:54:50.277583 A6      : 0x00000010  A7      : 0x76757473  S2      : 0x408136d8  S3      : 0x40813817  
2024-04-12 18:54:50.277638 S4      : 0x40811d00  S5      : 0x408136d8  S6      : 0x420dd2b4  S7      : 0x00001000  
2024-04-12 18:54:50.277655 S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000  
2024-04-12 18:54:50.299544 T3      : 0x6e6d6c6b  T4      : 0x6a696867  T5      : 0x66656463  T6      : 0x62613938  
2024-04-12 18:54:50.299598 MSTATUS : 0x00001881  MTVEC   : 0x40800001  MCAUSE  : 0x00000007  MTVAL   : 0x00000000  
2024-04-12 18:54:50.299616 MHARTID : 0x00000000  
2024-04-12 18:54:50.299630 
2024-04-12 18:54:50.299645 Stack memory:
2024-04-12 18:54:50.322471 40813690: 0x00000000 0x00000000 0x420e6758 0x4080fdf4 0x00000000 0x00000000 0x00000000 0x00373834
2024-04-12 18:54:50.322523 408136b0: 0x00000000 0x40811e0c 0x420e6758 0x40811e1c 0x420dcff7 0x40811e20 0x408136ac 0x40811e24
2024-04-12 18:54:50.322540 408136d0: 0x420dd2b4 0x40811d00 0x00000000 0x00000000 0x00000000 0x65737361 0x66207472 0x656c6961
2024-04-12 18:54:50.344548 408136f0: 0x69203a64 0x685f7273 0x6c646e61 0x78725f65 0x6f62615f 0x65207472 0x695f7073 0x38656565
2024-04-12 18:54:50.344601 40813710: 0x35313230 0x65645f34 0x3a632e76 0x20373834 0x695f7328 0x38656565 0x35313230 0x74735f34
2024-04-12 18:54:50.366488 40813730: 0x20657461 0x49203d3d 0x38454545 0x35313230 0x54535f34 0x5f455441 0x00295852 0x40803fb6
2024-04-12 18:54:50.366541 40813750: 0x40820850 0x00000000 0x40828760 0x408041f8 0x00000020 0x00000000 0x00000000 0x00000001
2024-04-12 18:54:50.366557 40813770: 0x00000020 0x4081b000 0x00000780 0x00000020 0x00000001 0x00000000 0x4082ec28 0x420a0a18
2024-04-12 18:54:50.388551 40813790: 0x00000000 0x00000001 0x00000004 0x03000001 0x00000000 0x0000004f 0x4081b000 0x2955d241
2024-04-12 18:54:50.388604 408137b0: 0x408197ff 0x4081a000 0x00000000 0x00001000 0x0001b000 0x00001000 0x00010000 0x40817030
2024-04-12 18:54:50.410450 408137d0: 0x4081b000 0x600a3000 0x00000010 0x4080ec7e 0x0001b000 0x00001000 0x00010000 0x00000001
2024-04-12 18:54:50.410502 408137f0: 0x00001881 0x80000010 0x40811a14 0x408001e8 0x6000a000 0x408073d0 0x408073dc 0x00000000
2024-04-12 18:54:50.410519 40813810: 0x4081380c 0x00000000 0x00000000 0x00000000 0x40813824 0xffffffff 0x40813824 0x40813824
2024-04-12 18:54:50.433499 40813830: 0x00000000 0x40813838 0xffffffff 0x40813838 0x40813838 0x00000001 0x00000001 0x00000000
2024-04-12 18:54:50.433551 40813850: 0x0001ffff 0x00000000 0x00000000 0x00000004 0x00000000 0x00000000 0x00000000 0x40813868
2024-04-12 18:54:50.455536 40813870: 0x00000000 0x00000000 0x00000000 0x40813880 0xffffffff 0x40813880 0x40813880 0x00000000
2024-04-12 18:54:50.455594 40813890: 0x40813894 0xffffffff 0x40813894 0x40813894 0x00000001 0x00000001 0x00000000 0x0001ffff
2024-04-12 18:54:50.455611 408138b0: 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x6000a000 0x408073d0 0x408073dc
2024-04-12 18:54:50.477519 408138d0: 0x408207d0 0x00000000 0x00000000 0x4081b240 0x4081b304 0x4081b3c8 0x408269f4 0x00000000
2024-04-12 18:54:50.477571 408138f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.477588 40813910: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.499520 40813930: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.499574 40813950: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.522521 40813970: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.522574 40813990: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.522591 408139b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.544627 408139d0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.544678 408139f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.566602 40813a10: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.566652 40813a30: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.566674 40813a50: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.588599 40813a70: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.588646 
2024-04-12 18:54:50.588662 
2024-04-12 18:54:50.588675 
2024-04-12 18:54:50.588688 ELF file SHA256: 02fdc2544

Again, I can't access the core dump:

2024-04-12 18:57:19.379179 D (4076) esp_core_dump_crc: Total length of hashed data: 11968
2024-04-12 18:57:19.379253 E (4077) esp_core_dump_flash: Core dump data check failed:
2024-04-12 18:57:19.379269 Calculated checksum='84c71dfa'
2024-04-12 18:57:19.379283 Image checksum='d67d921d'
2024-04-12 18:57:19.379296 I (147672) nutt.Device: No core dump: 265

It's then running out of receive buffer space while binding all the endpoint clusters:

2024-04-12 18:54:52.320464 D (612) nutt.ZigbeeDevice: Zigbee stack initialized
2024-04-12 18:54:52.320517 I (612) nutt.ZigbeeDevice: Device address: 40:4c:ca:ff:fe:43:0b:d0/0x78c0 (network configured)
2024-04-12 18:54:52.320534 D (612) nutt.ZigbeeDevice: Connecting (ZDO Skip Start Up)
2024-04-12 18:54:52.342465 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.342521 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.342538 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.364644 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.364696 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.364712 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.366368 D (662) nutt: Free stack: 1208
2024-04-12 18:54:54.650843 ZBOSS: nwk/nwk_join.c:186   low lqi 0
2024-04-12 18:54:54.683200 D (2992) nutt.ZigbeeDevice: Connecting (BDB Device Reboot)
2024-04-12 18:54:54.692954 I (2992) nutt.ZigbeeDevice: Joined network successfully (6f:06:fc:4a:5b:92:dd:9c/0x8813@26 as 0x78c0)
2024-04-12 18:54:55.241313 E (3542) ZB_ESP_MAC: zb mac received buffer full!
2024-04-12 18:54:55.313738 E (3592) ZB_ESP_MAC: zb mac received buffer full!
2024-04-12 18:54:55.332153 E (3632) ZB_ESP_MAC: zb mac received buffer full!
2024-04-12 18:54:55.964687 W (4272) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-12 18:54:56.413691 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:56.413746 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:55:00.402921 ZBOSS: nwk/zb_nwk_ed_aging.c:227   ED aging is not supported
chshu commented 7 months ago

@nomis From the info you previously shared, the assert location indicates the device received an 802.15.4 packet longer than 127 bytes. That is not supposed to happen, since our hardware filter should drop such packets. We added a second, software-level filter in the 1.2.3 release.

If the issue still happens, we assume it's caused by some overflow: the buffer was overwritten by some unexpected operation. Could you please share the steps to reproduce the issue with us? We will debug it locally.

nomis commented 7 months ago

Could you please share the steps to reproduce the issue with us? We will debug it locally.

I don't have a way to reproduce it. I just run it on my Zigbee network with ZHA and wait for it to crash.

The last communication was about 3 minutes before the crash

That's not strictly true... I was looking at a log filtered for communication with the device (0x78C0, via router 0x97F6). I hadn't looked at the log for communication with other devices. There was a request/response with another device (0xEB07, direct) at the time but it's not an abnormal message:

2024-04-02 17:34:30.858 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Sending request header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True, *is_reply=False), tsn=119, command_id=<GeneralCommand.Read_Attributes: 0>, *direction=<Direction.Server_to_Client: 0>, *is_reply=False)
2024-04-02 17:34:30.858 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Sending request: Read_Attributes(attribute_ids=[4])
2024-04-02 17:34:30.859 DEBUG (MainThread) [bellows.zigbee.application] Sending packet ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), src_ep=1, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xEB07), dst_ep=1, source_route=[], extended_timeout=False, tsn=119, profile_id=260, cluster_id=0, data=Serialized[b'\x00w\x00\x04\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=None, rssi=None)
2024-04-02 17:34:30.883 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 60167, EmberApsFrame(profileId=260, clusterId=0, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_NONE: 0>, groupId=0, sequence=7), 120, <EmberStatus.SUCCESS: 0>, b'']
2024-04-02 17:34:30.887654 Zigbee stack assertion failed common/zb_bufpool_mult.c:1183
2024-04-02 17:34:30.890 DEBUG (MainThread) [bellows.zigbee.application] Received incomingRouteRecordHandler frame with [0xeb07, 00:12:4b:00:24:c5:95:a7, 104, -74, []]
2024-04-02 17:34:30.890 DEBUG (MainThread) [bellows.zigbee.application] Processing route record request: (0xeb07, 00:12:4b:00:24:c5:95:a7, 104, -74, [])
2024-04-02 17:34:30.901 DEBUG (MainThread) [bellows.zigbee.application] Received incomingMessageHandler frame with [<EmberIncomingMessageType.INCOMING_UNICAST: 0>, EmberApsFrame(profileId=260, clusterId=0, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_NONE: 0>, groupId=0, sequence=127), 104, -74, 0xeb07, 255, 255, b'\x18w\x01\x04\x00\x00B\x06SONOFF']
2024-04-02 17:34:30.901 DEBUG (MainThread) [zigpy.application] Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xEB07), src_ep=1, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=1, source_route=None, extended_timeout=False, tsn=127, profile_id=260, cluster_id=0, data=Serialized[b'\x18w\x01\x04\x00\x00B\x06SONOFF'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=104, rssi=-74)
2024-04-02 17:34:30.901 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Received ZCL frame: b'\x18w\x01\x04\x00\x00B\x06SONOFF'
2024-04-02 17:34:30.902 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=1, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=119, command_id=1, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
2024-04-02 17:34:30.902 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Decoded ZCL frame: Basic:Read_Attributes_rsp(status_records=[ReadAttributeRecord(attrid=0x0004, status=<Status.SUCCESS: 0>, value=TypeValue(type=CharacterString, value='SONOFF'))])
xieqinan commented 7 months ago

@nomis ,

Ok, but you really need to stop making changes where it loses configuration on upgrade:

Regarding the issue, we have marked it in the release notes.

Source: https://github.com/nomis/candle-dribbler/tree/0.7.3 Binary: candle-dribbler.elf.gz

This has been crashing all day:

The crashing may be triggered by too many stack logs. Could you use the esp-zigbee-sdk v1.2.3, which includes optimizations for this issue, to test the problem in release mode?

nomis commented 7 months ago

Ok, but you really need to stop making changes where it loses configuration on upgrade: Regarding the issue, we have marked it in the release notes.

No, you have not done this. You've noted that some sizes are configurable, but not that there are critical bugs in the library when the sizes change that require the zb_storage partition to be erased to fix.

The crashing may be triggered by too many stack logs. Could you use the esp-zigbee-sdk v1.2.3, which includes optimizations for this issue, to test the problem in release mode?

The crashing is caused by the aforementioned bug where the zboss library calls zb_zcl_get_attr_desc_manuf_a() with rep_info->manuf_code of 0 instead of ZB_ZCL_MANUFACTURER_WILDCARD_ID because the configuration is not initialised correctly. I still have an open bug #86 for this when endpoints are changed, but it happens every time you change some aspect of the configuration format/size too.
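
To make that failure mode concrete, here is a toy model of the lookup (not zboss code; the structure, signature, and wildcard value are all assumed for illustration):

    #include <stddef.h>
    #include <stdint.h>

    #define ZB_ZCL_MANUFACTURER_WILDCARD_ID 0xFFFF  /* value assumed for illustration */

    typedef struct { uint16_t attr_id; uint16_t manuf_code; } attr_desc_t;

    /* Toy lookup: an exact manuf_code match is required unless the wildcard is passed */
    static attr_desc_t *find_attr(attr_desc_t *table, size_t n,
                                  uint16_t attr_id, uint16_t manuf_code)
    {
        for (size_t i = 0; i < n; i++) {
            if (table[i].attr_id == attr_id &&
                (manuf_code == ZB_ZCL_MANUFACTURER_WILDCARD_ID ||
                 table[i].manuf_code == manuf_code)) {
                return &table[i];
            }
        }
        /* A caller passing manuf_code 0 gets NULL back and crashes if it
         * dereferences the result without checking, as described above */
        return NULL;
    }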

nomis commented 7 months ago

We've debugged some issues in detail based on the information you provided and made some fixes, but I cannot guarantee they address the root cause of this issue. The fixes will be released in the next version. Before that, could you replace this function with the code below and test again?

This has happened once with 1.2.3 and it didn't result in a crash:

2024-04-20 01:42:50.956834 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-20 01:42:50.957000 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-20 01:42:51.885534 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-20 01:42:51.885690 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-20 01:42:52.096456 W (10806022) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-20 01:45:00.529011 ovesized frame: 158
2024-04-20 01:47:51.018197 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-20 01:47:51.018351 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-20 01:51:51.061893 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-20 01:51:51.062046 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-20 01:55:51.101273 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1

Of course, that doesn't prove that this was the cause if you've changed 1.2.3 to silently filter those messages out... I'll need to re-test with 1.2.1 until it happens again.

zwx1995esp commented 7 months ago

Hi @nomis, after discussing with the HW design team, we can confirm that our ESP board cannot receive any packet whose length is larger than 127 bytes. So we suspect there might be some out-of-bounds memory issue: the Rx frame buffer memory might be written by accident.

So, could you please help us debug this issue? Let's set a watchpoint to monitor the Rx frame buffer.

First set the 802.15.4 RX frame buffer size (IEEE802154_RX_BUFFER_SIZE) to 4, and then add this code:

/* Watch for stores to the first 4 bytes (which include the length byte) of each of the 4 RX buffers */
esp_cpu_set_watchpoint(0, s_rx_frame[0], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(1, s_rx_frame[1], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(2, s_rx_frame[2], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(3, s_rx_frame[3], 4, ESP_CPU_WATCHPOINT_STORE);

to the end of the function ieee802154_mac_init

If a watchpoint is triggered, you will get a crash log; please share the crash logs with us.

nomis commented 7 months ago

Ok. I will also have the changes for #326 applied.

I've previously been using IEEE802154_RX_BUFFER_SIZE=100 (because I have a lot of endpoints/clusters) and IEEE802154_TIMING_OPTIMIZATION=y.

zwx1995esp commented 7 months ago

Ok. I will also have the changes for #326 applied.

I've previously been using IEEE802154_RX_BUFFER_SIZE=100 (because I have a lot of endpoints/clusters) and IEEE802154_TIMING_OPTIMIZATION=y.

Ok. Per the 802.15.4 spec, the transmission speed is 250 kbps (32 µs per byte), so receiving a 30-byte packet takes roughly 1 ms. That is enough time for the SW to process the packet. Based on our tests in high-traffic scenarios, a size of 20 is sufficient, so there is no need to use 100 for this buffer.

The reason I told you to set the buffer size to 4 is that the C6 only supports 4 watchpoints at a time.

nomis commented 7 months ago

There's no documentation for IEEE802154 in the TRM, so I can't check how dma_rx_addr is supposed to work, but can a receive happen into the buffer at any time?

What would happen if the dma_rx_addr is set while a packet is being received - could a packet be split across two buffers or the middle of a packet be written to the start of a buffer? Either of those could result in a packet with the wrong length.

There's an inconsistency in the order of IEEE802154_SET_TXRX_PTI() and set_next_rx_buffer() here:

esp_err_t ieee802154_receive_at(uint32_t time)
{
...
    IEEE802154_SET_TXRX_PTI(IEEE802154_SCENE_RX_AT);
    set_next_rx_buffer();

Everything else sets the next buffer first.

(I'm not doing Zigbee and WiFi co-existence but it looks like the handling for it can't be disabled.)

The reason why I told you to set the buffer size 4 is that C6 only support 4 watchpoints at the same time.

SOC_CPU_WATCHPOINT_MAX_REGION_SIZE is 0x80000000 on the ESP32-C6, so I could set a watchpoint on the entire s_rx_frame array, regardless of how many receive buffers are configured?
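
For example, a single call along these lines (a sketch; it assumes the watchpoint hardware accepts sizeof(s_rx_frame), since watch regions generally have power-of-two size and alignment requirements):

    /* One watchpoint over the entire RX buffer array instead of four partial ones */
    esp_cpu_set_watchpoint(0, s_rx_frame, sizeof(s_rx_frame), ESP_CPU_WATCHPOINT_STORE);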

nomis commented 7 months ago

The ieee802154_receive_at() function also needs to call ieee802154_enter_critical() earlier. Otherwise it's calling stop_current_operation() (via rx_init()) and set_next_rx_buffer() with interrupts enabled.

This function isn't used by the Zigbee libraries.
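
For illustration, a sketch of the ordering suggested here and in the previous comment (the body is abridged; rx_init(), set_next_rx_buffer(), and the critical-section helpers are the internal driver names mentioned above, everything else is assumed):

    esp_err_t ieee802154_receive_at(uint32_t time)
    {
        ieee802154_enter_critical();  /* take the lock before touching radio state */
        rx_init();                    /* stop_current_operation() now runs with interrupts disabled */
        set_next_rx_buffer();         /* select the RX buffer first... */
        IEEE802154_SET_TXRX_PTI(IEEE802154_SCENE_RX_AT);  /* ...then set the PTI, like the other paths */
        /* ... program the timed receive, then release the lock ... */
        ieee802154_exit_critical();
        return ESP_OK;
    }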

zwx1995esp commented 7 months ago

The ieee802154_receive_at() function also needs to call ieee802154_enter_critical() earlier. Otherwise it's calling stop_current_operation() (via rx_init()) and set_next_rx_buffer() with interrupts enabled.

This function isn't used by the Zigbee libraries.

Thanks for catching this risk. This API is not used in the Zigbee stack, so it might not be related to this crash issue, but we will fix it right away.

zwx1995esp commented 7 months ago

The ieee802154_receive_at() function also needs to call ieee802154_enter_critical() earlier. Otherwise it's calling stop_current_operation() (via rx_init()) and set_next_rx_buffer() with interrupts enabled.

This function isn't used by the Zigbee libraries.

PRs to fix it are also welcome :)

xmow49 commented 7 months ago

This is in zb_zcl_get_next_reporting_info() which has just called zb_zcl_get_attr_desc_manuf_a() with manuf_code of 0 (a4) and received NULL back (a0) so it crashes when it dereferences it:

Hello, I have a similar problem. When I change the number of registered clusters/attributes between boots, the zb lib crashes. A simple flash erase fixes the problem.

An example of how the bug appears:

Can you fix this bug?

Thanks


....
I (6446) ZIGBEE: Device started up in non factory-reset mode
I (6447) ZIGBEE: Device rebooted
Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.

Core  0 register dump:
MEPC    : 0x4205a8f8  RA      : 0x4205a8f4  SP      : 0x40841bf0  GP      : 0x4081e504
0x4205a8f8: zb_zcl_get_next_reporting_info at ??:?
0x4205a8f4: zb_zcl_get_next_reporting_info at ??:?

TP      : 0x407f886c  T0      : 0x40030dca  T1      : 0x0000000f  T2      : 0x62808861
0x40030dca: memset in ROM

S0/FP   : 0x40848f94  S1      : 0x00000004  A0      : 0x00000000  A1      : 0x00000020
A2      : 0x0000ffff  A3      : 0x0000000c  A4      : 0x4084888d  A5      : 0x00000005
A6      : 0x0000000c  A7      : 0x0000000c  S2      : 0x00000000  S3      : 0x40848eb4
S4      : 0x00000000  S5      : 0x00000006  S6      : 0x00000000  S7      : 0x00000052
S8      : 0x00000038  S9      : 0x0000ff42  S10     : 0x00000000  S11     : 0x00000007
T3      : 0x000000fd  T4      : 0x000000fe  T5      : 0xffffffff  T6      : 0x000000ff
MSTATUS : 0x00001881  MTVEC   : 0x40800001  MCAUSE  : 0x00000005  MTVAL   : 0x00000003
0x40800001: _vector_table at ??:?

MHARTID : 0x00000000

Stack memory:
40841bf0: 0x4205a224 0x00000017 0x0000004a 0x40848eec 0x40848eb4 0x40848a80 0x40842ec9 0x42054c5e
0x4205a224: zb_zcl_reporting_timer_handler at ??:?
0x42054c5e: zb_zcl_send_report_attr_command at ??:?

40841c10: 0x0000000f 0x40841c30 0x00000003 0x4206a394 0x0000000f 0x00000000 0x7fffffff 0x00000002
0x4206a394: zb_osif_scheduler_event at ??:?

40841c30: 0x0000000f 0x00000000 0x00000042 0x00000000 0x00000000 0x4205a3e8 0x00000006 0x00000000
0x4205a3e8: zb_zcl_report_attr at ??:?

40841c50: 0x00000002 0x00000052 0x40848eb4 0x4205a4d8 0x0000000f 0x4205a3e8 0x7fffffff 0x000000ff
0x4205a4d8: zb_zcl_report_attr at ??:?
0x4205a3e8: zb_zcl_report_attr at ??:?

40841c70: 0x0000000f 0x00000000 0x40828ba0 0x42044f14 0x0000007f 0x0000000f 0x40841dd8 0x00000000
0x42044f14: zb_sched_loop_iteration at ??:?

40841c90: 0x40848808 0x00000000 0x40846f48 0x40831000 0x00000000 0x0000ff42 0x00000b04 0x00000004
40841cb0: 0x0000007f 0x0000000f 0x40841dd8 0x40847aa0 0x40848808 0x408480fc 0x42158000 0x420364fe
0x420364fe: esp_zb_main_loop_iteration at ??:?

40841cd0: 0x42157ce8 0x00000394 0x00000007 0x42010b8a 0x00000025 0x40841dd8 0xa5a5a5a5 0xa5a5a5a5
0x42010b8a: zigbee_task at D:/08-LocalRepos/TICMeter/firmware/main/zigbee.c:635

40841cf0: 0xa5a5a5a5 0xa5a5a5a5 0x40847888 0x42150030 0x40847c08 0x40847d70 0x40847d04 0x40847a34
40841d10: 0x0000007f 0x0000ff42 0x0000002d 0x00000025 0xa5a5a5a5 0xa5a5a5a5 0xfb010101 0xa5a50008
40841d30: 0x00010000 0x000003e8 0x00000004 0x030205a0 0xa5a50040 0x00000000 0x00000000 0xa5a500fb
40841d50: 0x00000002 0x00000000 0x00000006 0x00000fa0 0x00030202 0x00c8ffff 0x00000000 0x00000000
40841d70: 0x00030202 0x00000000 0x00000000 0x00000000 0x2e335617 0x2d322e32 0x34672d31 0x30623832
40841d90: 0x642d3762 0x79747269 0xa5a5a500 0xa5a5a5a5 0xa5a5a5a5 0x32303213 0x34302d34 0x2038322d
40841db0: 0x333a3131 0x35303a37 0xa5a5a500 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
40841dd0: 0xa5a5a5a5 0xa5a5a5a5 0x30000030 0x00003030 0x32333935 0xa5a5a500 0xa5a5a5a5 0xa5a5a5a5
40841df0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
40841e10: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
40841e30: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x53010401 0x00303200 0x33393500 0xa5000032 0xa5a5a5a5
40841e50: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x4201041c 0x00000000 0x00000000 0x4081e504
0x4201041c: zigbee_task at D:/08-LocalRepos/TICMeter/firmware/main/zigbee.c:319

40841e70: 0x407f886c 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841e90: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841eb0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841ed0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841ef0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
40841f10: 0x00000150 0x40841af0 0x0000336d 0x408221cc 0x408221cc 0x40841f14 0x408221c4 0x00000014
40841f30: 0x40848834 0x40848834 0x40841f14 0x00000000 0x00000005 0x4083ff10 0x6267695a 0x6d5f6565
40841f50: 0x006e6961 0x00000000 0x40841f00 0x00000005 0x00000001 0x00000000 0x00000000 0x00000009
40841f70: 0x408327cc 0x40832834 0x4083289c 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000
40841f90: 0x00000000 0x42006116 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x42006116: esp_cleanup_r at C:/Espressif/frameworks/esp-idf-v5.2.1/components/newlib/newlib_init.c:60

40841fb0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841fd0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000

ELF file SHA256: ff35324e1e01a5c8

Rebooting in  2 seconds...
xieqinan commented 7 months ago

@xmow49 ,

Hello, I have a similar problem. When I change the number of registered clusters/attributes between boots, the zb lib crashes. A simple flash erase fixes the problem.

Regarding another issue, I suspect it's related to the binding table size. By default, the binding table size is set to 16 in the esp-zigbee-sdk. Could you try adjusting the size by calling esp_zb_aps_src_binding_table_size_set() with your desired value? Keep in mind that this function needs to be called before esp_zb_init() for it to take effect.
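
For example (a sketch; ESP_ZB_ZED_CONFIG() stands in for whichever role configuration the application uses):

    esp_zb_aps_src_binding_table_size_set(64);   /* must run before esp_zb_init() to take effect */
    esp_zb_cfg_t zb_nwk_cfg = ESP_ZB_ZED_CONFIG();
    esp_zb_init(&zb_nwk_cfg);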

xmow49 commented 7 months ago

@xmow49 ,

Hello, I have a similar problem. When I change the number of registered clusters/attributes between boots, the zb lib crashes. A simple flash erase fixes the problem.

Regarding another issue, I suspect it's related to the binding table size. By default, the binding table size is set to 16 in the esp-zigbee-sdk. Could you try adjusting the size by calling esp_zb_aps_src_binding_table_size_set() with your desired value? Keep in mind that this function needs to be called before esp_zb_init() for it to take effect.

Many thanks, I'm no longer able to reproduce the bug. I set the value to 64 and everything works great! Thanks

nomis commented 7 months ago

After discussing with the HW design team, we can confirm that our ESP board cannot receive any packet whose length is larger than 127 bytes. So we suspect there might be some out-of-bounds memory issue: the Rx frame buffer memory might be written by accident.

So, could you please help us debug this issue? Let's set a watchpoint to monitor the Rx frame buffer.

This has finally logged a frame length over 127 bytes without a crash:

2024-05-02 12:40:55.056673 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-05-02 12:40:55.056724 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-05-02 12:41:10.723378 E (777449843) ieee802154 frame: ieee802154_frame_get_src_addr(339): invalid frame type
2024-05-02 12:41:10.723430 oversized frame: 0x40817921 150
2024-05-02 12:41:55.067166 ZBOSS: zcl/zcl_main.c:357   zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-05-02 12:41:55.067216 ZBOSS: zcl/zcl_main.c:382   zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1

I'll modify it to log the contents of s_rx_frame when this happens, because I suspect it involves parts of a normal-size frame being written across two buffers.
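
Something along these lines would capture it (a sketch; it assumes access to the driver-internal s_rx_frame array, and dump_rx_frames is a hypothetical helper name):

    #include "esp_log.h"

    /* Dump every RX buffer when an oversized frame is detected, so neighbouring
     * buffers can be checked for the rest of a split frame. */
    static void dump_rx_frames(void)
    {
        for (int i = 0; i < CONFIG_IEEE802154_RX_BUFFER_SIZE; i++) {
            ESP_LOG_BUFFER_HEX("rx_dump", s_rx_frame[i], sizeof(s_rx_frame[i]));
        }
    }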

There are also some places in the code where the frame data is read without checking that the frame is at least a certain minimum length.

zwx1995esp commented 6 months ago

Hi @nomis, sorry for the delay. What IEEE802154_RX_BUFFER_SIZE did you use? I've seen your code, but to double-check: did you use a size of 4 for IEEE802154_RX_BUFFER_SIZE?

For your final code:

    esp_cpu_set_watchpoint(0, s_rx_frame[0], 4, ESP_CPU_WATCHPOINT_STORE);
    esp_cpu_set_watchpoint(1, s_rx_frame[1], 4, ESP_CPU_WATCHPOINT_STORE);
    esp_cpu_set_watchpoint(2, s_rx_frame[2], 4, ESP_CPU_WATCHPOINT_STORE);
    esp_cpu_set_watchpoint(3, s_rx_frame[3], 4, ESP_CPU_WATCHPOINT_STORE);

These watchpoints can only monitor the length bytes of the first 4 s_rx_frame buffers...

nomis commented 6 months ago

Yes, it's set to 4. I'm now waiting for it to happen again with extra logging.

zwx1995esp commented 6 months ago

Yes, it's set to 4. I'm now waiting for it to happen again with extra logging.

Ok, so based on your test there are no OOB issues with the Rx frame buffer in your program. I discussed this issue with the HW design team, and they said it is a corner case. The HW uses frame[0] & 0x7f as the frame length, and calculates the CRC over frame[1] to frame[frame[0] & 0x7f] to make sure the payload was received correctly, so there is simply no check on bit 8 of frame[0]. In some cases this bit will be wrong due to incorrect PHY decoding in a noisy environment (very, very low probability). We don't think it is a malignant bug; we will add a check in the driver to make sure the packet length does not exceed 127.
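
In outline, such a check could look like this (a sketch, not the actual driver patch; the macro name is assumed):

    #include <stdbool.h>
    #include <stdint.h>

    #define IEEE802154_MAX_FRAME_LEN 127  /* name assumed for illustration */

    /* The hardware does not validate bit 8 of the length byte, so reject any
     * frame whose raw length byte exceeds 127 before handing it to the stack */
    static inline bool rx_frame_len_valid(const uint8_t *frame)
    {
        return frame[0] <= IEEE802154_MAX_FRAME_LEN;
    }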

Thanks for your patience and hard work on debugging this issue.

chshu commented 5 months ago

Yes, it's set to 4. I'm now waiting for it to happen again with extra logging.

Ok, so based on your test there are no OOB issues with the Rx frame buffer in your program. I discussed this issue with the HW design team, and they said it is a corner case. The HW uses frame[0] & 0x7f as the frame length, and calculates the CRC over frame[1] to frame[frame[0] & 0x7f] to make sure the payload was received correctly, so there is simply no check on bit 8 of frame[0]. In some cases this bit will be wrong due to incorrect PHY decoding in a noisy environment (very, very low probability). We don't think it is a malignant bug; we will add a check in the driver to make sure the packet length does not exceed 127.

Thanks for your patience and hard work on debugging this issue.

It's indeed a corner case. The fix has been merged into IDF: https://github.com/espressif/esp-idf/commit/17eb2dcce33e65efb4b1fa98e3b9ed90aa9debff, and will be included in future IDF releases. Closing this issue.

nomis commented 3 months ago

It has taken 76 days of uptime, but I've finally had another instance of an oversized frame.

The first rx buffer changed from: 444188BC1388FFFF92790912FCFF92790135E75EB826004B1200286AAB3700E75EB826004B12000034E25CD9F29517536E2D94C5A3A80FD3BDE9D95640DA577A092036B00B5D5A6BD0A38E94B809F86C51EF59E22E02795C2228B1827D404E673C167B6A885021DF33540DC5D56FE59F5AC92F79177D7BE2000000000000000000

to: EE47B266D8976D327B7D66D006B0C1E70E1401331237026004C214E2300000C0301E5227311361102E320700F00670101F4066A31E200041008C3177234E3027D61E010003600C0C6D000603270100EFB000154866016000026202C4394004F402011D6AC40007F2620752F3C993FC9F5AC92F79177D7BE2000000000000000000

The number of bytes overwritten is consistent with a length byte of 0xEE (7-bit length 0xEE & 0x7F = 110 bytes), but the content looks like noise or a message from a different PAN.