espressif / esp-zigbee-sdk

Espressif Zigbee SDK
Apache License 2.0

zb_assert() in zb_get_buf_tail_ptr() (TZ-670) #267

Closed: nomis closed this issue 6 months ago

nomis commented 7 months ago

Answers checklist.

IDF version.

v5.3-dev-422-ga7fbf452fa

esp-zigbee-lib version.

1.0.5

esp-zboss-lib version.

1.0.5

Espressif SoC revision.

ESP32-C6

What is the expected behavior?

Does not crash

What is the actual behavior?

zb_assert() was called in zb_get_buf_tail_ptr()

Steps to reproduce.

Unknown. This happened after 88.7 days uptime.

More Information.

Source: https://github.com/nomis/candle-dribbler/tree/0.5.2
Config: sdkconfig.txt
Binary: candle-dribbler.elf.gz
Core dump: core-dump-2024-02-23.txt

Output from riscv32-esp-elf-addr2line (#157):

42022df2: zb_bufpool_mult.c.obj:?
42027bf4: zb_scheduler.c.obj:?
420287e0: zb_scheduler.c.obj:?

Disassembly:

    42022d50 <zb_get_buf_tail_ptr>:
...
    42022dd0:       8082                    ret
    42022dd2:       48c00593                li      a1,1164
    42022dd6:       4208a537                lui     a0,0x4208a
    42022dda:       95850513                addi    a0,a0,-1704 # 42089958 <_flash_rodata_start+0x9838>
    42022dde:       699000ef                jal     ra,42023c76 <zb_assert>
    42022de2:       49f00593                li      a1,1183
    42022de6:       4208a537                lui     a0,0x4208a
    42022dea:       95850513                addi    a0,a0,-1704 # 42089958 <_flash_rodata_start+0x9838>
    42022dee:       689000ef                jal     ra,42023c76 <zb_assert> <--
--> 42022df2:       85a6                    mv      a1,s1
    42022df4:       854e                    mv      a0,s3
    42022df6:       e2dff0ef                jal     ra,42022c22 <zb_buf_alloc_right_func>
    42022dfa:       85a6                    mv      a1,s1
    42022dfc:       854e                    mv      a0,s3
    42022dfe:       d15ff0ef                jal     ra,42022b12 <zb_buf_cut_right_func>
    42022e02:       b76d                    j       42022dac <zb_get_buf_tail_ptr+0x5c>
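
The arithmetic behind the marked lines can be checked with a small sketch. It assumes the standard RISC-V semantics for `lui`/`addi` and for a 4-byte `jal`, and it assumes (not confirmed by the source) that `zb_assert` receives a file-name pointer in `a0` and a line number in `a1`. The helper names `lui_addi` and `jal_return_address` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Value left in a register by a RISC-V `lui rd, imm20` followed by
 * `addi rd, rd, imm12`: the upper immediate shifted left by 12 bits,
 * plus the sign-extended 12-bit immediate. */
static uint32_t lui_addi(uint32_t imm20, int32_t imm12)
{
    return (imm20 << 12) + (uint32_t)imm12;
}

/* Return address stored in ra by an uncompressed (4-byte) `jal`
 * located at `pc`. The two `jal ra, zb_assert` encodings in the
 * listing (699000ef, 689000ef) are 4 bytes wide. */
static uint32_t jal_return_address(uint32_t pc)
{
    return pc + 4u;
}
```

With the values from the listing, `lui_addi(0x4208a, -1704)` gives 0x42089958, matching the address in the disassembler comment, and `jal_return_address(0x42022dee)` gives 0x42022df2, the first address in the addr2line output. Under the assumptions above, the call that fired is therefore the second `zb_assert`, the one with `a1` = 1183, rather than the one with `a1` = 1164.
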

kelin6 commented 7 months ago

@nomis please update esp-zigbee-sdk to version 1.2.0 and try acquiring the Zigbee lock before calling any Zigbee API, unless the call is made from a Zigbee callback, which already runs in the Zigbee task. Please refer to zigbee-api-lock.

If the assert error still occurs, please follow the assertion-failures guide and provide detailed information, including the sniffer capture file, logs, and the ELF file. The more detailed, the better. Thank you very much.
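
The locking advice above can be sketched roughly as follows. This is an illustration only, not the SDK's recommended wrapper: `with_zigbee_lock`, `zb_locked_fn`, `lock_depth` and `probe_lock_depth` are hypothetical names, and the `#else` branch contains host-side stand-ins (an assumption made purely so the pattern compiles off-target). On the device the real `esp_zb_lock_acquire()` and `esp_zb_lock_release()` from `esp_zigbee_core.h` would be used.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "esp_zigbee_core.h"
#else
/* Host-side stand-ins (hypothetical), only so the sketch builds off-target. */
typedef uint32_t TickType_t;
#define portMAX_DELAY ((TickType_t)0xffffffffUL)

static int lock_depth; /* how many times the stand-in lock is currently held */

static bool esp_zb_lock_acquire(TickType_t block_ticks)
{
    (void)block_ticks;
    lock_depth++;
    return true;
}

static void esp_zb_lock_release(void)
{
    lock_depth--;
}

/* Records the current lock depth, to check the wrapper off-target. */
static void probe_lock_depth(void *arg)
{
    *(int *)arg = lock_depth;
}
#endif

/* Hypothetical callback type for work that calls Zigbee APIs. */
typedef void (*zb_locked_fn)(void *arg);

/* Runs fn(arg) with the Zigbee lock held. Intended for tasks other than
 * the Zigbee task; per the comment above, callbacks invoked from the
 * Zigbee task should not take the lock themselves. */
static bool with_zigbee_lock(zb_locked_fn fn, void *arg)
{
    if (!esp_zb_lock_acquire(portMAX_DELAY)) {
        return false; /* lock not obtained, so fn is not called */
    }
    fn(arg);
    esp_zb_lock_release();
    return true;
}
```

Routing every cross-task Zigbee call through one wrapper like this makes it harder to forget the release on some code path.
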

nomis commented 7 months ago

The only functions I call without locking are esp_zb_scheduler_alarm() and esp_zb_scheduler_alarm_cancel().

I can't possibly provide capture files for a bug that takes 89 days to occur. You need to identify why an assert might happen in this function with the version of the library that was used.

kelin6 commented 7 months ago

@nomis You can try adding zigbee lock to see if it helps. We will refer to your code and try to reproduce the same issue. Currently, based on the source code, we only know that the assert is caused by a buffer overflow during allocation. However, we haven't concluded under what circumstances this issue occurs.

nomis commented 6 months ago

> You can try adding zigbee lock to see if it helps.

As determined by https://github.com/espressif/esp-zigbee-sdk/issues/275#issuecomment-1991490736, no locking is required for esp_zb_scheduler_alarm() and esp_zb_scheduler_alarm_cancel().

However, I re-implemented scheduling using a separate task and kept it that way because it makes it possible to distinguish between a crash in the main loop and a crash in a scheduled function.

This has recurred on a later build in #304, so I'll close this issue.