The last communication was about 3 minutes before the crash:
2024-04-02 17:28:53.050 DEBUG (MainThread) [zigpy.zcl] [0x78C0:11:0x0006] Attribute report received: on_off=<Bool.false: 0>
2024-04-02 17:28:53.157 DEBUG (MainThread) [zigpy.zcl] [0x78C0:61:0x000f] Attribute report received: present_value=<Bool.false: 0>
2024-04-02 17:28:56.381 DEBUG (MainThread) [zigpy.zcl] [0x78C0:1:0x000c] Attribute report received: present_value=6.453306198120117
2024-04-02 17:29:57.461 DEBUG (MainThread) [zigpy.zcl] [0x78C0:1:0x000c] Attribute report received: status_flags=<bitmap8.0: 0>
2024-04-02 17:31:42.396 DEBUG (MainThread) [zigpy.zcl] [0x78C0:34:0x0006] Sending request: Read_Attributes(attribute_ids=[0])
2024-04-02 17:31:42.484 DEBUG (MainThread) [zigpy.zcl] [0x78C0:34:0x0006] Decoded ZCL frame: OnOff:Read_Attributes_rsp(status_records=[ReadAttributeRecord(attrid=0x0000, status=<Status.SUCCESS: 0>, value=TypeValue(type=Bool, value=<Bool.false: 0>))])
@nomis,
Is it possible to enable the trace log mode for debugging this issue? Trace mode will generate more detailed logs for you.
You're going to need to elaborate on which trace mode and logging build options should be used.
When I tried this previously, all I got was so much output to the UART that no lower-priority task could run. I could work around that, but it implies there's going to be more output than the maximum speed of my UART receiver at 115200 bps.
Could you not instrument the library to provide more detail on the current state when the error occurs?
@nomis, I have the same panic during a call to esp_zb_scheduler_alarm. As a workaround I rolled back to the previous version of the library in idf_component.yml:
dependencies:
  espressif/esp-zboss-lib: "<=1.2.1"
@nomis,
Source: https://github.com/nomis/candle-dribbler/tree/0.7.2 Binary: core-dump-2024-04-02.txt Core dump: core-dump-2024-04-02.txt Log: log-2024-04-02.txt (all except the last two switch changes have been omitted)
Please also provide the .elf file.
@nomis,
Source: https://github.com/nomis/candle-dribbler/tree/0.7.2 Binary: core-dump-2024-04-02.txt Core dump: core-dump-2024-04-02.txt Log: log-2024-04-02.txt (all except the last two switch changes have been omitted)
Please also provide the .elf file.
I've fixed the attachment in the description: candle-dribbler.elf.gz
@nomis,
We've debugged some detailed issues based on the information you provided and made some fixes, but I cannot guarantee they address the root cause of this issue. The fixes will be released in the next version. Before that, could you replace this function with the code below and test again?
static void ieee802154_receive_done(uint8_t *data, esp_ieee802154_frame_info_t *frame_info)
{
    // If the RX done packet is written in the stub buffer, drop it silently.
    if (s_rx_index == CONFIG_IEEE802154_RX_BUFFER_SIZE) {
        esp_rom_printf("receive buffer full, drop the current frame.\n");
    } else {
        // Otherwise, post it to the upper layer.
        frame_info->process = true;
        if (data[0] > 127) {
            esp_rom_printf("ovesized frame: %d\n", data[0]);
        }
        esp_ieee802154_receive_done(data, frame_info);
    }
}
By the way, enabling the debug mode is helpful for this issue. Could you also add esp_zb_set_trace_level_mask(ESP_ZB_TRACE_LEVEL_CRITICAL, 0); before esp_zb_init(&zb_nwk_cfg); to reproduce the issue and provide us with the result?
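A minimal sketch of the intended placement, assuming a typical Zigbee task; the surrounding task and config macro are illustrative, not taken from the project:

    static void zigbee_task(void *pvParameters)
    {
        /* Illustrative network config; the real project builds its own. */
        esp_zb_cfg_t zb_nwk_cfg = ESP_ZB_ZED_CONFIG();

        /* Enable critical-level ZBOSS trace output before the stack is
         * initialised, as requested above. */
        esp_zb_set_trace_level_mask(ESP_ZB_TRACE_LEVEL_CRITICAL, 0);

        esp_zb_init(&zb_nwk_cfg);
        ESP_ERROR_CHECK(esp_zb_start(false));
        esp_zb_main_loop_iteration();
    }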
@nomis,
Could you please test the issue again using the latest version of the esp-zigbee-sdk? We have made some updates that may help resolve it.
Ok, but you really need to stop making changes where it loses configuration on upgrade:
2024-04-12 12:17:24.561647 ZBOSS: common/zb_nvram.c:3189 ds_ver == 2
2024-04-12 12:17:24.561700 ZBOSS: common/zb_nvram.c:3072 ERROR: unexpected binding dataset length; size from nvram 576, expected size 576
2024-04-12 12:17:24.577870 ZBOSS: common/zb_nvram.c:3237 ERROR: could not read the APS binding table from NVRAM, page 1 pos 1328, ds_ver 2
Do you still want me to call esp_zb_set_trace_level_mask(ESP_ZB_TRACE_LEVEL_CRITICAL, 0); or should that be omitted? It looks like there's a bit more debug output without it on the latest version.
Source: https://github.com/nomis/candle-dribbler/tree/0.7.3 Binary: candle-dribbler.elf.gz
This has been crashing all day:
...
2024-04-12 12:39:18.087437 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.087485 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.109236 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.109278 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.109295 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 12:39:18.112293 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 12:39:21.650268 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 12:39:21.650318 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 12:39:37.938296
2024-04-12 12:39:37.938358 assert failed: isr_handle_rx_abort esp_ieee802154_dev.c:487 (s_ieee802154_state == IEEE802154_STATE_RX)
2024-04-12 12:39:37.938374 Core 0 register dump:
2024-04-12 12:39:37.938388 MEPC : 0x40800774 RA : 0x40808bb2 SP : 0x40813690 GP : 0x40811680
2024-04-12 12:39:37.960270 TP : 0x408335f0 T0 : 0x37363534 T1 : 0x7271706f T2 : 0x33323130
2024-04-12 12:39:37.960322 S0/FP : 0x00000001 S1 : 0x00000060 A0 : 0x408136e4 A1 : 0x40811d01
2024-04-12 12:39:37.960338 A2 : 0x00000001 A3 : 0x00000029 A4 : 0x00000001 A5 : 0x4081a000
2024-04-12 12:39:37.982319 A6 : 0x00000010 A7 : 0x76757473 S2 : 0x408136d8 S3 : 0x40813817
2024-04-12 12:39:37.982373 S4 : 0x40811d00 S5 : 0x408136d8 S6 : 0x420dd2b4 S7 : 0x00001000
2024-04-12 12:39:37.982403 S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000
2024-04-12 12:39:38.004324 T3 : 0x6e6d6c6b T4 : 0x6a696867 T5 : 0x66656463 T6 : 0x62613938
2024-04-12 12:39:38.004375 MSTATUS : 0x00001881 MTVEC : 0x40800001 MCAUSE : 0x00000007 MTVAL : 0x00000000
2024-04-12 12:39:38.004392 MHARTID : 0x00000000
2024-04-12 12:39:38.004406
2024-04-12 12:39:38.004420 Stack memory:
2024-04-12 12:39:38.026270 40813690: 0x00000000 0x00000000 0x420e6758 0x4080fdf4 0x00000000 0x00000000 0x00000000 0x00373834
2024-04-12 12:39:38.026324 408136b0: 0x00000000 0x40811e0c 0x420e6758 0x40811e1c 0x420dcff7 0x40811e20 0x408136ac 0x40811e24
2024-04-12 12:39:38.026341 408136d0: 0x420dd2b4 0x40811d00 0x00000000 0x00000000 0x00000000 0x65737361 0x66207472 0x656c6961
2024-04-12 12:39:38.049332 408136f0: 0x69203a64 0x685f7273 0x6c646e61 0x78725f65 0x6f62615f 0x65207472 0x695f7073 0x38656565
2024-04-12 12:39:38.049388 40813710: 0x35313230 0x65645f34 0x3a632e76 0x20373834 0x695f7328 0x38656565 0x35313230 0x74735f34
2024-04-12 12:39:38.071357 40813730: 0x20657461 0x49203d3d 0x38454545 0x35313230 0x54535f34 0x5f455441 0x00295852 0x40803fb6
2024-04-12 12:39:38.071407 40813750: 0x00000000 0x00000000 0x40828c44 0x408041f8 0x00000020 0x00000000 0x00000000 0x00000001
2024-04-12 12:39:38.071422 40813770: 0x00000020 0x4081b000 0x00000500 0x00000020 0x00000001 0x00000000 0x4082acdc 0x420a0a18
2024-04-12 12:39:38.093351 40813790: 0x00000000 0x00000001 0x00000004 0x03000001 0x00000000 0x00000026 0x4081b000 0xfbd6ee9e
2024-04-12 12:39:38.093400 408137b0: 0x40818356 0x4081a000 0x00000000 0x00001000 0x0001e000 0x00000000 0x00010000 0x40817030
2024-04-12 12:39:38.115289 408137d0: 0x4081b000 0x600a3000 0x00000010 0x4080ec7e 0x0001e000 0x00000000 0x00010000 0x00000001
2024-04-12 12:39:38.115344 408137f0: 0x00001881 0x80000010 0x40811a14 0x408001e8 0x6000a000 0x408073d0 0x408073dc 0x00000000
2024-04-12 12:39:38.115361 40813810: 0x4081380c 0x00000000 0x00000000 0x00000000 0x40813824 0xffffffff 0x40813824 0x40813824
2024-04-12 12:39:38.138280 40813830: 0x00000000 0x40813838 0xffffffff 0x40813838 0x40813838 0x00000001 0x00000001 0x00000000
2024-04-12 12:39:38.138329 40813850: 0x0001ffff 0x00000000 0x00000000 0x00000004 0x00000000 0x00000000 0x00000000 0x40813868
2024-04-12 12:39:38.160457 40813870: 0x00000000 0x00000000 0x00000000 0x40813880 0xffffffff 0x40813880 0x40813880 0x00000000
2024-04-12 12:39:38.160507 40813890: 0x40813894 0xffffffff 0x40813894 0x40813894 0x00000001 0x00000001 0x00000000 0x0001ffff
2024-04-12 12:39:38.160522 408138b0: 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x6000a000 0x408073d0 0x408073dc
2024-04-12 12:39:38.182375 408138d0: 0x40820854 0x00000000 0x00000000 0x4081b240 0x4081b304 0x4081b3c8 0x40826a78 0x00000000
2024-04-12 12:39:38.182454 408138f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.182471 40813910: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.204406 40813930: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.204456 40813950: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.226337 40813970: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.226389 40813990: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.226405 408139b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.249296 408139d0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.249349 408139f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.271438 40813a10: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.271489 40813a30: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.271504 40813a50: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.293353 40813a70: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 12:39:38.293408
2024-04-12 12:39:38.293427
2024-04-12 12:39:38.293442
2024-04-12 12:39:38.293456 ELF file SHA256: 02fdc2544
It wrote a core dump:
2024-04-12 12:39:39.333419 D (8066) esp_core_dump_elf: ================ Processing coredump info ================
2024-04-12 12:39:39.333433 D (8074) esp_core_dump_elf: Application SHA256='02fdc2544', length=9.
2024-04-12 12:39:39.355372 D (8080) esp_core_dump_elf: Add note size=104, start_off=0x2d54
2024-04-12 12:39:39.355424 D (8086) esp_core_dump_elf: Add note size=36, start_off=0x2d54
2024-04-12 12:39:39.355440 D (8091) esp_core_dump_elf: Core dump note data needs 1 bytes padding
2024-04-12 12:39:39.355454 D (8097) esp_core_dump_elf: Add note size=136, start_off=0x2d54
2024-04-12 12:39:39.377540 D (8103) esp_core_dump_elf: Add NOTE segment, size=276, start_off=0x2d54
2024-04-12 12:39:39.377592 D (8109) esp_core_dump_elf: =========== Data written size = 11904 bytes ==========
2024-04-12 12:39:39.377607 D (8116) esp_core_dump_crc: Total length of hashed data: 11904
2024-04-12 12:39:39.400379 I (8122) esp_core_dump_flash: Write end offset 0x2e84, check sum length 4
2024-04-12 12:39:39.400431 I (8129) esp_core_dump_common: Core dump used 888 bytes on stack. 476 bytes left free.
2024-04-12 12:39:39.400448 I (8136) esp_core_dump_common: Restoring stack @ 0x40813540
2024-04-12 12:39:39.400463 I (8141) esp_core_dump_flash: Core dump has been saved to flash.
But then it's apparently invalid so I can't read it back:
2024-04-12 12:39:39.489545 I (65) esp_core_dump_flash: Init core dump to flash
2024-04-12 12:39:39.489597 I (66) esp_core_dump_flash: Found partition 'coredump' @ 40000 262144 bytes
2024-04-12 12:39:39.489614 D (77) esp_core_dump_crc: Total length of hashed data: 11904
2024-04-12 12:39:39.489627 E (77) esp_core_dump_flash: Core dump data check failed:
2024-04-12 12:39:39.493498 Calculated checksum='2c9a2b2f'
2024-04-12 12:39:39.493542 Image checksum='00f3106d'
It then crashes with an invalid read, repeatedly:
2024-04-12 13:02:40.964725 W (900553) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=12 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 13:02:41.018235 W (900603) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=22 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 13:02:41.034033 W (900623) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-12 13:02:41.745172 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.745221 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.767280 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.767334 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.767370 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:41.770177 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:43.538896 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:43.538948 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:44.271842 W (903853) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=32 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 13:02:44.428478 W (904023) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-12 13:02:44.898276 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 13:02:44.898328 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 13:02:47.767787 W (907353) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=62 cluster_id=15 cluster_role=1 attr_id=85 manuf_code=0
2024-04-12 13:02:47.829205 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 13:02:47.829260
2024-04-12 13:02:47.829277 Core 0 register dump:
2024-04-12 13:02:47.829291 MEPC : 0x4206b988 RA : 0x4206b988 SP : 0x408334a0 GP : 0x40811680
2024-04-12 13:02:47.852163 TP : 0x408335f0 T0 : 0x40022494 T1 : 0xffffffe0 T2 : 0x0000003f
2024-04-12 13:02:47.852209 S0/FP : 0x4082cc88 S1 : 0x0000003c A0 : 0x00000000 A1 : 0x00000055
2024-04-12 13:02:47.852225 A2 : 0x00000000 A3 : 0x00000002 A4 : 0x00000000 A5 : 0x00000001
2024-04-12 13:02:47.874300 A6 : 0x00000001 A7 : 0x00000000 S2 : 0x00000001 S3 : 0x4082cc50
2024-04-12 13:02:47.874350 S4 : 0x00000000 S5 : 0x00000017 S6 : 0x00000000 S7 : 0x00000001
2024-04-12 13:02:47.874366 S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000
2024-04-12 13:02:47.896392 T3 : 0x00000000 T4 : 0x00000000 T5 : 0x40826f5c T6 : 0x00000001
2024-04-12 13:02:47.896445 MSTATUS : 0x00001881 MTVEC : 0x40800001 MCAUSE : 0x00000005 MTVAL : 0x00000003
2024-04-12 13:02:47.896461 MHARTID : 0x00000000
2024-04-12 13:02:47.896476
2024-04-12 13:02:47.896489 Stack memory:
2024-04-12 13:02:47.918160 408334a0: 0x0000ffff 0x420b5000 0x4082cc50 0x4082217f 0x0000004f 0x4082cc50 0x4082cbb2 0x42061de8
2024-04-12 13:02:47.918229 408334c0: 0x0000ffff 0x420b5000 0x4082cc50 0x42013cde 0x00000055 0x00000000 0x00000042 0x00000000
2024-04-12 13:02:47.918253 408334e0: 0x00000000 0x00000001 0x00000000 0x0000000f 0x00000000 0x00000000 0x4082cc50 0x4206b1d2
2024-04-12 13:02:47.940312 40833500: 0xc67078c0 0x0000004a 0x013c6ae7 0x00000000 0x00000001 0x00000000 0x00000000 0x4203e866
2024-04-12 13:02:47.940362 40833520: 0x00000000 0x420a08d2 0x00000000 0x00000000 0x4206afb4 0xff00000d 0x40826894 0x40809b4a
2024-04-12 13:02:47.963296 40833540: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x40827be4 0x42070fb6
2024-04-12 13:02:47.963348 40833560: 0x00000000 0x00000000 0x00000000 0x4201e1e8 0x00000000 0x00000000 0x00000000 0x42010456
2024-04-12 13:02:47.963364 40833580: 0x00000000 0x00000000 0x00000000 0x420a44b0 0x00000000 0x00000000 0x00000000 0x40827064
2024-04-12 13:02:47.985194 408335a0: 0x00000000 0x00000000 0x40827be4 0x42004e22 0x00000000 0x00000000 0x00000000 0x40809da2
2024-04-12 13:02:47.985251 408335c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.007195 408335e0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x0000015c 0x408331d0
2024-04-12 13:02:48.007244 40833600: 0x00016269 0x40812e40 0x40812e40 0x408335fc 0x40812e38 0x00000014 0x4082730c 0x4082730c
2024-04-12 13:02:48.007261 40833620: 0x408335fc 0x00000000 0x00000005 0x408315f8 0x6267697a 0x6d5f6565 0x006e6961 0x00000000
2024-04-12 13:02:48.029283 40833640: 0x408335f0 0x00000005 0x00000000 0x00000005 0x00000001 0x00000000 0x00000000 0x00d813bd
2024-04-12 13:02:48.029333 40833660: 0x00000009 0x4081b5b8 0x4081b620 0x4081b688 0x00000000 0x00000000 0x00000001 0x00000000
2024-04-12 13:02:48.029348 40833680: 0x00000000 0x00000000 0x42004166 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.052173 408336a0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.052225 408336c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.074213 408336e0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.074265 40833700: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.074285 40833720: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 13:02:48.096284 40833740: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00002000 0xa5a5a5a5
2024-04-12 13:02:48.096345 40833760: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.118248 40833780: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.118318 408337a0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.118335 408337c0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.140273 408337e0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.140325 40833800: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.140340 40833820: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.163280 40833840: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.163333 40833860: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.179461 40833880: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 13:02:48.179513
2024-04-12 13:02:48.179529
2024-04-12 13:02:48.179542
2024-04-12 13:02:48.179556 ELF file SHA256: 02fdc2544
...
2024-04-12 13:17:49.090457 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 13:32:50.357563 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 13:47:51.626877 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 14:02:52.894122 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 14:17:54.167366 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 14:32:55.443358 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 14:47:56.634586 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 15:02:57.910721 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 15:17:59.194935 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 15:33:00.484131 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 15:48:01.769565 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 16:03:03.084696 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 16:18:04.382838 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 16:33:05.680234 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 16:48:06.984327 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 17:03:08.292488 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 17:18:09.612808 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 17:33:10.926938 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 17:48:12.242206 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 18:03:16.579405 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 18:18:20.806673 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
This is an extremely annoying problem that you keep creating in new versions of your library: it can't handle data being in an older format or a different size, so it initialises the zboss library with invalid data. You need to stop doing this.
I have a workaround in place to handle zb_zcl_get_attr_desc_manuf_a() returning NULL, which you can see in the logs, but it's not enough to fix whatever this new bug is.
2024-04-12 18:33:02.436735 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.436788 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.459748 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.459800 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.459817 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:33:02.462389 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:33:21.737462 W (900562) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=12 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 18:33:21.786043 W (900602) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=22 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 18:33:21.841192 W (900662) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=32 cluster_id=6 cluster_role=1 attr_id=0 manuf_code=0
2024-04-12 18:33:21.897626 W (900722) nutt.ZigbeeDevice: zb_zcl_send_report_attr_command: fixing manuf_code for ep=62 cluster_id=15 cluster_role=1 attr_id=85 manuf_code=0
2024-04-12 18:33:21.958793 Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
2024-04-12 18:33:21.958846
2024-04-12 18:33:21.958863 Core 0 register dump:
2024-04-12 18:33:21.958877 MEPC : 0x4206b988 RA : 0x4206b988 SP : 0x408334a0 GP : 0x40811680
2024-04-12 18:33:21.981837 TP : 0x408335f0 T0 : 0x40022494 T1 : 0xffffffe0 T2 : 0x0000003f
2024-04-12 18:33:21.981895 S0/FP : 0x4082cc88 S1 : 0x0000003c A0 : 0x00000000 A1 : 0x00000055
2024-04-12 18:33:21.981911 A2 : 0x00000000 A3 : 0x00000002 A4 : 0x00000000 A5 : 0x00000001
2024-04-12 18:33:22.003749 A6 : 0x00000001 A7 : 0x00000000 S2 : 0x00000001 S3 : 0x4082cc50
2024-04-12 18:33:22.003801 S4 : 0x00000000 S5 : 0x00000017 S6 : 0x00000000 S7 : 0x00000001
2024-04-12 18:33:22.003817 S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000
2024-04-12 18:33:22.025862 T3 : 0x00000000 T4 : 0x00000000 T5 : 0x40826f5c T6 : 0x00000001
2024-04-12 18:33:22.025914 MSTATUS : 0x00001881 MTVEC : 0x40800001 MCAUSE : 0x00000005 MTVAL : 0x00000003
2024-04-12 18:33:22.025931 MHARTID : 0x00000000
2024-04-12 18:33:22.025945
2024-04-12 18:33:22.025959 Stack memory:
2024-04-12 18:33:22.047888 408334a0: 0x0000ffff 0x420b5000 0x4082cc50 0x4082217f 0x0000004f 0x4082cc50 0x4082cbb2 0x42061de8
2024-04-12 18:33:22.047941 408334c0: 0x0000ffff 0x420b5000 0x4082cc50 0x42013cde 0x00000055 0x00000000 0x00000042 0x00000000
2024-04-12 18:33:22.047957 408334e0: 0x00000000 0x00000001 0x00000000 0x0000000f 0x00000000 0x00000000 0x4082cc50 0x4206b1d2
2024-04-12 18:33:22.070885 40833500: 0xcb8978c0 0x0000004f 0x00293ec7 0x00000000 0x00000001 0x00000000 0x00000000 0x4203e866
2024-04-12 18:33:22.070938 40833520: 0x00000000 0x420a08d2 0x00000000 0x00000000 0x4206afb4 0xff000008 0x40826894 0x40809b4a
2024-04-12 18:33:22.092676 40833540: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x40827be4 0x42070fb6
2024-04-12 18:33:22.092735 40833560: 0x00000000 0x00000000 0x00000000 0x4201e1e8 0x00000000 0x00000000 0x00000000 0x42010456
2024-04-12 18:33:22.092752 40833580: 0x00000000 0x00000000 0x00000000 0x420a44b0 0x00000000 0x00000000 0x00000000 0x40827064
2024-04-12 18:33:22.114792 408335a0: 0x00000000 0x00000000 0x40827be4 0x42004e22 0x00000000 0x00000000 0x00000000 0x40809da2
2024-04-12 18:33:22.114840 408335c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.136857 408335e0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x0000015c 0x40833430
2024-04-12 18:33:22.136903 40833600: 0x00015fd0 0x40812e40 0x40812e40 0x408335fc 0x40812e38 0x00000014 0x4082730c 0x4082730c
2024-04-12 18:33:22.136918 40833620: 0x408335fc 0x00000000 0x00000005 0x408315f8 0x6267697a 0x6d5f6565 0x006e6961 0x00000000
2024-04-12 18:33:22.158778 40833640: 0x408335f0 0x00000005 0x00000000 0x00000005 0x00000001 0x00000000 0x00000000 0x00d3e58a
2024-04-12 18:33:22.158831 40833660: 0x00000009 0x4081b5b8 0x4081b620 0x4081b688 0x00000000 0x00000000 0x00000001 0x00000000
2024-04-12 18:33:22.158848 40833680: 0x00000000 0x00000000 0x42004166 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.181732 408336a0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.181788 408336c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.203884 408336e0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.203936 40833700: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.203952 40833720: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:33:22.225861 40833740: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00002000 0xa5a5a5a5
2024-04-12 18:33:22.225908 40833760: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.247812 40833780: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.247864 408337a0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.247880 408337c0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.270849 408337e0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.270902 40833800: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.270918 40833820: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.292870 40833840: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.292922 40833860: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.314789 40833880: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
2024-04-12 18:33:22.314840
2024-04-12 18:33:22.314856
2024-04-12 18:33:22.314869
2024-04-12 18:33:22.314882 ELF file SHA256: 02fdc2544
Core dump: core-dump-2024-04-12.txt
================== CURRENT THREAD REGISTERS ===================
ra 0x4206b988 0x4206b988
sp 0x408334a0 0x408334a0
gp 0x40811680 0x40811680 <_impure_data+120>
tp 0x408335f0 0x408335f0
t0 0x40022494 1073882260
t1 0xffffffe0 -32
t2 0x3f 63
fp 0x4082cc88 0x4082cc88
s1 0x3c 60
a0 0x0 0
a1 0x55 85
a2 0x0 0
a3 0x2 2
a4 0x0 0
a5 0x1 1
a6 0x1 1
a7 0x0 0
s2 0x1 1
s3 0x4082cc50 1082313808
s4 0x0 0
s5 0x17 23
s6 0x0 0
s7 0x1 1
s8 0x0 0
s9 0x0 0
s10 0x0 0
s11 0x0 0
t3 0x0 0
t4 0x0 0
t5 0x40826f5c 1082290012
t6 0x1 1
pc 0x4206b988 0x4206b988
==================== CURRENT THREAD STACK =====================
#0 0x4206b988 in ?? ()
This is in zb_zcl_get_next_reporting_info(), which has just called zb_zcl_get_attr_desc_manuf_a() with a manuf_code of 0 (a4) and received NULL back (a0), so it crashes when it dereferences it:
4206b970: 03645703 lhu a4,54(s0)
4206b974: 00645683 lhu a3,6(s0)
4206b978: 00444603 lbu a2,4(s0)
4206b97c: 00245583 lhu a1,2(s0)
4206b980: 00144503 lbu a0,1(s0)
4206b984: cd9f40ef jal 4206065c <zb_zcl_get_attr_desc_manuf_a>
4206b988: 00354783 lbu a5,3(a0)
The first error happened again trying to rejoin the network after erasing the flash:
2024-04-12 18:53:53.957550 D (541) nutt.ZigbeeDevice: Zigbee stack initialized
2024-04-12 18:53:53.957605 I (541) nutt.ZigbeeDevice: Device address: 40:4c:ca:ff:fe:43:0b:d0/0xffff (network not configured)
2024-04-12 18:53:53.957621 I (541) nutt.ZigbeeDevice: Waiting for pairing button press (ZDO Skip Start Up)
2024-04-12 18:53:53.979518 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:53:53.979570 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:53:53.979586 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:53:54.001622 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:53:54.001676 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:53:54.005711 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:53:54.005759 D (591) nutt: Free stack: 1208
2024-04-12 18:53:56.763092 D (3361) nutt.ZigbeeDevice: Connecting (join network)
2024-04-12 18:54:00.004792 I (6601) nutt.ZigbeeDevice: Joined network successfully (6f:06:fc:4a:5b:92:dd:9c/0x8813@26 as 0x78c0)
2024-04-12 18:54:00.702054 ZBOSS: nwk/nwk_nlme.c:794 OOM state is detected. ACHTUNG: Out of memory!
2024-04-12 18:54:00.702109 ZBOSS: common/zb_bufpool_mult.c:1607 OOM! bufs allocated {38 i-o 8} max 80
2024-04-12 18:54:00.702126 ZBOSS: nwk/nwk_nlme.c:836 ACHTUNG: Sending OOM status
2024-04-12 18:54:00.702141 ZBOSS: nwk/nwk_nlme.c:857 OOM state ends
2024-04-12 18:54:01.689753 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:01.689805 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:54:04.799229 ZBOSS: nwk/zb_nwk_ed_aging.c:227 ED aging is not supported
2024-04-12 18:54:09.430295 W (16031) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-12 18:54:44.769226 I (51371) nutt.ZigbeeDevice: No bindings
2024-04-12 18:54:50.233597
2024-04-12 18:54:50.233651 assert failed: isr_handle_rx_abort esp_ieee802154_dev.c:487 (s_ieee802154_state == IEEE802154_STATE_RX)
2024-04-12 18:54:50.233667 Core 0 register dump:
2024-04-12 18:54:50.233680 MEPC : 0x40800774 RA : 0x40808bb2 SP : 0x40813690 GP : 0x40811680
2024-04-12 18:54:50.255602 TP : 0x40833570 T0 : 0x37363534 T1 : 0x7271706f T2 : 0x33323130
2024-04-12 18:54:50.255651 S0/FP : 0x00000001 S1 : 0x00000060 A0 : 0x408136e4 A1 : 0x40811d01
2024-04-12 18:54:50.255667 A2 : 0x00000001 A3 : 0x00000029 A4 : 0x00000001 A5 : 0x4081a000
2024-04-12 18:54:50.277583 A6 : 0x00000010 A7 : 0x76757473 S2 : 0x408136d8 S3 : 0x40813817
2024-04-12 18:54:50.277638 S4 : 0x40811d00 S5 : 0x408136d8 S6 : 0x420dd2b4 S7 : 0x00001000
2024-04-12 18:54:50.277655 S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000
2024-04-12 18:54:50.299544 T3 : 0x6e6d6c6b T4 : 0x6a696867 T5 : 0x66656463 T6 : 0x62613938
2024-04-12 18:54:50.299598 MSTATUS : 0x00001881 MTVEC : 0x40800001 MCAUSE : 0x00000007 MTVAL : 0x00000000
2024-04-12 18:54:50.299616 MHARTID : 0x00000000
2024-04-12 18:54:50.299630
2024-04-12 18:54:50.299645 Stack memory:
2024-04-12 18:54:50.322471 40813690: 0x00000000 0x00000000 0x420e6758 0x4080fdf4 0x00000000 0x00000000 0x00000000 0x00373834
2024-04-12 18:54:50.322523 408136b0: 0x00000000 0x40811e0c 0x420e6758 0x40811e1c 0x420dcff7 0x40811e20 0x408136ac 0x40811e24
2024-04-12 18:54:50.322540 408136d0: 0x420dd2b4 0x40811d00 0x00000000 0x00000000 0x00000000 0x65737361 0x66207472 0x656c6961
2024-04-12 18:54:50.344548 408136f0: 0x69203a64 0x685f7273 0x6c646e61 0x78725f65 0x6f62615f 0x65207472 0x695f7073 0x38656565
2024-04-12 18:54:50.344601 40813710: 0x35313230 0x65645f34 0x3a632e76 0x20373834 0x695f7328 0x38656565 0x35313230 0x74735f34
2024-04-12 18:54:50.366488 40813730: 0x20657461 0x49203d3d 0x38454545 0x35313230 0x54535f34 0x5f455441 0x00295852 0x40803fb6
2024-04-12 18:54:50.366541 40813750: 0x40820850 0x00000000 0x40828760 0x408041f8 0x00000020 0x00000000 0x00000000 0x00000001
2024-04-12 18:54:50.366557 40813770: 0x00000020 0x4081b000 0x00000780 0x00000020 0x00000001 0x00000000 0x4082ec28 0x420a0a18
2024-04-12 18:54:50.388551 40813790: 0x00000000 0x00000001 0x00000004 0x03000001 0x00000000 0x0000004f 0x4081b000 0x2955d241
2024-04-12 18:54:50.388604 408137b0: 0x408197ff 0x4081a000 0x00000000 0x00001000 0x0001b000 0x00001000 0x00010000 0x40817030
2024-04-12 18:54:50.410450 408137d0: 0x4081b000 0x600a3000 0x00000010 0x4080ec7e 0x0001b000 0x00001000 0x00010000 0x00000001
2024-04-12 18:54:50.410502 408137f0: 0x00001881 0x80000010 0x40811a14 0x408001e8 0x6000a000 0x408073d0 0x408073dc 0x00000000
2024-04-12 18:54:50.410519 40813810: 0x4081380c 0x00000000 0x00000000 0x00000000 0x40813824 0xffffffff 0x40813824 0x40813824
2024-04-12 18:54:50.433499 40813830: 0x00000000 0x40813838 0xffffffff 0x40813838 0x40813838 0x00000001 0x00000001 0x00000000
2024-04-12 18:54:50.433551 40813850: 0x0001ffff 0x00000000 0x00000000 0x00000004 0x00000000 0x00000000 0x00000000 0x40813868
2024-04-12 18:54:50.455536 40813870: 0x00000000 0x00000000 0x00000000 0x40813880 0xffffffff 0x40813880 0x40813880 0x00000000
2024-04-12 18:54:50.455594 40813890: 0x40813894 0xffffffff 0x40813894 0x40813894 0x00000001 0x00000001 0x00000000 0x0001ffff
2024-04-12 18:54:50.455611 408138b0: 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x6000a000 0x408073d0 0x408073dc
2024-04-12 18:54:50.477519 408138d0: 0x408207d0 0x00000000 0x00000000 0x4081b240 0x4081b304 0x4081b3c8 0x408269f4 0x00000000
2024-04-12 18:54:50.477571 408138f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.477588 40813910: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.499520 40813930: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.499574 40813950: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.522521 40813970: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.522574 40813990: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.522591 408139b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.544627 408139d0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.544678 408139f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.566602 40813a10: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.566652 40813a30: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.566674 40813a50: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.588599 40813a70: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-12 18:54:50.588646
2024-04-12 18:54:50.588662
2024-04-12 18:54:50.588675
2024-04-12 18:54:50.588688 ELF file SHA256: 02fdc2544
Again, I can't access the core dump:
2024-04-12 18:57:19.379179 D (4076) esp_core_dump_crc: Total length of hashed data: 11968
2024-04-12 18:57:19.379253 E (4077) esp_core_dump_flash: Core dump data check failed:
2024-04-12 18:57:19.379269 Calculated checksum='84c71dfa'
2024-04-12 18:57:19.379283 Image checksum='d67d921d'
2024-04-12 18:57:19.379296 I (147672) nutt.Device: No core dump: 265
It's then running out of receive buffer space while binding all the endpoint clusters:
2024-04-12 18:54:52.320464 D (612) nutt.ZigbeeDevice: Zigbee stack initialized
2024-04-12 18:54:52.320517 I (612) nutt.ZigbeeDevice: Device address: 40:4c:ca:ff:fe:43:0b:d0/0x78c0 (network configured)
2024-04-12 18:54:52.320534 D (612) nutt.ZigbeeDevice: Connecting (ZDO Skip Start Up)
2024-04-12 18:54:52.342465 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.342521 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.342538 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.364644 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.364696 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.364712 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:54:52.366368 D (662) nutt: Free stack: 1208
2024-04-12 18:54:54.650843 ZBOSS: nwk/nwk_join.c:186 low lqi 0
2024-04-12 18:54:54.683200 D (2992) nutt.ZigbeeDevice: Connecting (BDB Device Reboot)
2024-04-12 18:54:54.692954 I (2992) nutt.ZigbeeDevice: Joined network successfully (6f:06:fc:4a:5b:92:dd:9c/0x8813@26 as 0x78c0)
2024-04-12 18:54:55.241313 E (3542) ZB_ESP_MAC: zb mac received buffer full!
2024-04-12 18:54:55.313738 E (3592) ZB_ESP_MAC: zb mac received buffer full!
2024-04-12 18:54:55.332153 E (3632) ZB_ESP_MAC: zb mac received buffer full!
2024-04-12 18:54:55.964687 W (4272) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-12 18:54:56.413691 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-12 18:54:56.413746 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-12 18:55:00.402921 ZBOSS: nwk/zb_nwk_ed_aging.c:227 ED aging is not supported
@nomis From the info you previously shared, the assert location indicates the device received an 802.15.4 packet longer than 127 bytes, but that's not supposed to happen since our hardware filter should drop such packets. We added a double filter in the software in the 1.2.3 release.
If the issue still happens, we assume it's caused by some overflow issue: the buffer was overwritten by some unexpected operation. Could you please share with us the steps to reproduce the issue? We will debug it locally.
Could you please share with us the steps to reproduce the issue? We will debug it locally.
I don't have a way to reproduce it. I just run it on my Zigbee network with ZHA and wait for it to crash.
The last communication was about 3 minutes before the crash
That's not strictly true... I was looking at a log filtered for communication with the device (0x78C0, via router 0x97F6). I hadn't looked at the log for communication with other devices. There was a request/response with another device (0xEB07, direct) at the time but it's not an abnormal message:
2024-04-02 17:34:30.858 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Sending request header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True, *is_reply=False), tsn=119, command_id=<GeneralCommand.Read_Attributes: 0>, *direction=<Direction.Server_to_Client: 0>, *is_reply=False)
2024-04-02 17:34:30.858 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Sending request: Read_Attributes(attribute_ids=[4])
2024-04-02 17:34:30.859 DEBUG (MainThread) [bellows.zigbee.application] Sending packet ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), src_ep=1, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xEB07), dst_ep=1, source_route=[], extended_timeout=False, tsn=119, profile_id=260, cluster_id=0, data=Serialized[b'\x00w\x00\x04\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=None, rssi=None)
2024-04-02 17:34:30.883 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 60167, EmberApsFrame(profileId=260, clusterId=0, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_NONE: 0>, groupId=0, sequence=7), 120, <EmberStatus.SUCCESS: 0>, b'']
2024-04-02 17:34:30.887654 Zigbee stack assertion failed common/zb_bufpool_mult.c:1183
2024-04-02 17:34:30.890 DEBUG (MainThread) [bellows.zigbee.application] Received incomingRouteRecordHandler frame with [0xeb07, 00:12:4b:00:24:c5:95:a7, 104, -74, []]
2024-04-02 17:34:30.890 DEBUG (MainThread) [bellows.zigbee.application] Processing route record request: (0xeb07, 00:12:4b:00:24:c5:95:a7, 104, -74, [])
2024-04-02 17:34:30.901 DEBUG (MainThread) [bellows.zigbee.application] Received incomingMessageHandler frame with [<EmberIncomingMessageType.INCOMING_UNICAST: 0>, EmberApsFrame(profileId=260, clusterId=0, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_NONE: 0>, groupId=0, sequence=127), 104, -74, 0xeb07, 255, 255, b'\x18w\x01\x04\x00\x00B\x06SONOFF']
2024-04-02 17:34:30.901 DEBUG (MainThread) [zigpy.application] Received a packet: ZigbeePacket(src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0xEB07), src_ep=1, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), dst_ep=1, source_route=None, extended_timeout=False, tsn=127, profile_id=260, cluster_id=0, data=Serialized[b'\x18w\x01\x04\x00\x00B\x06SONOFF'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=104, rssi=-74)
2024-04-02 17:34:30.901 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Received ZCL frame: b'\x18w\x01\x04\x00\x00B\x06SONOFF'
2024-04-02 17:34:30.902 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=1, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=119, command_id=1, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
2024-04-02 17:34:30.902 DEBUG (MainThread) [zigpy.zcl] [0xEB07:1:0x0000] Decoded ZCL frame: Basic:Read_Attributes_rsp(status_records=[ReadAttributeRecord(attrid=0x0004, status=<Status.SUCCESS: 0>, value=TypeValue(type=CharacterString, value='SONOFF'))])
@nomis,
Ok, but you really need to stop making changes where it loses configuration on upgrade:
Regarding the issue, we have marked it in the release notes.
Source: https://github.com/nomis/candle-dribbler/tree/0.7.3 Binary: candle-dribbler.elf.gz
This has been crashing all day:
The crashing may be triggered by too many stack logs. Could you use esp-zigbee-sdk v1.2.3, which includes optimizations for this issue, to test the problem in release mode?
Ok, but you really need to stop making changes where it loses configuration on upgrade: Regarding the issue, we have marked it in the release notes.
No, you have not done this. You've noted that some sizes are configurable, but not that there are critical bugs in the library when the sizes change that require the zb_storage partition to be erased to fix.
The crashing may be triggered by too many stack logs. Could you use esp-zigbee-sdk v1.2.3, which includes optimizations for this issue, to test the problem in release mode?
The crashing is caused by the aforementioned bug where the zboss library calls zb_zcl_get_attr_desc_manuf_a() with a rep_info->manuf_code of 0 instead of ZB_ZCL_MANUFACTURER_WILDCARD_ID, because the configuration is not initialised correctly. I still have an open bug #86 for this when endpoints are changed, but it happens every time you change some aspect of the configuration format/size too.
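For reference, a minimal sketch of what that workaround does, assuming a hook that runs before the stack consumes each reporting-info entry (the hook mechanism and the TAG are illustrative; the field names match the reporting-info values printed in the logs above):

    /* Sketch only: rewrite an uninitialised manuf_code so that
     * zb_zcl_get_attr_desc_manuf_a() can resolve the attribute instead of
     * returning NULL and crashing the reporting path. */
    if (rep_info->manuf_code == 0) {
        ESP_LOGW(TAG, "fixing manuf_code for ep=%u cluster_id=%u attr_id=%u",
                 rep_info->ep, rep_info->cluster_id, rep_info->attr_id);
        rep_info->manuf_code = ZB_ZCL_MANUFACTURER_WILDCARD_ID;
    }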
We've debugged some detailed issues based on the information you provided and made some fixes, but I cannot guarantee they address the root cause of this issue. The fixes will be released in the next version. Before that, could you replace this function with the code below and test again?
This has happened once with 1.2.3 and it didn't result in a crash:
2024-04-20 01:42:50.956834 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-20 01:42:50.957000 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-20 01:42:51.885534 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-20 01:42:51.885690 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-20 01:42:52.096456 W (10806022) nutt.ZigbeeDevice: NLME status indication: 09 0x97f6 00
2024-04-20 01:45:00.529011 ovesized frame: 158
2024-04-20 01:47:51.018197 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-20 01:47:51.018351 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-20 01:51:51.061893 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-04-20 01:51:51.062046 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-04-20 01:55:51.101273 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
Of course, that doesn't prove that this was the cause if you've changed 1.2.3 to silently filter those messages out... I'll need to re-test with 1.2.1 until it happens again.
Hi @nomis, after discussing with the HW design team, we confirm that our ESP board cannot receive any packet whose length is larger than 127 bytes. So we suspect there might be some out-of-bounds memory issue: the RX frame buffer memory might be written by accident.
So, could you please help us debug this issue? Let's set a watchpoint to monitor the RX frame buffer.
First set the 802.15.4 RX frame buffer size to 4, and then add this code:
esp_cpu_set_watchpoint(0, s_rx_frame[0], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(1, s_rx_frame[1], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(2, s_rx_frame[2], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(3, s_rx_frame[3], 4, ESP_CPU_WATCHPOINT_STORE);
to the end of the ieee802154_mac_init function.
If a watchpoint is triggered, you will get a crash log; please share the crash logs with us.
Ok. I will also have the changes for #326 applied.
I've previously been using IEEE802154_RX_BUFFER_SIZE=100 (because I have a lot of endpoints/clusters) and IEEE802154_TIMING_OPTIMIZATION=y.
Ok. I will also have the changes for #326 applied.
I've previously been using IEEE802154_RX_BUFFER_SIZE=100 (because I have a lot of endpoints/clusters) and IEEE802154_TIMING_OPTIMIZATION=y.
Ok. Per the 802.15.4 spec, the transmission speed is 250 kbps (32 µs per byte), so receiving a 30-byte packet takes about 1 ms. That is enough time for the SW to process the packet. Based on our tests in high-traffic scenarios, 20 is sufficient, so there is no need to use a size of 100 for this buffer.
The reason I asked you to set the buffer size to 4 is that the C6 only supports 4 watchpoints at the same time.
There's no documentation for IEEE802154 in the TRM, so I can't check how dma_rx_addr is supposed to work, but can a receive happen into the buffer at any time?
What would happen if dma_rx_addr is set while a packet is being received: could a packet be split across two buffers, or the middle of a packet be written to the start of a buffer? Either of those could result in a packet with the wrong length.
There's an inconsistency in the order of IEEE802154_SET_TXRX_PTI() and set_next_rx_buffer() here:
esp_err_t ieee802154_receive_at(uint32_t time)
{
    ...
    IEEE802154_SET_TXRX_PTI(IEEE802154_SCENE_RX_AT);
    set_next_rx_buffer();
Everything else sets the next buffer first.
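For comparison, a sketch of the ordering used by the other call sites (based on the observation above, not the actual driver source):

    set_next_rx_buffer();                             /* prepare the buffer first */
    IEEE802154_SET_TXRX_PTI(IEEE802154_SCENE_RX_AT);  /* then set the PTI */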
(I'm not doing Zigbee and WiFi co-existence but it looks like the handling for it can't be disabled.)
The reason I asked you to set the buffer size to 4 is that the C6 only supports 4 watchpoints at the same time.
SOC_CPU_WATCHPOINT_MAX_REGION_SIZE is 0x80000000 on the ESP32-C6, so could I set a watchpoint on the entire s_rx_frame array, regardless of how many receive buffers are configured?
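A minimal sketch of that idea, assuming the region satisfies esp_cpu_set_watchpoint()'s constraints (a power-of-two size with the address aligned to it):

    /* One store watchpoint covering the whole receive-buffer array,
     * instead of one watchpoint per buffer. */
    esp_cpu_set_watchpoint(0, s_rx_frame, sizeof(s_rx_frame), ESP_CPU_WATCHPOINT_STORE);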
The ieee802154_receive_at() function also needs to call ieee802154_enter_critical() earlier. Otherwise it's calling stop_current_operation() (via rx_init()) and set_next_rx_buffer() with interrupts enabled.
This function isn't used by the Zigbee libraries.
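A sketch of the suggested shape, using only the function names mentioned above (not the actual driver code; ieee802154_exit_critical() is assumed to be the matching exit call):

    esp_err_t ieee802154_receive_at(uint32_t time)
    {
        /* Take the critical section before stopping the current operation
         * or switching buffers, so no interrupt can observe partial state. */
        ieee802154_enter_critical();
        rx_init();               /* calls stop_current_operation() */
        set_next_rx_buffer();
        ...                      /* program the delayed receive */
        ieee802154_exit_critical();
        return ESP_OK;
    }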
The ieee802154_receive_at() function also needs to call ieee802154_enter_critical() earlier. Otherwise it's calling stop_current_operation() (via rx_init()) and set_next_rx_buffer() with interrupts enabled. This function isn't used by the Zigbee libraries.
Thanks for catching this risk. Since this API is not used in the Zigbee stack, it might not be related to this crash issue, but we will fix it right away.
The ieee802154_receive_at() function also needs to call ieee802154_enter_critical() earlier. Otherwise it's calling stop_current_operation() (via rx_init()) and set_next_rx_buffer() with interrupts enabled. This function isn't used by the Zigbee libraries.
Also welcome PRs to fix it :)
This is in zb_zcl_get_next_reporting_info(), which has just called zb_zcl_get_attr_desc_manuf_a() with a manuf_code of 0 (a4) and received NULL back (a0), so it crashes when it dereferences it:
Hello, I have a similar problem. When I change the number of registered clusters/attributes between boots, the zb lib crashes. A simple flash erase fixes the problem.
An example of how the bug appears is below.
Can you fix this bug?
Thanks
....
I (6446) ZIGBEE: Device started up in non factory-reset mode
I (6447) ZIGBEE: Device rebooted
Guru Meditation Error: Core 0 panic'ed (Load access fault). Exception was unhandled.
Core 0 register dump:
MEPC : 0x4205a8f8 RA : 0x4205a8f4 SP : 0x40841bf0 GP : 0x4081e504
0x4205a8f8: zb_zcl_get_next_reporting_info at ??:?
0x4205a8f4: zb_zcl_get_next_reporting_info at ??:?
TP : 0x407f886c T0 : 0x40030dca T1 : 0x0000000f T2 : 0x62808861
0x40030dca: memset in ROM
S0/FP : 0x40848f94 S1 : 0x00000004 A0 : 0x00000000 A1 : 0x00000020
A2 : 0x0000ffff A3 : 0x0000000c A4 : 0x4084888d A5 : 0x00000005
A6 : 0x0000000c A7 : 0x0000000c S2 : 0x00000000 S3 : 0x40848eb4
S4 : 0x00000000 S5 : 0x00000006 S6 : 0x00000000 S7 : 0x00000052
S8 : 0x00000038 S9 : 0x0000ff42 S10 : 0x00000000 S11 : 0x00000007
T3 : 0x000000fd T4 : 0x000000fe T5 : 0xffffffff T6 : 0x000000ff
MSTATUS : 0x00001881 MTVEC : 0x40800001 MCAUSE : 0x00000005 MTVAL : 0x00000003
0x40800001: _vector_table at ??:?
MHARTID : 0x00000000
Stack memory:
40841bf0: 0x4205a224 0x00000017 0x0000004a 0x40848eec 0x40848eb4 0x40848a80 0x40842ec9 0x42054c5e
0x4205a224: zb_zcl_reporting_timer_handler at ??:?
0x42054c5e: zb_zcl_send_report_attr_command at ??:?
40841c10: 0x0000000f 0x40841c30 0x00000003 0x4206a394 0x0000000f 0x00000000 0x7fffffff 0x00000002
0x4206a394: zb_osif_scheduler_event at ??:?
40841c30: 0x0000000f 0x00000000 0x00000042 0x00000000 0x00000000 0x4205a3e8 0x00000006 0x00000000
0x4205a3e8: zb_zcl_report_attr at ??:?
40841c50: 0x00000002 0x00000052 0x40848eb4 0x4205a4d8 0x0000000f 0x4205a3e8 0x7fffffff 0x000000ff
0x4205a4d8: zb_zcl_report_attr at ??:?
0x4205a3e8: zb_zcl_report_attr at ??:?
40841c70: 0x0000000f 0x00000000 0x40828ba0 0x42044f14 0x0000007f 0x0000000f 0x40841dd8 0x00000000
0x42044f14: zb_sched_loop_iteration at ??:?
40841c90: 0x40848808 0x00000000 0x40846f48 0x40831000 0x00000000 0x0000ff42 0x00000b04 0x00000004
40841cb0: 0x0000007f 0x0000000f 0x40841dd8 0x40847aa0 0x40848808 0x408480fc 0x42158000 0x420364fe
0x420364fe: esp_zb_main_loop_iteration at ??:?
40841cd0: 0x42157ce8 0x00000394 0x00000007 0x42010b8a 0x00000025 0x40841dd8 0xa5a5a5a5 0xa5a5a5a5
0x42010b8a: zigbee_task at D:/08-LocalRepos/TICMeter/firmware/main/zigbee.c:635
40841cf0: 0xa5a5a5a5 0xa5a5a5a5 0x40847888 0x42150030 0x40847c08 0x40847d70 0x40847d04 0x40847a34
40841d10: 0x0000007f 0x0000ff42 0x0000002d 0x00000025 0xa5a5a5a5 0xa5a5a5a5 0xfb010101 0xa5a50008
40841d30: 0x00010000 0x000003e8 0x00000004 0x030205a0 0xa5a50040 0x00000000 0x00000000 0xa5a500fb
40841d50: 0x00000002 0x00000000 0x00000006 0x00000fa0 0x00030202 0x00c8ffff 0x00000000 0x00000000
40841d70: 0x00030202 0x00000000 0x00000000 0x00000000 0x2e335617 0x2d322e32 0x34672d31 0x30623832
40841d90: 0x642d3762 0x79747269 0xa5a5a500 0xa5a5a5a5 0xa5a5a5a5 0x32303213 0x34302d34 0x2038322d
40841db0: 0x333a3131 0x35303a37 0xa5a5a500 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
40841dd0: 0xa5a5a5a5 0xa5a5a5a5 0x30000030 0x00003030 0x32333935 0xa5a5a500 0xa5a5a5a5 0xa5a5a5a5
40841df0: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
40841e10: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
40841e30: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x53010401 0x00303200 0x33393500 0xa5000032 0xa5a5a5a5
40841e50: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x4201041c 0x00000000 0x00000000 0x4081e504
0x4201041c: zigbee_task at D:/08-LocalRepos/TICMeter/firmware/main/zigbee.c:319
40841e70: 0x407f886c 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841e90: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841eb0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841ed0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841ef0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
40841f10: 0x00000150 0x40841af0 0x0000336d 0x408221cc 0x408221cc 0x40841f14 0x408221c4 0x00000014
40841f30: 0x40848834 0x40848834 0x40841f14 0x00000000 0x00000005 0x4083ff10 0x6267695a 0x6d5f6565
40841f50: 0x006e6961 0x00000000 0x40841f00 0x00000005 0x00000001 0x00000000 0x00000000 0x00000009
40841f70: 0x408327cc 0x40832834 0x4083289c 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000
40841f90: 0x00000000 0x42006116 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x42006116: esp_cleanup_r at C:/Espressif/frameworks/esp-idf-v5.2.1/components/newlib/newlib_init.c:60
40841fb0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
40841fd0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
ELF file SHA256: ff35324e1e01a5c8
Rebooting in 2 seconds...
@xmow49,
Hello, I have a similar problem. When I change the number of registered clusters/attributes between boots, the zb lib crashes. A simple flash erase fixes the problem.
Regarding another issue, I suspect it's related to the binding table size. By default, the binding table size is set to 16 in the esp-zigbee-sdk. Could you try adjusting the size by calling esp_zb_aps_src_binding_table_size_set() with your desired value? Keep in mind that this function needs to be called before esp_zb_init() for it to take effect.
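A minimal sketch of the suggested ordering (the value 64 matches what was tried below; the surrounding lines are illustrative):

    /* Enlarge the APS source binding table before the stack is initialised;
     * calling this after esp_zb_init() has no effect. */
    esp_zb_aps_src_binding_table_size_set(64);
    esp_zb_init(&zb_nwk_cfg);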
@xmow49,
Hello, I have a similar problem. When I change the number of registered clusters/attributes between boots, the zb lib crashes. A simple flash erase fixes the problem.
Regarding another issue, I suspect it's related to the binding table size. By default, the binding table size is set to 16 in the esp-zigbee-sdk. Could you try adjusting the size by calling esp_zb_aps_src_binding_table_size_set() with your desired value? Keep in mind that this function needs to be called before esp_zb_init() for it to take effect.
Many thanks, I'm not able to reproduce the bug. I set the value to 64 and all works great! Thanks
After discussing with the HW design team, we confirm that our ESP board cannot receive any packet whose length is larger than 127 bytes. So we suspect there might be some out-of-bounds memory issue: the RX frame buffer memory might be written by accident.
So, could you please help us debug this issue? Let's set a watchpoint to monitor the RX frame buffer.
This has finally logged a frame length over 127 bytes without a crash:
2024-05-02 12:40:55.056673 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-05-02 12:40:55.056724 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
2024-05-02 12:41:10.723378 E (777449843) ieee802154 frame: ieee802154_frame_get_src_addr(339): invalid frame type
2024-05-02 12:41:10.723430 oversized frame: 0x40817921 150
2024-05-02 12:41:55.067166 ZBOSS: zcl/zcl_main.c:357 zb_zcl_get_cluster_check_value: cluster_id 12 cluster_role 1
2024-05-02 12:41:55.067216 ZBOSS: zcl/zcl_main.c:382 zb_zcl_get_cluster_write_attr_hook: cluster_id 12 cluster_role 1
I'll modify it to log the contents of s_rx_frame when this happens, because I suspect it involves writing parts of a normal-size frame across two buffers.
There are also some places in the code where the frame data is read without checking that the frame is at least a certain minimum length.
Hi @nomis, sorry for the delay. What IEEE802154_RX_BUFFER_SIZE did you use? I've seen your code, and to double-check: did you use a size of 4 for IEEE802154_RX_BUFFER_SIZE?
For your final code:
esp_cpu_set_watchpoint(0, s_rx_frame[0], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(1, s_rx_frame[1], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(2, s_rx_frame[2], 4, ESP_CPU_WATCHPOINT_STORE);
esp_cpu_set_watchpoint(3, s_rx_frame[3], 4, ESP_CPU_WATCHPOINT_STORE);
The watchpoints can only monitor the first 4 entries of the s_rx_frame buffer...
Yes, it's set to 4. I'm now waiting for it to happen again with extra logging.
Yes, it's set to 4. I'm now waiting for it to happen again with extra logging.
Ok, so based on your test there will be no OOB issues for the RX frame buffer in your program. I discussed this issue with the HW design team, and they said it is a corner case. The HW uses frame[0] & 0x7f as the frame length, and calculates the CRC from frame[1] to frame[frame[0] & 0x7f] to make sure the payload is received correctly, so there is simply no check of bit 8 of frame[0]. In some cases this bit will be wrong due to the PHY decoding incorrectly in a noisy environment (very, very low probability). We don't think it is a malignant bug; we will add a check in the driver to make sure the length of the packet is less than 127.
Thanks for your patience and hard work on debugging this issue.
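A sketch of the kind of driver-side guard being described, using the masking above (an illustration, not the actual IDF patch linked below):

    /* For a uint8_t length byte, a value above 127 means bit 8 is set,
     * which the HW never validates; treat such frames as corrupt. */
    static bool ieee802154_frame_len_valid(const uint8_t *frame)
    {
        return frame[0] <= 127;
    }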
Yes, it's set to 4. I'm now waiting for it to happen again with extra logging.
Ok, so based on your test there will be no OOB issues for the RX frame buffer in your program. I discussed this issue with the HW design team, and they said it is a corner case. The HW uses frame[0] & 0x7f as the frame length, and calculates the CRC from frame[1] to frame[frame[0] & 0x7f] to make sure the payload is received correctly, so there is simply no check of bit 8 of frame[0]. In some cases this bit will be wrong due to the PHY decoding incorrectly in a noisy environment (very, very low probability). We don't think it is a malignant bug; we will add a check in the driver to make sure the length of the packet is less than 127.
Thanks for your patience and hard work on debugging this issue.
It's indeed a corner case; the fix has been merged to IDF: https://github.com/espressif/esp-idf/commit/17eb2dcce33e65efb4b1fa98e3b9ed90aa9debff and will be included in future IDF releases. Closing this issue.
It has taken 76 days of uptime, but I've finally had another instance of an oversized frame.
The first rx buffer changed from: 444188BC1388FFFF92790912FCFF92790135E75EB826004B1200286AAB3700E75EB826004B12000034E25CD9F29517536E2D94C5A3A80FD3BDE9D95640DA577A092036B00B5D5A6BD0A38E94B809F86C51EF59E22E02795C2228B1827D404E673C167B6A885021DF33540DC5D56FE59F5AC92F79177D7BE2000000000000000000
to: EE47B266D8976D327B7D66D006B0C1E70E1401331237026004C214E2300000C0301E5227311361102E320700F00670101F4066A31E200041008C3177234E3027D61E010003600C0C6D000603270100EFB000154866016000026202C4394004F402011D6AC40007F2620752F3C993FC9F5AC92F79177D7BE2000000000000000000
The number of bytes overwritten is consistent with a 7-bit length from 0xEE (110 bytes), but the content looks like noise or a message from a different PAN.
IDF version.
v5.3-dev-2320-ge4f167df25
esp-zigbee-lib version.
1.2.1
esp-zboss-lib version.
1.2.1
Espressif SoC revision.
ESP32-C6
What is the expected behavior?
Does not crash
What is the actual behavior?
Zigbee stack assertion failed common/zb_bufpool_mult.c:1183
Steps to reproduce.
Unknown
More Information.
Source: https://github.com/nomis/candle-dribbler/tree/0.7.2 Binary: candle-dribbler.elf.gz Core dump: core-dump-2024-04-02.txt Log: log-2024-04-02.txt (all except the last two switch changes have been omitted)