Open nomis opened 7 months ago
I've now had this on 1.2.1 too:
2024-04-20 18:10:00.166865 assert failed: isr_handle_rx_abort esp_ieee802154_dev.c:487 (s_ieee802154_state == IEEE802154_STATE_RX)
2024-04-20 18:10:00.166882 Core 0 register dump:
2024-04-20 18:10:00.166897 MEPC : 0x40800774 RA : 0x40808c2c SP : 0x40813700 GP : 0x40811700
2024-04-20 18:10:00.189849 TP : 0x4082d650 T0 : 0x37363534 T1 : 0x7271706f T2 : 0x33323130
2024-04-20 18:10:00.189898 S0/FP : 0x00000001 S1 : 0x00000060 A0 : 0x40813754 A1 : 0x40811d79
2024-04-20 18:10:00.189915 A2 : 0x00000001 A3 : 0x00000029 A4 : 0x00000001 A5 : 0x4081f000
2024-04-20 18:10:00.211775 A6 : 0x00000010 A7 : 0x76757473 S2 : 0x40813748 S3 : 0x40813887
2024-04-20 18:10:00.211826 S4 : 0x40811d78 S5 : 0x40813748 S6 : 0x420ede28 S7 : 0x00001000
2024-04-20 18:10:00.211843 S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000
2024-04-20 18:10:00.233910 T3 : 0x6e6d6c6b T4 : 0x6a696867 T5 : 0x66656463 T6 : 0x62613938
2024-04-20 18:10:00.233959 MSTATUS : 0x00001881 MTVEC : 0x40800001 MCAUSE : 0x00000007 MTVAL : 0x00000000
2024-04-20 18:10:00.233975 MHARTID : 0x00000000
2024-04-20 18:10:00.233990
2024-04-20 18:10:00.234003 Stack memory:
2024-04-20 18:10:00.255829 40813700: 0x00000000 0x00000000 0x420f749c 0x4080fe6e 0x00000000 0x00000000 0x00000000 0x00373834
2024-04-20 18:10:00.255874 40813720: 0x00000000 0x40811e84 0x420f749c 0x40811e94 0x420edb6b 0x40811e98 0x4081371c 0x40811e9c
2024-04-20 18:10:00.255890 40813740: 0x420ede28 0x40811d78 0x00000000 0x00000000 0x00000000 0x65737361 0x66207472 0x656c6961
2024-04-20 18:10:00.278749 40813760: 0x69203a64 0x685f7273 0x6c646e61 0x78725f65 0x6f62615f 0x65207472 0x695f7073 0x38656565
2024-04-20 18:10:00.278799 40813780: 0x35313230 0x65645f34 0x3a632e76 0x20373834 0x695f7328 0x38656565 0x35313230 0x74735f34
2024-04-20 18:10:00.300879 408137a0: 0x20657461 0x49203d3d 0x38454545 0x35313230 0x54535f34 0x5f455441 0x00295852 0x40801e22
2024-04-20 18:10:00.300925 408137c0: 0x00000000 0x056198c0 0x4082d9c8 0x40804030 0x00000000 0x00000000 0x4082d9c8 0x4081f000
2024-04-20 18:10:00.300942 408137e0: 0x4081b838 0x4081d030 0x600a3000 0x42091a0a 0x00000020 0x00000b40 0x00000004 0x00000000
2024-04-20 18:10:00.322779 40813800: 0x00000000 0x00000001 0x40813824 0x0381b8e8 0x4081d5bb 0x00000034 0x4081f000 0xdf784248
2024-04-20 18:10:00.322832 40813820: 0x4081d63c 0x4081f000 0x00000000 0x00001000 0x0001c000 0x00000000 0x00010000 0x4081bc08
2024-04-20 18:10:00.344816 40813840: 0x4081f000 0x600a3000 0x00000010 0x4080ecf8 0x0001c000 0x00000000 0x00010000 0x00000001
2024-04-20 18:10:00.344866 40813860: 0x00001881 0x80000010 0x40811a8c 0x408001e8 0x6000a000 0x4080744a 0x40807456 0x00000000
2024-04-20 18:10:00.344883 40813880: 0x4081387c 0x00000000 0x00000000 0x00000000 0x40813894 0xffffffff 0x40813894 0x40813894
2024-04-20 18:10:00.366885 408138a0: 0x00000000 0x408138a8 0xffffffff 0x408138a8 0x408138a8 0x00000001 0x00000001 0x00000000
2024-04-20 18:10:00.366933 408138c0: 0x0001ffff 0x00000000 0x00000000 0x00000004 0x00000000 0x00000000 0x00000000 0x408138d8
2024-04-20 18:10:00.389881 408138e0: 0x00000000 0x00000000 0x00000000 0x408138f0 0xffffffff 0x408138f0 0x408138f0 0x00000000
2024-04-20 18:10:00.389938 40813900: 0x40813904 0xffffffff 0x40813904 0x40813904 0x00000001 0x00000001 0x00000000 0x0001ffff
2024-04-20 18:10:00.389956 40813920: 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x6000a000 0x4080744a 0x40807456
2024-04-20 18:10:00.411936 40813940: 0x4082520c 0x00000000 0x00000000 0x4081fe00 0x4081fec4 0x4081ff88 0x40825ef0 0x00000000
2024-04-20 18:10:00.411996 40813960: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.412016 40813980: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.433870 408139a0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.433925 408139c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.455887 408139e0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.455941 40813a00: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.455958 40813a20: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.478952 40813a40: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.479007 40813a60: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.500724 40813a80: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.500777 40813aa0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.500794 40813ac0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.522724 40813ae0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
2024-04-20 18:10:00.522778
2024-04-20 18:10:00.522795
2024-04-20 18:10:00.522810
2024-04-20 18:10:00.522824 ELF file SHA256: 7df18c605
Source: https://github.com/nomis/candle-dribbler/tree/c574069f97d7421069356264f3374282ed13cad5 (0.7.2-3-gc574069) Binary: candle-dribbler.elf.gz
No core dump because it's still creating it with an invalid CRC.
Core dump with an invalid CRC is written (this happens every time for this core dump).
Every time a core dump happens in ieee802154_isr(), ~the cached_data being written as part of the core dump process is unstable~. For testing I've added an assert() to this function that I can trigger by changing a global value.
I've added extra CRC calculations to the core dump flash process, so that there are 2 before the flash write and 2 after it. All 4 CRC values are different. If I crash in one of my own tasks then all 4 CRC values are the same.
~When I print the data written to flash and the data in the CRC process, they're both different and then the data read from flash is different too.~
The problem is that it's writing memory that is currently in use (possibly by the core dump process), so it changes on every read within esp_core_dump_flash_write_data(), and the content on flash and the content used for the CRC are never the same.
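A minimal sketch of the diagnostic described above, for anyone wanting to check whether a region is stable between CRC passes. This is not the ESP-IDF core dump code: `crc32_update`, `region_crc_is_stable` and `simulate_in_use_write` are hypothetical names, and a plain bitwise CRC-32 stands in for whatever checksum the core dump writer really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Assumption: this only
 * stands in for the checksum used by the real core dump writer. */
uint32_t crc32_update(uint32_t crc, const volatile uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical hook run between the two CRC passes; in the scenario from
 * this issue, this is where the flash write would take place. */
typedef void (*between_passes_fn)(void *ctx);

/* Returns 1 if two CRC passes over the same region agree, 0 otherwise.
 * A result of 0 means the region changed while it was being dumped. */
int region_crc_is_stable(const volatile uint8_t *region, size_t len,
                         between_passes_fn between, void *ctx)
{
    assert(region != NULL);
    uint32_t before = crc32_update(0, region, len);
    if (between != NULL) {
        between(ctx);
    }
    uint32_t after = crc32_update(0, region, len);
    return before == after;
}

/* Simulates memory that is still in use: flips the first byte of the
 * buffer passed as ctx, as a concurrent writer might. */
void simulate_in_use_write(void *ctx)
{
    volatile uint8_t *bytes = (volatile uint8_t *)ctx;
    bytes[0] ^= 0xFF;
}
```

Calling `region_crc_is_stable(buf, len, NULL, NULL)` on an idle buffer should report stable, while passing `simulate_in_use_write` with the same buffer as `ctx` should report unstable, which mirrors the "all 4 CRC values are different" observation.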
Hi @nomis, we see this crash. In our 15.4 driver code, we only enabled the RX abort interrupt related to IEEE802154_RX_ABORT_BY_TX_ACK_TIMEOUT and IEEE802154_RX_ABORT_BY_TX_ACK_COEX_BREAK. From the backtrace you shared, the crash occurred at line 487, but these abort reasons were not enabled. So did you enable any other RX abort reasons manually in your application?
> Hi @nomis, we see this crash. In our 15.4 driver code, we only enabled the RX abort interrupt related to IEEE802154_RX_ABORT_BY_TX_ACK_TIMEOUT and IEEE802154_RX_ABORT_BY_TX_ACK_COEX_BREAK. From the backtrace you shared, the crash occurred at line 487, but these abort reasons were not enabled.
You're enabling those here: https://github.com/espressif/esp-idf/blob/e4f167df2504544d6f46655228634549c3d0d9c2/components/ieee802154/driver/esp_ieee802154_dev.c#L355 https://github.com/espressif/esp-idf/blob/e4f167df2504544d6f46655228634549c3d0d9c2/components/ieee802154/driver/esp_ieee802154_dev.c#L734
But the assert is here, so neither of those abort reasons applies to this issue: https://github.com/espressif/esp-idf/blob/e4f167df2504544d6f46655228634549c3d0d9c2/components/ieee802154/driver/esp_ieee802154_dev.c#L484 https://github.com/nomis/esp-idf/blob/a70f4bef18e1f8b89d2dada641ad0e560d6091ad/components/ieee802154/driver/esp_ieee802154_dev.c#L487
I've updated the description because this is actually line 484 in v5.3-dev-2320-ge4f167df25.
Hi, there are some configurations which may help us analyse this issue. Could you please enable them, try to reproduce the issue, and share the enriched assert info with us? That would be more helpful.
@zwx1995esp there's still a serious bug in the core dump writing process; is anyone going to look at and merge https://github.com/espressif/esp-idf/pull/13651 ?
@nomis we will follow up https://github.com/espressif/esp-idf/pull/13651 internally.
Have you captured the issue again with the 802.15.4 debug mode enabled?
Not yet, the reduced number of RX buffers for testing #304 may be impacting it. I'll try again with the original configuration.
Answers checklist.
IDF version.
v5.3-dev-2320-ge4f167df25 (esp_ieee802154_dev.c:484)
v5.3-dev-2321-ga70f4bef18 (esp_ieee802154_dev.c:487)
esp-zigbee-lib version.
1.2.3
esp-zboss-lib version.
1.2.3
Espressif SoC revision.
ESP32-C6
What is the expected behavior?
Does not crash
Does not write core dumps with an invalid CRC
What is the actual behavior?
assert failed: isr_handle_rx_abort esp_ieee802154_dev.c:484 (s_ieee802154_state == IEEE802154_STATE_RX)
Core dump with an invalid CRC is written (this happens every time for this core dump).
Steps to reproduce.
Unknown. This has happened 4 times in a week after updating only the Zigbee libraries from 1.2.1 to 1.2.3.
More Information.
Source: https://github.com/nomis/candle-dribbler/tree/4b5dff82449a19cc4a387f546c0b0c059957466c (0.7.3-1-g4b5dff8) Binary: candle-dribbler.elf.gz Core dump: core-dump-2024-04-19.txt and core-dump-2024-04-19-fixed-crc.txt