Closed aggaddam closed 11 months ago
I got few more logs (Device side and Chip side logs) for the device crash during network pairing. Attached logs here for both device side and chiptool side
Chip Tool Log: chip_log1.txt
Esp32C3 Device Log: esp32c3_device_crash1.txt
Can you use latest esp-matter commit 68e99de70266674ce172459136960bbb5c58dc0e ? I dont see the issue there
Esp32C3 Device Log: esp32c3_device_crash1.txt
@aggaddam From these logs it looks like the crash is due to Matter resources being accessed from a non-Matter thread For more details, you can check: https://docs.espressif.com/projects/esp-matter/en/latest/esp32/faq.html#a1-11-chip-stack-locking-error-code-is-unsafe-racy
It is possible that your code is calling a matter API from a non-matter thread, Can you send a backtrace of your code before the crash happens
Can you use latest esp-matter commit 68e99de ? I dont see the issue there
I am seeing a few warnings/errors when I run the matter example code for the Room Air Conditioner device type with the latest ESP-Matter SDK (https://github.com/espressif/esp-matter/commit/68e99de70266674ce172459136960bbb5c58dc0e) that you pointed out. Do you know if this is expected behavior?
Here are the errors/warnings
Esp32C3 Device Log: esp32c3_device_crash1.txt
@aggaddam From these logs it looks like the crash is due to Matter resources being accessed from a non-Matter thread For more details, you can check: https://docs.espressif.com/projects/esp-matter/en/latest/esp32/faq.html#a1-11-chip-stack-locking-error-code-is-unsafe-racy
It is possible that your code is calling a matter API from a non-matter thread, Can you send a backtrace of your code before the crash happens
I am calling matter API's with lock protection. Check this pseudo code
void matter_helperfcn_stop_commissioning_window(void)
{
// Close the commission window
lock::status_t lock_status = lock::chip_stack_lock(portMAX_DELAY);
CommissioningWindowManager & commissionMgr = Server::GetInstance().GetCommissioningWindowManager();
commissionMgr.CloseCommissioningWindow();
if (lock::SUCCESS == lock_status)
{
lock::chip_stack_unlock();
}
}
Even without calling CHIP SDK APIs also, I am seeing random device hangs with the latest ESP-matter SDK and watch dog timer resets the system
Here is the crash and back trace
D (30434) chip[DMG]: Command handler moving to [CommandSen]
D (30434) chip[DMG]: Command handler moving to [AwaitingDe]
I (30514) chip[DL]: Confirm received for CHIPoBLE TX characteristic indication (con 1) status= 14
I (30514) CHIP[DL]: Write request received for CHIPoBLE RX characteristic con 1 12
I (30524) chip[EM]: >>> [E:13530r S:61336 M:83290419] (S) Msg RX from 1:FFFFFFFB00000000 [9B28] --- Type 0001:08 (IM:InvokeCommandRequest)
D (30534) chip[EM]: Handling via exchange: 13530r, Delegate: 0x3fc9560c
D (30534) chip[DMG]: InvokeRequestMessage =
D (30554) chip[DMG]: {
D (30554) chip[DMG]: suppressResponse = false,
D (30554) chip[DMG]: timedRequest = false,
D (30564) chip[DMG]: InvokeRequests =
D (30564) chip[DMG]: [
D (30564) chip[DMG]: CommandDataIB =
D (30564) chip[DMG]: {
D (30574) chip[DMG]: CommandPathIB =
D (30574) chip[DMG]: {
D (30574) chip[DMG]: EndpointId = 0x0,
D (30584) chip[DMG]: ClusterId = 0x30,
D (30584) chip[DMG]: CommandId = 0x0,
D (30584) chip[DMG]: },
D (30584) chip[DMG]:
D (30594) chip[DMG]: CommandFields =
D (30594) chip[DMG]: {
D (30604) chip[DMG]: 0x0 = 66,
D (30604) chip[DMG]: 0x1 = 22,
D (30614) chip[DMG]: },
D (30614) chip[DMG]: },
D (30614) chip[DMG]:
D (30614) chip[DMG]: ],
D (30624) chip[DMG]:
D (30624) chip[DMG]: InteractionModelRevision = 1
D (30624) chip[DMG]: },
D (30624) chip[DMG]: Received command for Endpoint=0 Cluster=0x0000_0030 Command=0x0000_0000
I (30634) esp_matter_command: Received command 0x00000000 for endpoint 0x0000's cluster 0x00000030
I (30644) chip[FS]: GeneralCommissioning: Received ArmFailSafe (66s)
I (30654) esp_matter_attribute: ********** W : Endpoint 0x0000's Cluster 0x00000030's Attribute 0x00000000 is 22 **********
W (30674) MATT_APP: attribute_update_cb: 0, 48, 0
D (30674) chip[DMG]: Command handler moving to [ Preparing]
D (30674) chip[DMG]: Command handler moving to [AddingComm]
D (30684) chip[DMG]: Command handler moving to [AddedComma]
D (30684) chip[DMG]: Decreasing reference count for CommandHandler, remaining 0
I (30694) chip[EM]: <<< [E:13530r S:61336 M:57366872] (S) Msg TX to 1:FFFFFFFB00000000 [9B28] --- Type 0001:09 (IM:InvokeCommandResponse)
I (30714) chip[IN]: (S) Sending msg 57366872 on secure session with LSID: 61336
D (30724) CHIP[DL]: Sending indication for CHIPoBLE TX characteristic (con 1, len 71)
I (30724) NimBLE: GATT procedure initiated: indicate;
I (30734) NimBLE: att_handle=14
D (30734) chip[DMG]: Command handler moving to [CommandSen]
D (30734) chip[DMG]: Command handler moving to [AwaitingDe]
I (30784) chip[DL]: Confirm received for CHIPoBLE TX characteristic indication (con 1) status= 14
I (30784) CHIP[DL]: Write request received for CHIPoBLE RX characteristic con 1 12
I (30794) chip[EM]: >>> [E:13531r S:61336 M:83290420] (S) Msg RX from 1:FFFFFFFB00000000 [9B28] --- Type 0001:08 (IM:InvokeCommandRequest)
D (30804) chip[EM]: Handling via exchange: 13531r, Delegate: 0x3fc9560c
D (30804) chip[DMG]: InvokeRequestMessage =
D (30824) chip[DMG]: {
D (30824) chip[DMG]: suppressResponse = false,
D (30824) chip[DMG]: timedRequest = false,
D (30834) chip[DMG]: InvokeRequests =
D (30834) chip[DMG]: [
D (30834) chip[DMG]: CommandDataIB =
D (30834) chip[DMG]: {
D (30844) chip[DMG]: CommandPathIB =
D (30844) chip[DMG]: {
D (30844) chip[DMG]: EndpointId = 0x0,
E (30854) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (30854) task_wdt: - IDLE (CPU 0)
E (30854) task_wdt: Tasks currently running:
E (30854) task_wdt: CPU 0: tiT
E (30854) task_wdt: Aborting.
Core 2 register dump:
MEPC : 0x4038bf5a RA : 0x4038b386 SP : 0x3fcb1bd0 GP : 0x3fc92000
TP : 0x3fc6d704 T0 : 0x4038b376 T1 : 0xfffffff0 T2 : 0x00000060
S0/FP : 0x00000044 S1 : 0x00000000 A0 : 0x3fcb0df8 A1 : 0x3fcb1cc8
A2 : 0x00000007 A3 : 0x00000007 A4 : 0x3fcb1cc8 A5 : 0x3fcb1cc8
A6 : 0x00000001 A7 : 0x00000000 S2 : 0x3fcb1c5c S3 : 0xffffffff
S4 : 0x00000000 S5 : 0x00000003 S6 : 0x00000004 S7 : 0x00000000
S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000
T3 : 0x01000000 T4 : 0x00000000 T5 : 0x00000000 T6 : 0x00000000
MSTATUS : 0x3fcb1c5c MTVEC : 0x00000000 MCAUSE : 0x3fcb0dd4 MTVAL : 0x4038a854
MHARTID : 0x420800bc
Stack memory:
3fcb1bd0: 0x3fcb1c5c 0x00000000 0x3fcb0dd4 0x4038a854 0x420800bc 0x3c163984 0x3fcbcb00 0x00000044
3fcb1bf0: 0x420800bc 0x3c163984 0x00000000 0x00000bb8 0x00000004 0x00000003 0x00000000 0x00000002
3fcb1c10: 0x3fca4000 0xffffffff 0x3fcbcb00 0x4208865a 0x3fca4000 0x420800bc 0x000073f0 0x3fcb1c5c
3fcb1c30: 0x3fca4000 0xffffffff 0x3fca4000 0x42080296 0x3fca4000 0xffffffff 0x3fcbcb00 0x42078f0e
3fcb1c50: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1c70: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1c90: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x00000154
3fcb1cb0: 0x3fcb1b50 0x00000bfc 0x3fc94830 0x3fc94830 0x3fcb1cb0 0x3fc94828 0x00000007 0x3fcb1cc8
3fcb1cd0: 0x3fcb1cc8 0x3fcb1cb0 0x3fcb0df8 0x00000012 0x3fcb0eac 0x00546974 0xa5a5a5a5 0xa5a5a5a5
3fcb1cf0: 0x0015bfec 0x00000000 0x3fcb1ca0 0x00000012 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1d10: 0x3fca62dc 0x3fca6344 0x3fca63ac 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000
3fcb1d30: 0x00000000 0x42124e88 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1d50: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1d70: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1d90: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1db0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1dd0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1df0: 0x00000000 0x00000000 0x00000000 0x00000000 0x3f000000 0x00000054 0x3fcb1e08 0x3fcb1e08
3fcb1e10: 0x3fcb1e08 0x3fcb1e08 0x00000000 0x3fcb1e20 0xffffffff 0x3fcb1e20 0x3fcb1e20 0x00000000
3fcb1e30: 0x3fcb1e34 0xffffffff 0x3fcb1e34 0x3fcb1e34 0x00000001 0x00000001 0x00000000 0x0000ffff
3fcb1e50: 0x00000000 0xb33fffff 0x00000000 0x00000060 0x153bdaec 0x0000e806 0x3fcb1ec4 0x3fcb1ed4
3fcb1e70: 0x3fcb1ee4 0x420d7c52 0x420d7ba8 0x3fcb1ee4 0x00000000 0x00000000 0x3fcb2070 0x4208e506
3fcb1e90: 0x4208e4fc 0x4208e4f2 0x00000000 0x00000000 0x00000000 0x00000000 0x00000001 0x00000099
3fcb1eb0: 0x00000000 0x3fcb204c 0x3fcb2060 0x00000064 0x0000000c 0x00000000 0x00000000 0x00000000
3fcb1ed0: 0x0000000c 0x00000000 0x00000000 0x00000000 0x000000fc 0x3fc9e300 0x00000000 0x00000000
3fcb1ef0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1f10: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1f30: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000006 0x00000000 0x00000000
3fcb1f50: 0x00000000 0x00000000 0x00000000 0x00000006 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1f70: 0x00000000 0x00000006 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcb1f90: 0x00000000 0x42078fa4 0x42082d82 0x420d7b28 0x420e51e4 0x3fcb1e60 0x3fccc678 0x3fcbcb14
3fcb1fb0: 0x00000000 0x3c172694 0x05dc05dc 0x153bdaec 0x6a06e806 0x01017473 0x00000003 0x00000000
BackTrace:
riscv32-esp-elf-addr2line -pfiaC -e build/ac.elf 0x4038bf5a
0x4038bf5a: vListInsert at li_esp32matter_sdk/esp-idf/components/freertos/FreeRTOS-Kernel/list.c:171 (discriminator 1)
@aggaddam can you share a patch with your changes? I am not able to reproduce this error
@PSONALl This issue is more frequently observed when log level is set to debug instead of info. Please try with the log level set to debug and then try matter commissioning for at least 5-10 times.
This issue is also observed with log level info but the occurrence is less compared to debug log level
@aggaddam I tried to reproduce this issue by setting log level to debug and tried commissioning more than 10 times but I am not able to reproduce the error Please check if your changes are correct once
@PSONALl Somehow, i am unable to reproduce the issue with examples code. I am trying to reproduce the issue by adding some application changes to the example code. Will share the patch If i am able to reproduce the issue
I will close this now. Please re-open with the requested details if you encounter the issue
Describe the bug CHIP SDK Hangs randomly and the device reboots due to watchdog timer timeout. The watchdog timeout period is set to 30 seconds (Which means the idle task didn't get a chance to execute itself within 30 seconds). Suspecting CHIP SDK is blocking CPU time and not releasing it back.
Environment :
Any additional details ...