Yacubane / esp32-arduino-matter

Matter IoT protocol library for ESP32 working on Arduino framework
Apache License 2.0
298 stars 29 forks source link

Light.ino example is tripping watchdog timeout in ESP32 #33

Closed meta-space closed 11 months ago

meta-space commented 1 year ago

I'm trying to get the library to run on my devices. But I get watchdog errors. For instance running the Light.ino on my M5Stack Core (ESP32) I get:

I (100) wifi:wifi driver task: 3ffdc500, prio:23, stack:6656, core=0
I (993) system_api: Base MAC address is not set
I (998) system_api: read default base MAC address from EFUSE
I (1023) wifi:wifi firmware version: 6567a16
I (1023) wifi:wifi certification version: v7.0
I (1024) wifi:config NVS flash: enabled
I (1024) wifi:config nano formating: disabled
I (1028) wifi:Init data frame dynamic rx buffer num: 32
I (1033) wifi:Init management frame dynamic rx buffer num: 32
I (1038) wifi:Init management short buffer num: 32
I (1043) wifi:Init static tx buffer num: 8
I (1047) wifi:Init static rx buffer size: 2212
I (1051) wifi:Init static rx buffer num: 8
I (1055) wifi:Init dynamic rx buffer num: 32
I (1059) wifi_init: rx ba win: 6
I (1063) wifi_init: tcpip mbox: 32
I (1067) wifi_init: udp mbox: 6
I (1071) wifi_init: tcp mbox: 6
I (1074) wifi_init: tcp tx win: 5744
I (1079) wifi_init: tcp rx win: 5744
I (1083) wifi_init: tcp mss: 1440
I (1087) wifi_init: WiFi IRAM OP enabled
I (1091) wifi_init: WiFi RX IRAM OP enabled
I (1114) chip[DL]: NVS set: chip-counters/reboot-count = 1 (0x1)
I (1125) chip[DL]: NVS set: chip-counters/total-hours = 0 (0x0)
I (1128) chip[DL]: NVS set: chip-config/unique-id = "29CC4B94C13D6C08"
I (1129) chip[DL]: Real time clock set to 946684800 (0100/00/01 00:00:00 UTC)
D (1133) chip[DL]: CHIP event task running
I (1137) chip[DL]: Starting ESP WiFi layer
I (1141) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
I (1246) wifi:mode : sta (78:21:84:93:f5:d4)
I (1247) wifi:enable tsf
W (1248) wifi:Haven't to connect to a suitable AP now!
I (1248) chip[DL]: Attempting to connect WiFi station interface
I (1254) chip[DL]: WiFi station state change: NotConnected -> Connecting
I (1259) chip[DL]: Done driving station state, nothing else to do...
I (1261) wifi:new:<8,0>, old:<1,0>, ap:<255,255>, sta:<8,0>, prof:9
I (1265) chip[SVR]: Initializing subscription resumption storage...
I (1272) wifi:state: init -> auth (b0)
I (1280) chip[SVR]: Server initializing...
D (1284) chip[FP]: Initializing FabricTable from persistent storage
I (1291) chip[TS]: Last Known Good Time: [unknown]
I (1295) chip[TS]: Setting Last Known Good Time to firmware build time 2023-05-14T19:31:49
I (1299) wifi:state: auth -> assoc (0)
I (1306) chip[DMG]: AccessControl: initializing
I (1311) chip[DMG]: Examples::AccessControlDelegate::Init
I (1316) chip[DMG]: AccessControl: setting
I (1320) chip[DMG]: DefaultAclStorage: initializing
I (1324) chip[DMG]: DefaultAclStorage: 0 entries loaded
D (1329) chip[IN]: UDP::Init bind&listen port=5540
D (1334) chip[IN]: UDP::Init bound to port=5540
D (1338) chip[IN]: UDP::Init bind&listen port=5540
D (1342) chip[IN]: UDP::Init bound to port=5540
D (1347) chip[IN]: BLEBase::Init - setting/overriding transport
I (1352) wifi:D (1352) chip[IN]: TransportMgr initialized
state: assoc -> run (10)
I (1367) chip[ZCL]: Using ZAP configuration...
I (1368) chip[DMG]: AccessControlCluster: initializing
I (1369) chip[ZCL]: 0x3f406b01ep 0 clus 0x0000_0030 attr 0x0000_0000 not supported
D (1376) chip[ZCL]: GeneralDiagnostics: OnDeviceReboot
I (1381) chip[ZCL]: Initiating Admin Commissioning cluster.
E (1386) chip[ZCL]: Duplicate attribute override registration failed
D (1394) chip[IN]: SecureSession[0x3ffc8a98]: Allocated Type:1 LSID:64648
D (1399) chip[SC]: Assigned local session key ID 64648
D (1404) chip[SC]: Waiting for PBKDF param request
I (1408) chip[SVR]: BLE networking available but BLE advertising is not supported
I (1415) chip[DIS]: Updating services using commissioning mode 1
I (1421) chip[DIS]: CHIP minimal mDNS started advertising.
D (1427) chip[DL]: Using wifi MAC for hostname
I (1431) chip[DIS]: Advertise commission parameter vendorID=65521 productID=32768 discriminator=3840/15 cm=1
D (1440) chip[DIS]: Responding with _matterc._udp.local
D (1445) chip[DIS]: Responding with E650AA6CEAD16650._matterc._udp.local
D (1451) chip[DIS]: Responding with 78218493F5D4.local
I (1452) wifi:D (1456) chip[DIS]: Responding with 78218493F5D4.local
<ba-add>idx:0 (ifx:0, 9c:3d:cf:47:22:2d), tid:0, ssn:0, winSize:64
D (1462) chip[DIS]: Responding with _V65521._sub._matterc._udp.local
D (1474) chip[DIS]: Responding with _S15._sub._matterc._udp.local
D (1480) chip[DIS]: Responding with _L3840._sub._matterc._udp.local
D (1486) chip[DIS]: Responding with _CM._sub._matterc._udp.local
D (1492) chip[DIS]: Responding with E650AA6CEAD16650._matterc._udp.local
I (1494) wifi:I (1498) chip[DIS]: CHIP minimal mDNS configured as 'Commissionable node device'; instance name: E650AA6CEAD16650.
connected with SwitTest2, aid = 7, channel 8, BW20, bssid = 9c:3d:cf:47:22:2d
I (1510) chip[DIS]: mDNS service published: _matterc._udp
I (1517) wifi:security: WPA2-PSK, phy: bgn, rssi: -70
I (1522) chip[IN]: CASE Server enabling CASE session setups
D (1532) chip[IN]: SecureSession[0x3ffc8b50]: Allocated Type:2 LSID:64649
D (1538) chip[SC]: Allocated SecureSession (0x3ffc8b50) - waiting for Sigma1 msg
I (1545) chip[SVR]: Joining Multicast groups
D (1549) chip[ZCL]: Emitting StartUp event
I (1553) chip[SVR]: Server Listening...
I (1555) wifi:pm start, type: 1
D (1558) chip[DMG]: Endpoint 0, Cluster 0x0000_001D update version to 3d2e1e4d

D (1567) chip[DMG]: Endpoint 1, Cluster 0x0000_0003 update version to 4e5479e9
D (1574) chip[DMG]: Endpoint 1, Cluster 0x0000_0004 update version to 4e13b31c
D (1580) chip[ZCL]: ERR: writing group feature map 1
D (1585) chip[DMG]: Endpoint 1, Cluster 0x0000_0005 update version to 900a79a6
I (1592) chip[ZCL]: Endpoint 1 On/off already set to new value
D (1598) chip[DMG]: Endpoint 0, Cluster 0x0000_001D update version to 3d2e1e4e
I (1599) wifi:D (1605) chip[ZCL]: Emitting StartUp event
AP's beacon interval = 204800 us, DTIM period = 2
D (1610) chip[EVL]: LogEvent event number: 0x0000000000000000 priority: 2, endpoint id:  0x0 cluster id: 0x0000_0028 event id: 0x0 Sys timestamp: 0x0000000000000145
D (1628) chip[ZCL]: GeneralDiagnostics: OnDeviceReboot
D (1633) chip[DMG]: Endpoint 0, Cluster 0x0000_0033 update version to e352330e
D (1641) chip[EVL]: LogEvent event number: 0x0000000000000001 priority: 2, endpoint id:  0x0 cluster id: 0x0000_0033 event id: 0x3 Sys timestamp: 0x0000000000000164
I (1655) chip[DL]: WIFI_EVENT_STA_START
E (1658) esp_matter_core: Failed to open the node nvs_namespace
I (1659) chip[DL]: Done driving station state, nothing else to do...
I (1670) chip[DL]: WIFI_EVENT_STA_CONNECTED
I (1674) chip[DL]: WiFi station state change: Connecting -> Connecting_Succeeded
I (1681) chip[DL]: WiFi station state change: Connecting_Succeeded -> Connected
I (1688) chip[DL]: WiFi station interface connected
I (1693) chip[ZCL]: WiFiDiagnosticsDelegate: OnConnectionStatusChanged
D (1699) chip[EVL]: LogEvent event number: 0x0000000000000002 priority: 1, endpoint id:  0x0 cluster id: 0x0000_0036 event id: 0x2 Sys timestamp: 0x000000000000019E
I (1713) chip[DL]: Done driving station state, nothing else to do...
I (1720) chip[DL]: Updating advertising data
[   862][I][main-nussbaum-prod.cpp:108] setup(): STEP 108
I (1729) chip[SVR]: SetupQRCode: [MT:Y.K9042C00KA0648G00]
I (1734) chip[SVR]: Copy/paste the below URL in a browser to see the QR Code:
I (1740) chip[SVR]: https://project-chip.github.io/connectedhomeip/qrcode.html?data=MT%3AY.K9042C00KA0648G00
I (1750) chip[SVR]: Manual pairing code: [34970112332]
[   892][I][main-nussbaum-prod.cpp:112] setup(): DONE
I (2988) chip[DL]: IP_EVENT_GOT_IP6
I (2988) chip[DL]: IPv6 addr available. Ready on WIFI_STA_DEF interface: fe80:0000:0000:0000:7a21:84ff:fe93:f5d4
I (2991) chip[DIS]: Updating services using commissioning mode 1
I (2997) chip[DIS]: CHIP minimal mDNS started advertising.
D (3008) chip[DL]: Using wifi MAC for hostname
I (3009) chip[DIS]: Advertise commission parameter vendorID=65521 productID=32768 discriminator=3840/15 cm=1
D (3015) chip[DIS]: Responding with _matterc._udp.local
D (3020) chip[DIS]: Responding with E650AA6CEAD16650._matterc._udp.local
D (3026) chip[DIS]: Responding with 78218493F5D4.local
D (3031) chip[DIS]: Responding with 78218493F5D4.local
D (3036) chip[DIS]: Responding with _V65521._sub._matterc._udp.local
D (3042) chip[DIS]: Responding with _S15._sub._matterc._udp.local
D (3048) chip[DIS]: Responding with _L3840._sub._matterc._udp.local
D (3054) chip[DIS]: Responding with _CM._sub._matterc._udp.local
D (3060) chip[DIS]: Responding with E650AA6CEAD16650._matterc._udp.local
I (3066) chip[DIS]: CHIP minimal mDNS configured as 'Commissionable node device'; instance name: E650AA6CEAD16650.
I (3080) chip[DIS]: mDNS service published: _matterc._udp
E (3081) chip[SVR]: Server initialization complete
I (3086) chip[DIS]: Updating services using commissioning mode 1
I (3092) chip[DIS]: CHIP minimal mDNS started advertising.
D (3100) chip[DIS]: Warning: Attempt to mDNS broadcast failed on st1:  Error LwIP:0x03000001
I (3105) chip[DIS]: mDNS broadcast full failed in 1 separate send attempts.
E (3112) chip[DIS]: Failed to advertise records: 3000001
D (3120) chip[DIS]: Warning: Attempt to mDNS broadcast failed on st1:  Error LwIP:0x03000001
I (3125) chip[DIS]: mDNS broadcast full failed in 1 separate send attempts.
E (3131) chip[DIS]: Failed to advertise records: 3000001
D (3137) chip[DL]: Using wifi MAC for hostname
I (3141) chip[DIS]: Advertise commission parameter vendorID=65521 productID=32768 discriminator=3840/15 cm=1
D (3150) chip[DIS]: Responding with _matterc._udp.local
D (3155) chip[DIS]: Responding with E650AA6CEAD16650._matterc._udp.local
D (3161) chip[DIS]: Responding with 78218493F5D4.local
D (3166) chip[DIS]: Responding with 78218493F5D4.local
D (3171) chip[DIS]: Responding with _V65521._sub._matterc._udp.local
D (3177) chip[DIS]: Responding with _S15._sub._matterc._udp.local
D (3183) chip[DIS]: Responding with _L3840._sub._matterc._udp.local
D (3189) chip[DIS]: Responding with _CM._sub._matterc._udp.local
D (3195) chip[DIS]: Responding with E650AA6CEAD16650._matterc._udp.local
I (3201) chip[DIS]: CHIP minimal mDNS configured as 'Commissionable node device'; instance name: E650AA6CEAD16650.
D (3215) chip[DIS]: Warning: Attempt to mDNS broadcast failed on st1:  Error LwIP:0x03000001
I (3220) chip[DIS]: mDNS broadcast full failed in 1 separate send attempts.
E (3226) chip[DIS]: Failed to advertise records: 3000001
I (3231) chip[DIS]: mDNS service published: _matterc._udp
I (3241) chip[IM]: No subscriptions to resume
I (3569) esp_netif_handlers: sta ip: 10.0.0.196, mask: 255.255.255.0, gw: 10.0.0.1
I (3570) chip[DL]: IP_EVENT_STA_GOT_IP
I (3571) chip[DL]: IPv4 address changed on WiFi station interface: 10.0.0.196/255.255.255.0 gateway 10.0.0.1
I (3580) chip[DL]: IPv4 Internet connectivity ESTABLISHED
I (3585) chip[DIS]: Updating services using commissioning mode 1
I (3592) chip[DIS]: CHIP minimal mDNS started advertising.
D (3609) chip[DIS]: Warning: Attempt to mDNS broadcast failed on st1:  Error LwIP:0x03000001
I (3610) chip[DIS]: mDNS broadcast full failed in 1 separate send attempts.
E (3614) chip[DIS]: Failed to advertise records: 3000001
D (3619) chip[DL]: Using wifi MAC for hostname
I (3623) chip[DIS]: Advertise commission parameter vendorID=65521 productID=32768 discriminator=3840/15 cm=1
D (3633) chip[DIS]: Responding with _matterc._udp.local
D (3637) chip[DIS]: Responding with E650AA6CEAD16650._matterc._udp.local
D (3644) chip[DIS]: Responding with 78218493F5D4.local
D (3649) chip[DIS]: Responding with 78218493F5D4.local
D (3653) chip[DIS]: Responding with _V65521._sub._matterc._udp.local
D (3660) chip[DIS]: Responding with _S15._sub._matterc._udp.local
D (3665) chip[DIS]: Responding with _L3840._sub._matterc._udp.local
D (3671) chip[DIS]: Responding with _CM._sub._matterc._udp.local
D (3677) chip[DIS]: Responding with E650AA6CEAD16650._matterc._udp.local
I (3683) chip[DIS]: CHIP minimal mDNS configured as 'Commissionable node device'; instance name: E650AA6CEAD16650.
D (3700) chip[DIS]: Warning: Attempt to mDNS broadcast failed on st1:  Error LwIP:0x03000001
I (3702) chip[DIS]: mDNS broadcast full failed in 1 separate send attempts.
E (3708) chip[DIS]: Failed to advertise records: 3000001
I (3713) chip[DIS]: mDNS service published: _matterc._udp
I (5238) chip[SWU]: Stopping the watchdog timer
I (5239) chip[SWU]: Starting the periodic query timer, timeout: 86400 seconds
D (5240) chip[DMG]: Endpoint 0, Cluster 0x0000_002A update version to 335b1238
D (5246) chip[DMG]: Endpoint 0, Cluster 0x0000_002A update version to 335b1239
E (6723) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (6723) task_wdt:  - IDLE (CPU 1)
E (6723) task_wdt: Tasks currently running:
E (6723) task_wdt: CPU 0: IDLE
E (6723) task_wdt: CPU 1: loopTask
E (6723) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x4011D1E2:0x3FFBF880 0x4008384D:0x3FFBF8A0 0x401D9AA3:0x3FFBC600 0x4011DA2E:0x3FFBC620 0x400935AD:0x3FFBC640 0x400952B9:0x3FFBC660

  #0  0x4011D1E2:0x3FFBF880 in task_wdt_isr at C:\Users\fda\.platformio\packages\framework-espidf\components\esp_system/task_wdt.c:183 (discriminator 3)
  #1  0x4008384D:0x3FFBF8A0 in _xt_lowint1 at C:\Users\fda\.platformio\packages\framework-espidf\components\freertos\port\xtensa/xtensa_vectors.S:1114
  #2  0x401D9AA3:0x3FFBC600 in cpu_ll_waiti at C:\Users\fda\.platformio\packages\framework-espidf\components\hal\esp32\include/hal/cpu_ll.h:183
      (inlined by) esp_pm_impl_waiti at C:\Users\fda\.platformio\packages\framework-espidf\components\esp_pm/pm_impl.c:837
  #3  0x4011DA2E:0x3FFBC620 in esp_vApplicationIdleHook at C:\Users\fda\.platformio\packages\framework-espidf\components\esp_system/freertos_hooks.c:63
  #4  0x400935AD:0x3FFBC640 in prvIdleTask at C:\Users\fda\.platformio\packages\framework-espidf\components\freertos/tasks.c:3987 (discriminator 1)
  #5  0x400952B9:0x3FFBC660 in vPortTaskWrapper at C:\Users\fda\.platformio\packages\framework-espidf\components\freertos\port\xtensa/port.c:142

E (6723) task_wdt: Print CPU 1 backtrace

Backtrace: 0x40082C6D:0x3FFBFE80 0x4008384D:0x3FFBFEA0 0x4000BFED:0x3FFD6090 0x40094419:0x3FFD60A0 0x40092DB1:0x3FFD60C0 0x40113508:0x3FFD6100 0x40113C25:0x3FFD6130 0x40113CD1:0x3FFD6150 0x401143B2:0x3FFD6170 0x400952B9:0x3FFD6190

  #0  0x40082C6D:0x3FFBFE80 in esp_crosscore_isr at C:\Users\fda\.platformio\packages\framework-espidf\components\esp_system/crosscore_int.c:92
  #1  0x4008384D:0x3FFBFEA0 in _xt_lowint1 at C:\Users\fda\.platformio\packages\framework-espidf\components\freertos\port\xtensa/xtensa_vectors.S:1114
  #2  0x4000BFED:0x3FFD6090 in ?? ??:0
  #3  0x40094419:0x3FFD60A0 in vPortClearInterruptMaskFromISR at C:\Users\fda\.platformio\packages\framework-espidf\components\freertos\port\xtensa\include/freertos/portmacro.h:571
      (inlined by) xTaskGetSchedulerState at C:\Users\fda\.platformio\packages\framework-espidf\components\freertos/tasks.c:4699
  #4  0x40092DB1:0x3FFD60C0 in xQueueSemaphoreTake at C:\Users\fda\.platformio\packages\framework-espidf\components\freertos/queue.c:1554
  #5  0x40113508:0x3FFD6100 in uartAvailable at C:/Users/fda/.platformio/packages/framework-arduinoespressif32/cores/esp32/esp32-hal-uart.c:317 (discriminator 1)
  #6  0x40113C25:0x3FFD6130 in HardwareSerial::available() at C:/Users/fda/.platformio/packages/framework-arduinoespressif32/cores/esp32/HardwareSerial.cpp:476
  #7  0x40113CD1:0x3FFD6150 in serialEventRun() at C:/Users/fda/.platformio/packages/framework-arduinoespressif32/cores/esp32/HardwareSerial.cpp:116
  #8  0x401143B2:0x3FFD6170 in loopTask(void*) at C:/Users/fda/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp:51 (discriminator 1)
  #9  0x400952B9:0x3FFD6190 in vPortTaskWrapper at C:\Users\fda\.platformio\packages\framework-espidf\components\freertos\port\xtensa/port.c:142

E (11723) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (11723) task_wdt:  - IDLE (CPU 1)
E (11723) task_wdt: Tasks currently running:
E (11723) task_wdt: CPU 0: IDLE
E (11723) task_wdt: CPU 1: loopTask
E (11723) task_wdt: Print CPU 0 (current core) backtrace

If I globally disable the watchdogs using esp_task_wdt_init(-1, false) the device keeps running and I'm able to commission the light successfully and control it. Of course I cannot do this in production. I'm using PlatformIO and believe that I have fulfilled all requirements mentioned in the readme.md

meta-space commented 1 year ago

I think I found the culprit.

I am using a custom espressif32-arduino build. It turns out that I had the option CONFIG_ESP_TASK_WDT_CHECK_IDLE_TASK_CPU1=y in my sdkconfig. This will result in the watchdog being tripped when running Matter stuff. The default esp32-arduino sdkconfig doesn't enable this setting.

I feel a little uneasy disabling the watchdog on the CPU1 Idle task... For the moment I'm going to assume that it is fine, as it is not enabled in the esp32-arduino-framework.

On the other hand the Arduino startup shows that the loopTask first invokes setup() and then loop(). If loop() blocks for some reason I'd really like to know about it :) In the Light.ino it seems that the setup() function is blocking if the device hasn't been commissioned.

void loopTask(void *pvParameters)
{
    setup();
    for(;;) {
#if CONFIG_FREERTOS_UNICORE
        yieldIfNecessary();
#endif
        if(loopTaskWDTEnabled){
            esp_task_wdt_reset();
        }
        loop();
        if (serialEventRun) serialEventRun();
    }
}

I also discovered interesting functions in esp32-hal-misc.c which enable and disable the watchdogs at runtime. Maybe that's a better solution

Yacubane commented 11 months ago

Hey! Thanks for sharing interesting info. In my private home project I also wanted more flexibility and created custom headers for handling watchdogs:

watchdogs.h

#pragma once
#include <esp_task_wdt.h>

class WatchdogsClass
{
public:
    WatchdogsClass() {}
    void setupWatchdog(uint32_t timeout) {
        esp_task_wdt_init(timeout, true);
        esp_task_wdt_add(NULL);
    }

    void restartWatchdog() {
        esp_task_wdt_reset();
    }
private:
};

extern WatchdogsClass Watchdogs;

watchdogs.cpp

#include "watchdogs.h"

WatchdogsClass Watchdogs;

Then you can use it inside your code

Watchdogs.setupWatchdog(30); // setup watchdogs for 30 seconds
Watchdogs.restartWatchdog(); // yield (restart watchdog counter)