espressif / esp-matter

Espressif's SDK for Matter
Apache License 2.0
707 stars 161 forks source link

SRP update error via OpenThread (CON-1307) #1056

Open hatl opened 3 months ago

hatl commented 3 months ago

Using the existing examples, I wrote some basic code to expose a switch (onoff plugin) and buttons - see: https://github.com/hatl/esp-matter-code When commissioning the device via WiFi, everything works as expected and the device shows up in Home Assistant. Then, I switched to OpenThread - according to https://docs.espressif.com/projects/esp-matter/en/latest/esp32c6/developing.html#flashing-the-firmware

To enable Thread, you should change the menuconfig options to CONFIG_OPENTHREAD_ENABLED=y, CONFIG_ENABLE_WIFI_STATION=n, and CONFIG_USE_MINIMAL_MDNS=n.

When using Thread, the error E (34252) chip[DL]: SRP update error: internal server error as mentioned below keeps repeating.

Environment

Any additional details

Log for OpenThread commissioning:

I (1472) chip[ZCL]: Initiating Admin Commissioning cluster.
I (1472) chip[DIS]: Updating services using commissioning mode 1
E (1482) chip[DIS]: Failed to remove advertised services: 3
I (1492) chip[DIS]: Advertise commission parameter vendorID=65522 productID=32769 discriminator=2026/07 cm=1 cp=0
E (1502) chip[DIS]: Failed to advertise commissionable node: 3
E (1502) chip[DIS]: Failed to finalize service update: 3
I (1512) chip[IN]: CASE Server enabling CASE session setups
I (1512) chip[SVR]: Joining Multicast groups
I (1522) chip[SVR]: Server Listening...```
I (1532) esp_matter_core: Dynamic endpoint 0 added
I (1532) esp_matter_attribute: ********** W : Endpoint 0x0001's Cluster 0x00000003's Attribute 0x00000001 is 0 **********
I (1542) esp_matter_core: Dynamic endpoint 1 added
I (1552) esp_matter_attribute: ********** W : Endpoint 0x0002's Cluster 0x00000003's Attribute 0x00000001 is 0 **********
I (1562) esp_matter_core: Dynamic endpoint 2 added
I (1562) esp_matter_attribute: ********** W : Endpoint 0x0003's Cluster 0x00000003's Attribute 0x00000001 is 0 **********
I (1572) esp_matter_attribute: ********** W : Endpoint 0x0003's Cluster 0x00000004's Attribute 0x00000000 is 128 **********
I (1592) esp_matter_attribute: ********** W : Endpoint 0x0003's Cluster 0x00000004's Attribute 0x0000FFFC is <invalid type: 0> **********
I (1602) chip[ZCL]: 0x42156e68 ep 3 clus 0x0000_0062 attr 0x0000_0000 not supported
I (1612) esp_matter_attribute: ********** R : Endpoint 0x0003's Cluster 0x00000006's Attribute 0x0000FFFC is 1 **********
I (1622) esp_matter_attribute: ********** R : Endpoint 0x0003's Cluster 0x00000006's Attribute 0x00004003 is 1 **********
I (1632) esp_matter_attribute: ********** R : Endpoint 0x0003's Cluster 0x00000006's Attribute 0x00000000 is 1 **********
I (1642) esp_matter_attribute: ********** R : Endpoint 0x0003's Cluster 0x00000006's Attribute 0x00000000 is 1 **********
I (1652) chip[ZCL]: Endpoint 3 On/off already set to new value
I (1662) esp_matter_core: Dynamic endpoint 3 added
I (1672) esp_matter_core: Cannot find minimum unused endpoint_id, try to find in the previous namespace
I (1682) esp_matter_core: Failed to open node namespace
I (1672) chip[DL]: Configuring CHIPoBLE advertising (interval 25 ms, connectable)
I (1702) NimBLE: GAP procedure initiated: advertise; 
I (1712) NimBLE: disc_mode=2
I (1712) NimBLE:  adv_channel_map=0 own_addr_type=1 adv_filter_policy=0 adv_itvl_min=40 adv_itvl_max=40
I (1722) NimBLE: 

I (1732) chip[DL]: CHIPoBLE advertising started
I (1732) app_main: Commissioning window opened

Repeating errror:

I(31152) OPENTHREAD:[I] SrpClient-----: Received response
I(31172) OPENTHREAD:[I] SrpClient-----: Server rejected Failed code:2
I(31172) OPENTHREAD:[I] SrpClient-----: HostInfo Adding -> ToAdd
I(31182) OPENTHREAD:[I] SrpClient-----: Retry interval 3060 ms
I(31182) OPENTHREAD:[I] SrpClient-----: State Updating -> ToRetry
E (31192) chip[DL]: SRP update error: internal server error
I(33582) OPENTHREAD:[I] Settings------: Read SrpEcdsaKey
I(33602) OPENTHREAD:[I] SrpClient-----: Send update
I(33602) OPENTHREAD:[I] SrpClient-----: HostInfo ToAdd -> Adding
I(33602) OPENTHREAD:[I] SrpClient-----: State ToRetry -> Updating
I(33652) OPENTHREAD:[I] Mac-----------: Frame tx attempt 1/16 failed, error:NoAck, len:122, seqnum:252, type:Data, src:0x2c32, dst:0x2c00, sec:yes, ackreq:yes
I(33712) OPENTHREAD:[I] Mac-----------: Frame tx attempt 1/16 failed, error:NoAck, len:122, seqnum:253, type:Data, src:0x2c32, dst:0x2c00, sec:yes, ackreq:yes
I(33742) OPENTHREAD:[I] Mac-----------: Frame tx attempt 2/16 failed, error:NoAck, len:122, seqnum:253, type:Data, src:0x2c32, dst:0x2c00, sec:yes, ackreq:yes
I(33802) OPENTHREAD:[I] Mac-----------: Frame tx attempt 1/16 failed, error:NoAck, len:122, seqnum:254, type:Data, src:0x2c32, dst:0x2c00, sec:yes, ackreq:yes
I(33892) OPENTHREAD:[I] MeshForwarder-: Sent IPv6 UDP msg, len:726, chksum:f0d9, ecn:no, to:0x2c00, sec:yes, prio:normal
I(33892) OPENTHREAD:[I] MeshForwarder-:     src:[...]:49154
I(33902) OPENTHREAD:[I] MeshForwarder-:     dst:[...]:53538
I(34192) OPENTHREAD:[I] MeshForwarder-: Received IPv6 UDP msg, len:60, chksum:fd7c, ecn:no, from:0x2c00, sec:yes, prio:normal, rss:-78.0
I(34202) OPENTHREAD:[I] MeshForwarder-:     src:[...]:53538
I(34202) OPENTHREAD:[I] MeshForwarder-:     dst:[...]:49154
I(34222) OPENTHREAD:[I] SrpClient-----: Received response
I(34232) OPENTHREAD:[I] SrpClient-----: Server rejected Failed code:2
I(34232) OPENTHREAD:[I] SrpClient-----: HostInfo Adding -> ToAdd
I(34242) OPENTHREAD:[I] SrpClient-----: Retry interval 5 sec
I(34242) OPENTHREAD:[I] SrpClient-----: State Updating -> ToRetry
E (34252) chip[DL]: SRP update error: internal server error

During WiFi commissioning, I get some errors, but it works:

E (25150) chip[DMG]: Error retrieving data from clusterId: 0x0000_0028, err = b
E (25260) chip[DMG]: Error retrieving data from clusterId: 0x0000_003C, err = b
E (25370) chip[DMG]: Error retrieving data from clusterId: 0x0000_003E, err = b
E (25500) chip[DMG]: Error retrieving data from clusterId: 0x0000_0036, err = b
E (25740) chip[DMG]: Error retrieving data from clusterId: 0x0000_0040, err = b
E (25960) chip[DMG]: Error retrieving data from clusterId: 0x0000_0040, err = b
E (26230) chip[DMG]: Error retrieving data from clusterId: 0x0000_0006, err = b
E (26930) chip[DMG]: Error retrieving data from clusterId: 0x0000_0028, err = b
E (27030) chip[DMG]: Error retrieving data from clusterId: 0x0000_0033, err = b
E (27120) chip[DMG]: Error retrieving data from clusterId: 0x0000_003E, err = b
E (27240) chip[DMG]: Error retrieving data from clusterId: 0x0000_0036, err = b
E (27480) chip[DMG]: Error retrieving data from clusterId: 0x0000_0040, err = b
E (27690) chip[DMG]: Error retrieving data from clusterId: 0x0000_0040, err = b
E (27960) chip[DMG]: Error retrieving data from clusterId: 0x0000_0006, err = b
hatl commented 3 months ago

It's strange. On very rare occasions, the pairing works...

hatl commented 3 months ago

I tried to run the same code on an ESP32-H2. Unfortunately, this also fails.

I'm also getting the following errors:

I (1541) chip[ZCL]: Initiating Admin Commissioning cluster. 
I (1551) chip[DIS]: Updating services using commissioning mode 1
E (1551) chip[DIS]: Failed to remove advertised services: 3 
I (1561) chip[DIS]: Advertise commission parameter vendorID=65521 productID=32768 discriminator=0164/00 cm=1 cp=0
E (1571) chip[DIS]: Failed to advertise commissionable node: 3
E (1571) chip[DIS]: Failed to finalize service update: 3
...
I (94920) chip[SVR]: Commissioning completed successfully
...
E (98700) chip[DL]: SRP update error: internal server error
hatl commented 3 months ago

i could narrow it down further: the commissioning works when using the test device attestation. when i switch the config to factory attestation, i get the errors as described.

i've changed the following parameters:

CONFIG_ENABLE_ESP32_FACTORY_DATA_PROVIDER=y
CONFIG_ENABLE_ESP32_DEVICE_INSTANCE_INFO_PROVIDER=y
CONFIG_SEC_CERT_DAC_PROVIDER=y
CONFIG_FACTORY_DEVICE_INSTANCE_INFO_PROVIDER=y

i generate the factory and cert images using the following command:

esp-matter-mfg-tool -cn "My OnOff Plug" -v 0xFFF2 -p 0x8001 --pai \
--vendor-name "vendor" --product-name "My OnOff Plug" --hw-ver 1 --hw-ver-str v1.0 --mfg-date 2024-08-09 \
-k $MATTER_SDK_PATH/credentials/test/attestation/Chip-Test-PAI-FFF2-8001-Key.pem \
-c $MATTER_SDK_PATH/credentials/test/attestation/Chip-Test-PAI-FFF2-8001-Cert.pem \
-cd $MATTER_SDK_PATH/credentials/test/certification-declaration/Chip-Test-CD-FFF2-8001.der \
--dac-in-secure-cert

flashed using:

esptool.py -p /dev/ttyACM0 write_flash 0x10000 out/fff2_8001/37d58392-35db-4ef8-a8ac-9e9576c982f0/*-partition.bin
esptool.py -p /dev/ttyACM0 write_flash 0xd000 out/fff2_8001/37d58392-35db-4ef8-a8ac-9e9576c982f0/*_esp_secure_cert.bin
hatl commented 3 months ago

Thanks to the support of @trailblazing-tech I could narrow it down further. The issue only occurs when using an ESP Thread Border Router. It works fine, when using a ZBT-1 (SkyConnect) USB dongle. Currently, I'm running v1.0. I'll continue testing using the latest commit in the main branch.

log.txt

wqx6 commented 3 months ago

Could you tell me the commit of ESP-IDF you are using for Thread BR? Looks like it is the issue of Thread BR? Are you using the esp-thread-br repo https://github.com/espressif/esp-thread-br? Or you are just using the example from ESP-IDF?

hatl commented 3 months ago

I tried v1.0 in the repo: https://github.com/espressif/esp-thread-br.git and idf v5.2.2

But I also tried using a more recent version: 6011f07ca9bd31686944b33422d86c1e4d429479 together with idf v5.3.0

wqx6 commented 2 months ago

Could you verify that the dataset on the end-device is the same as the dataset of the Thread BR? You can get the dataset of the end-device by input matter esp ot_cli dataset active -x in the end-device's console.

dhrishi commented 2 months ago

@hatl Is there any update you can share?

hatl commented 2 months ago

sorry for the delayed reply. I've double-checked that the data set on Home Assistant and on the border router are the same. I've also synced the data set to my phone before pairing. For the rare cases when the pairing works, the dataset is the same.

chshu commented 2 months ago

@hatl If the issue only occurs with custom factory and secure_cert partition, it's not related to the Thread BR, but seems some issue related to the commissioning process.

To compare the difference between test, could you try to commision the device using chip-tool:

https://docs.espressif.com/projects/esp-matter/en/latest/esp32h2/developing.html#test-setup-chip-tool,

And share us the logs from chip-tool, Thread BR, and Matter device.

hatl commented 2 months ago

as mentioned in https://github.com/espressif/esp-thread-br/issues/90 works with the ZBT1 stick, does not work with the esp-thread-br

chshu commented 2 months ago

But as you mentiond, the esp-thread-br works if you use the test DAC. It only fails if you use the custom factory and secure_cert partition? We don’t see any reason why the custom factory partition would cause an issue with the Thread BR.

Could you share more logs as requried in https://github.com/espressif/esp-matter/issues/1056#issuecomment-2381078422.

hatl commented 2 months ago

that's the strange thing: the test DAC works on both, the ZBT1 and the Thread BR but the custom factory partition only causes issues with the Thread BR, not with the ZBT1 I'll generate more logs - this will take some time since i'll be offline for the next week

hatl commented 4 weeks ago

here are the log files.

device.log router.log