espressif / esp-zigbee-sdk

Espressif Zigbee SDK
Apache License 2.0
163 stars 26 forks source link

New version of esp-zboss-lib library don't recognize the transport key link message (TZ-421) #154

Closed nalves23 closed 11 months ago

nalves23 commented 11 months ago

Answers checklist.

IDF version.

ESP-IDF 5.1.2

esp-zigbee-lib version.

1.0.3

esp-zboss-lib version.

1.0.3 or 1.0.4

Espressif SoC revision.

ESP32-H2

What is the expected behavior?

The esp-zboss-lib library update from 1.0.3 to 1.0.4 should be working without any problems

What is the actual behavior?

After updating the zboss library to v 1.0.3 or 1.0.4 and using the same source code the router or end device devices don't join the network because can't process the Transport key message sended by coordinator. Please check the image [attached]. joinnig_process_transporting_key

Steps to reproduce.

  1. With the 1.0.2 version of zboss-lib compile and run the code of router or end device (I use the zigbee example esp_zigbee_HA_Sample)
  2. Upodate the esp-zboss-lib to version 1.0.3 or 1.0.4
  3. Recompile and run again with the new libraries versions.
  4. Enter in commissioning mode of router device to join azigbee network and check the zigbee packets sniffer

More Information.

No response

Metatorn commented 11 months ago

You already tried to erase the nvram? Usually it's necessary after a while updating the libraries or the firmware itself. Doing a esp_zb_nvram_erase_at_start(true) before the esp_zb_start() should work.

You can add the function in both projects and you make a factory reset on server and client, this ensure the connection to a new network, after that you can delete the nvram_erase function and work as usual.

nalves23 commented 11 months ago

Hi @Metatorn

Thank you for your feedback.

In fact I have the concern to erase the memory every times I change significant parts of the code. I've had some problems with this procedure, it was just another one. Thank you, Issue closed!

Kind regards.

nalves23 commented 11 months ago

Hi @Metatorn

unfortunately the issue remains. Now, even with nvram erased the network key message was ignored by router device.

Kind regards.

valdisz commented 11 months ago

Is this the same issue that I experienced here https://github.com/espressif/esp-idf/issues/12657?

From logs, I see this message:

ZBOSS: zdo/zdo_app.c:595   Have not got nwk key - authentication failed
valdisz commented 11 months ago

I'm getting this log from the end device even with downgraded libraries (1.0.0 for zboss and zigbee):

I (28539) ZBOSS: commissioning/bdb/zdo_commissioning_bdb.c:348   dev type 2, joined 0, ext_pan_id 0, authenticated 0, tclk_valid 1
I (36649) ZBOSS: nwk/nwk_join.c:292   Will assoc to pan 0x2c26 on channel 13, dev 0x0, iface_id 0
I (37179) ZBOSS: mac/mac_associate.c:448   received association response
I (37179) ZBOSS: zdo/zdo_app_join.c:772   CONGRATULATIONS! joined status 0, addr 0xcfe3, iface_id 0, ch 13, rejoin 0
I (37179) ZBOSS: zdo/zdo_app_join.c:774   xpanid 15:d3:78:6f:29:6d:90:4a
I (47209) ZBOSS: zdo/zdo_app.c:595   Have not got nwk key - authentication failed
I (47209) ZBOSS: nwk/nwk_main.c:4409   zb_nwk_do_leave param 6 rejoin 0
I (47209) ZBOSS: common/zb_address.c:1063   zb_address_delete ref 4
I (47219) ESP_ZB_ON_OFF_LIGHT: ZDO signal: ZDO Leave (0x3), status: ESP_OK
I (47229) ZBOSS: nwk/nwk_join.c:250   No dev for join
I (47229) ZBOSS: nwk/nwk_join.c:250   No dev for join
I (47239) ZBOSS: zdo/zdo_commissioning.c:210   Can't find PAN to join to! param 0
I (47239) ZBOSS: commissioning/bdb/zdo_commissioning_bdb.c:548   COMMISSIONING_STOP: app signal 10 comm status 3
I (47249) ESP_ZB_ON_OFF_LIGHT: Network steering was not successful (status: ESP_FAIL)

And here are logs if I'm connecting to another Zigbee Coordinator device (Sonoff ZigBee Bridge flashed with Tasmota):

I (28220) ZBOSS: nwk/nwk_join.c:293   Will assoc to pan 0xc84 on channel 11, dev 0x0, iface_id 0
I (28740) ZBOSS: mac/mac_associate.c:448   received association response
I (28740) ZBOSS: zdo/zdo_app_join.c:771   CONGRATULATIONS! joined status 0, addr 0x33ff, iface_id 0, ch 11, rejoin 0
I (28740) ZBOSS: zdo/zdo_app_join.c:773   xpanid cc:cc:cc:cc:96:a8:cc:84
I (28760) ZBOSS: secur/ic_secur.c:78   APS key from installcode: ca:48:26:7e:f8:a7:6b:6b:40:91:04:41:90:b4:fa:f1
I (28770) ZBOSS: zdo/zdo_app.c:595   Have not got nwk key - authentication failed
I (28770) ZBOSS: nwk/nwk_main.c:4409   zb_nwk_do_leave param 10 rejoin 0
I (28780) ZBOSS: common/zb_address.c:1063   zb_address_delete ref 4
I (28790) ESP_ZB_ON_OFF_LIGHT: ZDO signal: ZDO Leave (0x3), status: ESP_OK
I (28790) ZBOSS: zdo/zdo_commissioning.c:210   Can't find PAN to join to! param 0
I (28800) ZBOSS: commissioning/bdb/zdo_commissioning_bdb.c:548   COMMISSIONING_STOP: app signal 10 comm status 3
I (28810) ESP_ZB_ON_OFF_LIGHT: Network steering was not successful (status: ESP_FAIL)
I (29830) ZBOSS: commissioning/bdb/zdo_commissioning_bdb.c:348   dev type 2, joined 0, ext_pan_id 0, authenticated 0, tclk_valid 
xieqinan commented 11 months ago

@valdisz @nalves23 ,

Since I cannot use our coordinator to reproduce it, could you please help us by adding a log (esp_rom_printf("set channel: %d\n", channel);) to this location for debugging purposes?

If possible, you can try flashing the following ha_on_off_light.bin to esp32h2 and see if it works correctly. You only need to follow the provided command lines to flash it.

esptool.py -p [PORT] erase_flash
esptool.py -p {PORT} write_flash 0 ha_on_off_light.bin

ha_on_off_light.zip

valdisz commented 11 months ago

@valdisz @nalves23 ,

Since I cannot use our coordinator to reproduce it, could you please help us by adding a log (esp_rom_printf("set channel: %d\n", channel);) to this location for debugging purposes?

If possible, you can try flashing the following ha_on_off_light.bin to esp32h2 and see if it works correctly. You only need to follow the provided command lines to flash it.

esptool.py -p [PORT] erase_flash
esptool.py -p {PORT} write_flash 0 ha_on_off_light.bin

ha_on_off_light.zip

Added the requested line to the IDF source, and logs from both devices are in the attachments.

end-dev.log coordinator.log

I cannot flash the provided firmware because I only have esp32c6 devices. One is used as the end device, and the other is used as the coordinator.

xieqinan commented 11 months ago

@nalves23 ,

I observed an unusual phenomenon in the sniffer packet you provided. The End Device sent the association request but did not receive the association response, and its short address was not assigned during this process so that the End Device cannot receive the transport key packet with the specific short address. This issue may be related to an extended panid conflict.

I have another suggestion to address this. Please use the esp_zb_set_long_address() function to modify the extended panid for the End Device and attempt to let it join the Zigbee network again.

xieqinan commented 11 months ago

@valdisz

Added the requested line to the IDF source, and logs from both devices are in the attachments.

end-dev.log coordinator.log

Based on the logs you provided, it seems like there might be some code modifications in the HA_on_off_example. I also tried using the ESP32C6 to form the Zigbee network, and it worked fine. If there are any changes made to the example, we can discuss them here. Please share any modifications or specific areas of concern you have in the HA_on_off_example.

Furthermore, please disable the trace log for the testing phase initially.

valdisz commented 11 months ago

@valdisz

Added the requested line to the IDF source, and logs from both devices are in the attachments. end-dev.log coordinator.log

Based on the logs you provided, it seems like there might be some code modifications in the HA_on_off_example. I also tried using the ESP32C6 to form the Zigbee network, and it worked fine. If there are any changes made to the example, we can discuss them here. Please share any modifications or specific areas of concern you have in the HA_on_off_example.

Furthermore, please disable the trace log for the testing phase initially.

Here are the only changes I have made to the examples (except debug logs):

diff --git a/examples/esp_zigbee_HA_sample/HA_on_off_light/main/esp_zb_light.c b/examples/esp_zigbee_HA_sample/HA_on_off_light/main/esp_zb_light.c
index 9159d71..27c448e 100644
--- a/examples/esp_zigbee_HA_sample/HA_on_off_light/main/esp_zb_light.c
+++ b/examples/esp_zigbee_HA_sample/HA_on_off_light/main/esp_zb_light.c
@@ -111,6 +111,10 @@ static void esp_zb_task(void *pvParameters)
     /* initialize Zigbee stack */
     esp_zb_cfg_t zb_nwk_cfg = ESP_ZB_ZED_CONFIG();
     esp_zb_init(&zb_nwk_cfg);
+    
+    // esp_zb_set_trace_level_mask(ESP_ZB_TRACE_LEVEL_DEBUG, ESP_ZB_TRACE_SUBSYSTEM_NWK);
+    esp_zb_nvram_erase_at_start(true);
+
     esp_zb_on_off_light_cfg_t light_cfg = ESP_ZB_DEFAULT_ON_OFF_LIGHT_CONFIG();
     esp_zb_ep_list_t *esp_zb_on_off_light_ep = esp_zb_on_off_light_ep_create(HA_ESP_LIGHT_ENDPOINT, &light_cfg);
     esp_zb_device_register(esp_zb_on_off_light_ep);
diff --git a/examples/esp_zigbee_HA_sample/HA_on_off_switch/main/esp_zb_switch.c b/examples/esp_zigbee_HA_sample/HA_on_off_switch/main/esp_zb_switch.c
index b0f5c33..82c1e0e 100644
--- a/examples/esp_zigbee_HA_sample/HA_on_off_switch/main/esp_zb_switch.c
+++ b/examples/esp_zigbee_HA_sample/HA_on_off_switch/main/esp_zb_switch.c
@@ -154,6 +154,8 @@ static void esp_zb_task(void *pvParameters)
     /* initialize Zigbee stack */
     esp_zb_cfg_t zb_nwk_cfg = ESP_ZB_ZC_CONFIG();
     esp_zb_init(&zb_nwk_cfg);
+
+    esp_zb_nvram_erase_at_start(true);
     esp_zb_on_off_switch_cfg_t switch_cfg = ESP_ZB_DEFAULT_ON_OFF_SWITCH_CONFIG();
     esp_zb_ep_list_t *esp_zb_on_off_switch_ep = esp_zb_on_off_switch_ep_create(HA_ONOFF_SWITCH_ENDPOINT, &switch_cfg);
     esp_zb_device_register(esp_zb_on_off_switch_ep);
diff --git a/examples/esp_zigbee_HA_sample/HA_on_off_switch/partitions.csv b/examples/esp_zigbee_HA_sample/HA_on_off_switch/partitions.csv
index a56c0be..89ce337 100644
--- a/examples/esp_zigbee_HA_sample/HA_on_off_switch/partitions.csv
+++ b/examples/esp_zigbee_HA_sample/HA_on_off_switch/partitions.csv
@@ -2,6 +2,6 @@
 # Note: if you have increased the bootloader size, make sure to update the offsets to avoid overlap
 nvs,        data, nvs,      0x9000,  0x6000,
 phy_init,   data, phy,      0xf000,  0x1000,
-factory,    app,  factory,  0x10000, 900K,
-zb_storage, data, fat,      0xf1000, 16K,
-zb_fct,     data, fat,      0xf5000, 1K,
+factory,    app,  factory,  0x10000, 1M,
+zb_storage, data, fat,      0x111000, 16K,
+zb_fct,     data, fat,      0x116000, 1K,
valdisz commented 11 months ago

@xieqinan I have executed idf.py erase-flash and then flashed both chips. Now they connect. Production configuration is missing right now. will check now what happens when I write it back.

I also tried to join the Tasmota coordinator. After I recreated the network on it, now it connects to that too.

Why esp_zb_nvram_erase_at_start(true) does not help but full flash erase make an effect?

xieqinan commented 11 months ago

The esp_zb_nvram_erase_at_start(true) only erases the related Zigbee memory, excluding zb_fct. The idf.py erase-flash will erase all chip memory, including zb_fct. They are not completely the same.

nalves23 commented 11 months ago

Hi @xieqinan.

I follow you instructions that you provide and I have some changes in behavior compared to yesterday. Today after disconnect all devices (coordinator and my router device) the commissioning process, the router associated with the coordinator and had no problems. I erased again the memory and reprogramming and it worked correctly. Yesterday, this wasn't possible! I noticed that you had indicated that there could be a possible PANID conflict, is this possible because there are other ZigBee devices here in my vicinity? I find it strange because each network has its own PANID and, in terms of history, we have never seen this problem. I just think it's strange that this problem occurred after I updated the zigbee-lib library version from 1.0.2 to the following ones!

I try it with the HA_on_off_light.bin file that you send and works fine.

What could be the reason for the PanId conflict that you talking about? There are some way to reproduce it again?

Kind regards. nalves

nalves23 commented 11 months ago

Hi @xieqinan

Well, I found the reason why this is working today. Yesterday, before finishing my work, I had left zBoss version 1.0.2 active, which is why it is running without problems today. I resumed work with version 1.0.4 and it returned to the same problem.

@nalves23 ,

I observed an unusual phenomenon in the sniffer packet you provided. The End Device sent the association request but did not receive the association response, and its short address was not assigned during this process so that the End Device cannot receive the transport key packet with the specific short address. This issue may be related to an extended panid conflict.

I have another suggestion to address this. Please use the esp_zb_set_long_address() function to modify the extended panid > > for the End Device and attempt to let it join the Zigbee network again.

The log packet sniffer that I send to you was incomplete, the association response is lower after the transport key messages, please, check this new log that i recover today:

image

About this suggestion, where can I put this function esp_zb_set_long_address()? In the router device?

Kind regards.

xieqinan commented 11 months ago

@nalves23 ,

ha_on_off_light.zip

Whether the above .bin can work fine or not?

nalves23 commented 11 months ago

Hi @xieqinan

Whether the above .bin can work fine or not?

I will test it again!

About the Association response, I test the same code compiled with version 1.0.2 of Zboss-lib and get some differences, The Network link message is only send after receive the Association response! This could be a problem reason?

image

I will check this behavior in the bin file that you send and I give you feedback next.

Kind regards.

nalves23 commented 11 months ago

Hi @xieqinan

I test your bin file and the result is the same that I have with my source code. Please check attached the Zigbee sniffer packages:

image

Kind Regards nalves

valdisz commented 11 months ago

The esp_zb_nvram_erase_at_start(true) only erases the related Zigbee memory, excluding zb_fct. The idf.py erase-flash will erase all chip memory, including zb_fct. They are not completely the same.

Need to confirm that in my case Production Config that I have written to the chip was causing a trouble.

xieqinan commented 11 months ago

@nalves23

What could be the reason for the PanId conflict that you talking about? There are some way to reproduce it again?

The coordinator records the long address of the joined device. If there are multiple devices with the same long address in the network, the coordinator may not allow the second device with the same long address to join the network again. You can try using idf.py -p [PORT] erase-flash to erase flash, then add esp_zb_set_long_address(long_address); before esp_zb_start(false);. finally, complie and flash.

The log packet sniffer that I send to you was incomplete, the association response is lower after the transport key messages, please, check this new log that i recover today:

The Zigbee coordinator uses the "association response" to assign a short address to its End Devices (ED) before they join the network. The "transport key" packet is routed based on the short address. If the "association response" arrives after the "transport key" message, the message cannot be routed to the ED because the ED does not have a short address at that time. Consequently, if the ED is erased before booting up, the message will not be received by the ED.

xieqinan commented 11 months ago

@valdisz ,

I try it with the HA_on_off_light.bin file that you send and works fine.

Why did it work this time? Flashing the .bin file is not related to the esp-zboss-lib version. What could be the reason for the successful outcome?

nalves23 commented 11 months ago

@valdisz ,

I try it with the HA_on_off_light.bin file that you send and works fine.

Why did it work this time? Flashing the .bin file is not related to the esp-zboss-lib version. What could be the reason for the successful outcome?

I don't know. I don't have answer to that.

valdisz commented 11 months ago

@valdisz ,

I try it with the HA_on_off_light.bin file that you send and works fine.

Why did it work this time? Flashing the .bin file is not related to the esp-zboss-lib version. What could be the reason for the successful outcome?

Looks like this is not related to the esp-zboss-lib, but, initially, it was looking like it is. I have a question added (#155) about my issue.

nalves23 commented 11 months ago

Hi @xieqinan

I check that Zboos-lib and Zigbee-lib was update to version 1.0.5. I test it and now its working.

The issue was identified on the zigbee-lib library?

Thank you and kind regards . nalves

xieqinan commented 11 months ago

Hello @nalves23 ,

There are few differences between esp-zboss-lib v1.0.5 and v1.0.4. The ha_on_off_light.zip file is also compiled from esp-zboss-lib v1.0.5. Could you please try testing it again?

nalves23 commented 11 months ago

Hi Xieqinan.

With esp-zboss-lib 1.05 version the example works perfectly. Thank you for you feedback and patience. With this I close the issue.

kind regards nalves23