espressif / esp-zigbee-sdk

Espressif Zigbee SDK
Apache License 2.0

Paired end device occasionally cannot connect to coordinator (TZ-1247) #465

Open CharlesJin6 opened 3 weeks ago

CharlesJin6 commented 3 weeks ago

Answers checklist.

IDF version.

v5.2.2

esp-zigbee-lib version.

1.5.1

esp-zboss-lib version.

1.5.1

Espressif SoC revision.

ESP32-C6

What is the expected behavior?

The end device connects successfully every time it is powered on.

What is the actual behavior?

As the title says, a paired end device occasionally fails to connect to the coordinator. It happens about once every 24 hours. Once it happens, power-cycling the end device does not solve the problem, but power-cycling the coordinator does.

Steps to reproduce.

  1. Power on the coordinator.
  2. Power on the end device.
  3. The end device sends a message to the coordinator.
  4. The end device enters sleep.
  5. The end device wakes up and sends a message to the coordinator.
  6. Repeat steps 4 and 5.

More Information.

end device log:

[16:19:40.259] I (6068) end_device: ZDO signal: BDB Device Reboot (0x6), status: ESP_FAIL
W (6068) end_device: Stack BDB Device Reboot failure with ESP_FAIL status, steering
[16:19:46.543] I (12348) end_device: ZDO signal: BDB Device Reboot (0x6), status: ESP_FAIL
W (12348) end_device: Stack BDB Device Reboot failure with ESP_FAIL status, steering
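To compare the failed attempts, the signal lines can be pulled out of the log mechanically. This is a minimal host-side Python sketch, not part of the SDK: `parse_signal()` and the regular expression are hypothetical helpers written for this log format, and it assumes ESP-IDF's default log timestamp source, where the number in parentheses is milliseconds since boot.

```python
import re

# Matches lines such as
# "I (6068) end_device: ZDO signal: BDB Device Reboot (0x6), status: ESP_FAIL".
# Assumption: the number in parentheses is ms since boot (default ESP-IDF timestamp).
SIGNAL_RE = re.compile(
    r"^(?P<level>[EWIDV]) \((?P<ms>\d+)\) (?P<tag>\w+): "
    r"ZDO signal: (?P<name>.+?) \((?P<code>0x[0-9a-fA-F]+)\), status: (?P<status>\w+)$"
)

def parse_signal(line: str):
    """Return (ms, tag, signal name, signal code, status), or None for non-signal lines."""
    m = SIGNAL_RE.match(line.strip())
    if m is None:
        return None
    return (int(m["ms"]), m["tag"], m["name"], int(m["code"], 16), m["status"])

log = """\
I (6068) end_device: ZDO signal: BDB Device Reboot (0x6), status: ESP_FAIL
W (6068) end_device: Stack BDB Device Reboot failure with ESP_FAIL status, steering
I (12348) end_device: ZDO signal: BDB Device Reboot (0x6), status: ESP_FAIL
W (12348) end_device: Stack BDB Device Reboot failure with ESP_FAIL status, steering"""

signals = [s for s in map(parse_signal, log.splitlines()) if s]
# Keep only BDB Device Reboot (0x6) signals that reported ESP_FAIL.
failed_reboots = [s for s in signals if s[3] == 0x6 and s[4] == "ESP_FAIL"]
print(len(failed_reboots))                          # 2
print(failed_reboots[1][0] - failed_reboots[0][0])  # 6280
```

Applied to the excerpt above, it finds two failed BDB Device Reboot signals about 6.3 seconds apart, matching the terminal timestamps 16:19:40.259 and 16:19:46.543.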

coordinator log:

[16:19:04.297] I (107398856) coordinator: ZDO signal: NLME Status Indication (0x32), status: ESP_OK
[16:19:07.310] I (107401866) coordinator: ZDO signal: NLME Status Indication (0x32), status: ESP_OK
[16:19:10.303] I (107404856) coordinator: ZDO signal: ZDO Device Unavailable (0x3c), status: ESP_OK
[16:19:52.342] I (107446906) coordinator: ZDO signal: ZDO Device Unavailable (0x3c), status: ESP_OK
[16:20:22.383] I (107476936) coordinator: ZDO signal: NLME Status Indication (0x32), status: ESP_OK

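Assuming ESP-IDF's default log timestamp source, the numbers in parentheses in the coordinator log are milliseconds since the coordinator booted. A short Python calculation (values copied from the log above; `uptime_hours()` is an illustrative helper) turns them into uptime and inter-signal gaps:

```python
# (terminal wall-clock time, ESP-IDF log timestamp in ms since boot, signal)
entries = [
    ("16:19:04.297", 107398856, "NLME Status Indication (0x32)"),
    ("16:19:07.310", 107401866, "NLME Status Indication (0x32)"),
    ("16:19:10.303", 107404856, "ZDO Device Unavailable (0x3c)"),
    ("16:19:52.342", 107446906, "ZDO Device Unavailable (0x3c)"),
    ("16:20:22.383", 107476936, "NLME Status Indication (0x32)"),
]

MS_PER_HOUR = 3_600_000

def uptime_hours(ms: int) -> float:
    """Convert a millisecond timestamp since boot to hours."""
    return ms / MS_PER_HOUR

first_ms = entries[0][1]
print(f"coordinator uptime: {uptime_hours(first_ms):.2f} h")  # coordinator uptime: 29.83 h

# Seconds between consecutive signals, for lining them up with the end-device log.
gaps = [(b[1] - a[1]) / 1000 for a, b in zip(entries, entries[1:])]
print(gaps)  # [3.01, 2.99, 42.05, 30.03]
```

This puts the coordinator at roughly 29.8 hours of uptime when the end device failed to rejoin, consistent with the ~24-hour pattern reported in this thread. The 42.05 s gap between the two Device Unavailable signals agrees with the terminal stamps (16:19:10.303 to 16:19:52.342) to within about 10 ms.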
xieqinan commented 3 weeks ago

Hi @CharlesJin6 ,

Are you saying that if a network is successfully formed with the coordinator and end device, the end device cannot reconnect to the coordinator after rebooting 24 hours later?

CharlesJin6 commented 3 weeks ago

Hi @xieqinan, my coordinator runs continuously. My end device wakes up every minute to collect data (Zigbee is also started at that point). After three collections (that is, after waking up three times), the data is transmitted to the coordinator over Zigbee.

xieqinan commented 3 weeks ago

OK, could you please provide more complete logs from the coordinator and end device?

CharlesJin6 commented 3 weeks ago

Hi @xieqinan, end device log: SaveWindows2024_10_25_16-21-33.TXT

coordinator log: SaveWindows2024_10_25_16-21-46.TXT

Please look at the end device and coordinator logs around 16:19:00.

xieqinan commented 3 weeks ago

@CharlesJin6 ,

This information is helpful.

Could you please test using the switch example and the deep sleep device? I tried to reproduce your issue with these examples, but they work fine.

CharlesJin6 commented 3 weeks ago

Hi @xieqinan, thank you very much for your help in trying to reproduce it; my previous description was not detailed enough. Here is some additional information:

  1. The end device enters deep sleep.
  2. After waking from its 1-minute sleep, one thread starts the Zigbee task and another performs the data collection. The collected data is stored using RTC stubs (21 bytes), and the device then enters deep sleep again.
  3. Data is uploaded after every three collections.
  4. Zigbee binds custom attributes and OTA attributes.
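The duty cycle in the list above can be modeled as follows. This is a host-side Python sketch, not device code: `RtcRetained` stands in for data kept in RTC memory across deep sleep (on ESP-IDF, typically a variable marked `RTC_DATA_ATTR`), `send` is a hypothetical callback for the Zigbee upload, and splitting the 21 bytes into three 7-byte samples is an assumption. The parallel thread that starts Zigbee on every wake-up is not modeled.

```python
from dataclasses import dataclass, field

SAMPLES_PER_UPLOAD = 3  # "Data is uploaded after every three collections"
SAMPLE_SIZE = 7         # assumption: 3 samples x 7 bytes = the 21 bytes of RTC storage

@dataclass
class RtcRetained:
    """Stand-in for state that survives deep sleep in RTC memory."""
    samples: list = field(default_factory=list)

def wake_cycle(rtc: RtcRetained, sample: bytes, send) -> None:
    """One wake-up: store a sample, upload every third one, then sleep again."""
    assert len(sample) == SAMPLE_SIZE
    rtc.samples.append(sample)
    if len(rtc.samples) == SAMPLES_PER_UPLOAD:
        send(b"".join(rtc.samples))  # hypothetical Zigbee report to the coordinator
        rtc.samples.clear()
    # On the device, a 1-minute timer wakeup and deep sleep would follow here,
    # e.g. esp_sleep_enable_timer_wakeup() then esp_deep_sleep_start().

uploads = []
rtc = RtcRetained()
for minute in range(9):
    wake_cycle(rtc, bytes([minute]) * SAMPLE_SIZE, uploads.append)

print(len(uploads), len(uploads[0]))  # 3 21
```

Nine one-minute wake-ups produce three 21-byte uploads, and the retained buffer is empty again after each upload.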

I think my program is very similar to the deep sleep device example. I'm going to strip down my code, test it, and see whether I can reproduce the issue. Also, does the log below look abnormal?

end_device: ZDO signal: BDB Device Reboot (0x6), status: ESP_FAIL
end_device: Stack BDB Device Reboot failure with ESP_FAIL status, steering

If it is abnormal, is it caused by the coordinator or the end device? I suspect that the end device sent incomplete protocol stack data and then entered deep sleep, leaving the coordinator's protocol stack in an inconsistent state.

xieqinan commented 2 weeks ago

@CharlesJin6 ,

I personally think that my program is very similar to the deep sleep device. I'm going to cut my code to test and see if I can reproduce it.

If you can reproduce the issue using the deep sleep device and the switch example, it would be very helpful in resolving this issue.

If it is abnormal, is it caused by the coordinator or the end device?

That's unusual; it seems the coordinator is rejecting the end device's rejoin attempts.

CharlesJin6 commented 1 week ago

Hi @xieqinan, I reproduced the problem again over the weekend using my program and found a pattern: once the coordinator has been receiving the packets (the data buffered in RTC) for 24 hours, subsequent connections fail. Attached log: Untitled-1.log. Hot-restarting the coordinator with the restart command solves the problem. Without a hot restart, it takes several hours before the end device can connect.

CharlesJin6 commented 6 days ago

Hi @xieqinan, any suggestions or progress?

xieqinan commented 6 days ago

@CharlesJin6 ,

The phenomenon is unusual, and I don’t have a good solution for it at the moment. Could you also please provide us with the .pcap sniffer logs?

Additionally, it would be very helpful if you could share simple examples to reproduce this issue.

CharlesJin6 commented 1 day ago

Thank you for your reply. I can't easily provide the .pcap file right now. My current workaround is to restart the coordinator every 23 hours.
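The 23-hour restart workaround reduces to a simple uptime check. A minimal Python sketch, assuming the coordinator reads its uptime in microseconds (on ESP-IDF, `esp_timer_get_time()` returns an `int64_t` count of microseconds since boot) and then calls `esp_restart()`; `should_restart()` and the constants are illustrative names, not SDK APIs:

```python
RESTART_PERIOD_H = 23              # workaround from this thread: restart before ~24 h
US_PER_HOUR = 3_600 * 1_000_000    # microseconds per hour

RESTART_PERIOD_US = RESTART_PERIOD_H * US_PER_HOUR

def should_restart(uptime_us: int) -> bool:
    """True once the coordinator has been up for the restart period.

    On the device, uptime_us would come from esp_timer_get_time(), and a True
    result would lead to esp_restart(); the policy itself is only a sketch.
    """
    return uptime_us >= RESTART_PERIOD_US

print(RESTART_PERIOD_US)  # 82800000000
```

The period is 8.28 × 10^10 µs, which does not fit in 32 bits, so firmware written in C needs a 64-bit type for it. A one-shot `esp_timer` started at boot with this timeout, whose callback calls `esp_restart()`, is one way to apply the same policy without polling.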