espressif / esp-zigbee-sdk

Espressif Zigbee SDK
Apache License 2.0

Support silent rejoin (useful for deep sleep) (TZ-84) #9

Closed inorichi closed 10 months ago

inorichi commented 1 year ago

I'm playing around with a battery powered button and in order to run it for long periods of time I'm using deep sleep with GPIO wakeups. That's working totally fine and the ESP32-C6 consumes 7µA as it should.

However, every time it wakes I need to run the whole Zigbee initialization, so it re-establishes the connection, announces itself to the coordinator and reports the button press through attribute reporting. This takes around 4 seconds from the press, which is a lot.

I've checked with other battery powered Zigbee devices and they do not reconnect and announce themselves when waking up. I only see the attribute reporting (I'm using the zigbee2mqtt logs to debug) so I guess they can keep the connection while in sleep mode.

Is this currently possible? I don't know if we can store the required data structures in RTC with RTC_DATA_ATTR.
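A minimal sketch of the RTC idea in the question above, assuming the application keeps its own small snapshot of network parameters. The `net_state_t` struct, the magic value, and the helper names are hypothetical and not part of esp-zigbee-sdk. It only shows how application-owned data can survive deep sleep and be validated on wake-up; it does not restore any internal state of the Zigbee stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* On ESP-IDF, RTC_DATA_ATTR comes from "esp_attr.h" and places a variable in
 * RTC memory, which stays powered during deep sleep. The empty fallback below
 * only lets this sketch compile on a host machine. */
#ifndef RTC_DATA_ATTR
#define RTC_DATA_ATTR
#endif

#define NET_STATE_MAGIC 0x5A42u /* hypothetical "snapshot is valid" marker */

/* Hypothetical snapshot of what the application knows about its network. */
typedef struct {
    uint16_t magic;
    uint16_t pan_id;
    uint16_t short_addr;
    uint8_t  channel;
    uint8_t  checksum;
} net_state_t;

static RTC_DATA_ATTR net_state_t s_saved_state;

/* XOR-fold every field except the checksum itself into one byte. */
static uint8_t net_state_checksum(const net_state_t *s)
{
    uint16_t folded = (uint16_t)(s->magic ^ s->pan_id ^ s->short_addr);
    return (uint8_t)((folded >> 8) ^ (folded & 0xFFu) ^ s->channel);
}

/* Call before entering deep sleep. */
void net_state_save(uint16_t pan_id, uint16_t short_addr, uint8_t channel)
{
    s_saved_state.magic = NET_STATE_MAGIC;
    s_saved_state.pan_id = pan_id;
    s_saved_state.short_addr = short_addr;
    s_saved_state.channel = channel;
    s_saved_state.checksum = net_state_checksum(&s_saved_state);
}

/* Call after wake-up. Returns false if no valid snapshot is present. */
bool net_state_restore(net_state_t *out)
{
    assert(out != NULL);
    if (s_saved_state.magic != NET_STATE_MAGIC ||
        s_saved_state.checksum != net_state_checksum(&s_saved_state)) {
        return false;
    }
    *out = s_saved_state;
    return true;
}
```

The magic value and checksum are there so that a cold boot, where the RTC variable holds no snapshot, is not mistaken for a wake-up with valid data.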

Edit: They call it a Sleepy end device here: https://github.com/SiliconLabs/zigbee_applications/blob/master/zigbee_concepts/Zigbee-Networking-Concepts/Networking%20Concepts%20-%20End%20Devices%20and%20Polling.md#end-devices

likunqiao097304 commented 1 year ago

@inorichi Hi. First, deep sleep mode loses the Zigbee connection, which is why the ESP32-C6 goes through the rejoin process once it is back online (although, judging from my Zigbee sniffer log, that still shouldn't take 4 seconds). For your use case, we have a light sleep mode that should meet your needs: in this mode the chip wakes up periodically to keep the Zigbee end device alive. However, light sleep on the ESP32-C6 is not fully ready yet. In short, it is not possible right now, but it will be ready soon.

inorichi commented 1 year ago

The reconnection actually takes 1 second (and this would be acceptable) but the attribute report takes 3 additional seconds.

These are the ESP logs:

I (358) zigbee_driver: ZDO signal: 23, status: -1
I (358) zigbee_driver: Zigbee stack initialized
I (1008) zigbee_driver: Joined network successfully (Extended PAN ID: dd:dd:dd:dd:dd:dd:dd:dd, PAN ID: 0x1a62, Channel:11)
I (1018) zigbee_driver: Reporting attribute change <-- here I call "esp_zb_zcl_set_attribute_val"

And these are the z2m logs:

debug 2023-03-16 10:19:13 Device 'miniboard' announced itself
info 2023-03-16 10:19:13 MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"miniboard","ieee_address":"0x000000..."},"type":"device_announce"}'
info 2023-03-16 10:19:13 MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"announce","meta":{"friendly_name":"miniboard"},"type":"device_announced"}'
debug 2023-03-16 10:19:15 Retrieving state of 'miniboard' after reconnect
debug 2023-03-16 10:19:17 Received Zigbee message from 'miniboard', type 'attributeReport', cluster '65280', data '{"1":1}' from endpoint 2 with groupID 0
info 2023-03-16 10:19:17 MQTT publish: topic 'zigbee2mqtt/miniboard', payload '{"action":"many_1","linkquality":255}'
info 2023-03-16 10:19:17 MQTT publish: topic 'zigbee2mqtt/miniboard/action', payload 'many_1'

Is light sleep the only way around it? I'd like to squeeze battery as much as possible and light sleep is 35µA according to datasheet.
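To put the two sleep currents in perspective, here is a rough back-of-the-envelope sketch. It assumes the sleep current is the only drain and ignores wake-up bursts, radio activity and battery self-discharge, so it is an upper bound at best. The function names are made up for this illustration, and the battery capacity is whatever you pass in.

```c
#include <assert.h>
#include <stdio.h>

/* Battery life in days for a given capacity and a constant average current. */
double battery_life_days(double capacity_mah, double avg_current_ua)
{
    assert(avg_current_ua > 0.0);
    double hours = capacity_mah * 1000.0 / avg_current_ua; /* mAh -> uAh */
    return hours / 24.0;
}

/* Prints the sleep-floor comparison for the two currents quoted above. */
void print_sleep_comparison(double capacity_mah)
{
    printf("deep sleep  (7 uA):  %.0f days\n",
           battery_life_days(capacity_mah, 7.0));
    printf("light sleep (35 uA): %.0f days\n",
           battery_life_days(capacity_mah, 35.0));
}
```

Whatever the capacity, the sleep floor alone differs by a factor of five (35 µA versus 7 µA), which is why the choice of sleep mode matters for a device that sleeps almost all the time.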

From what I've gathered (I'm no Zigbee expert), the end device should be able to resume operation with a silent rejoin as long as it has not reached an aging timeout (I configured it to ESP_ZB_ED_AGING_TIMEOUT_64MIN but nothing changed).

Maybe I need to do something else (like the finding and binding feature) to store additional data in NVRAM, like the binding table?

inorichi commented 1 year ago

Here's an extract of a PDF I found (section 7.3.3):

A device that loses connection to the network can attempt to rejoin using the ZigBee NWK layer rejoin command, which also triggers a beacon request. Since the NWK layer rejoin command use NWK layer security, the difference from a join based on 802.15.4 association is that no additional authentication step needs to be performed when security is enabled, and that nodes may rejoin any parent as long as it has available capacity, regardless of the status of the accept joining flag of the beacon. If it rejoins a different parent (e.g., because the original parent no longer responds), the node will be allocated a different short address, and must broadcast a device announce to the network in order to update bindings that may be configured in other nodes (see Figure 7.2).

After power cycles, most implementations do not immediately attempt an explicit rejoin in order to avoid network overloads, if they still have the address of their parent node and their own short address in nonvolatile memory. It is assumed that all nodes will restart in the same state as before the power cycle. An explicit rejoin is triggered only if the node fails to communicate with its parent. Such a procedure is often referred to as “silent rejoin”. It is also the default procedure, in ZigBee Pro/2007, when the coordinator triggers a channel change (annex A)
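The decision described in this excerpt can be sketched as a small pure function. This is only an illustration of the logic in the quoted text, not how any particular stack implements it. The type and function names are hypothetical, and treating "nothing stored" as a case for an explicit join or rejoin is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    STARTUP_SILENT_REJOIN,  /* resume with stored state, send no rejoin command */
    STARTUP_EXPLICIT_REJOIN /* NWK rejoin command, then device announce */
} startup_action_t;

/* Hypothetical view of what survived the power cycle in nonvolatile memory. */
typedef struct {
    bool has_parent_addr;
    bool has_short_addr;
} stored_nwk_t;

/*
 * parent_unreachable: true once the node has failed to communicate with its
 * stored parent (for example, frames are no longer acknowledged).
 */
startup_action_t choose_startup_action(const stored_nwk_t *nv,
                                       bool parent_unreachable)
{
    assert(nv != NULL);
    /* Assumption: without both addresses there is nothing to resume. */
    if (!nv->has_parent_addr || !nv->has_short_addr) {
        return STARTUP_EXPLICIT_REJOIN;
    }
    return parent_unreachable ? STARTUP_EXPLICIT_REJOIN
                              : STARTUP_SILENT_REJOIN;
}
```

In this model the explicit rejoin is a fallback: it is chosen only after communication with the stored parent has failed, which matches the excerpt's point about avoiding network overload after power cycles.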

likunqiao097304 commented 1 year ago

The rejoin process should follow the Zigbee BDB specification (PRO Base Device Behavior Specification, v3.0.1):

[image]

So the device has to go through the rejoin process: a rejoin command, followed by a device announce once the response is successful. I don't think we have implemented silent rejoin.

likunqiao097304 commented 1 year ago

Is light sleep the only way around it? I'd like to squeeze battery as much as possible and light sleep is 35µA according to datasheet.

Regarding this, I have talked with my team. Currently there is no way around it on the C6 chip. For lower current consumption in light sleep, you will have to wait for a future chip.

inorichi commented 1 year ago

Thanks for your time.

Here's another PDF where it's explained a bit better (section 7.2.3, it takes a while to load the PDF).

It says that silent rejoin is something you won’t find in the ZigBee specification, but all stack vendors provide it because it is necessary in a deployed network of any size.

And the implementation is straightforward, as you don't have to do anything before communicating if you already have (PAN ID, Extended PAN ID, NwkAddr, security key), so I've tried the following:

esp_zb_start(false)
esp_zb_main_loop_iteration()

And doing nothing on the ESP_ZB_ZDO_SIGNAL_SKIP_STARTUP signal. However, I'm getting a signal code 60 (which is not defined in the list of signal codes in esp_zb_app_signal_type_t).

Would it be possible to add support for silent rejoin?

On a side note, is the install code working right now? I tried overwriting the fct partition with an image generated with your Python script. However, when I do that and set INSTALLCODE_POLICY_ENABLE to true, I can no longer join my Zigbee network; I automatically get a LEAVE signal from the coordinator.

likunqiao097304 commented 1 year ago

signal code 60

It is an internal message that means the device is unavailable: some NWK or APS frame couldn't be sent successfully. Thanks for your feedback, we will consider silent rejoin if it is possible.

The install code should be working right now. If you have successfully added your install code to the end device by flashing the bin file into zb_fct, you also need to call esp_zb_secur_ic_str_add or esp_zb_secur_ic_add to add that install code on the coordinator side.

energizer91 commented 1 year ago

It is an internal message that means the device is unavailable: some NWK or APS frame couldn't be sent successfully.

Thanks for the explanation @likunqiao097304! I've been getting this error pretty often (approx. once a day) and have no idea what to do about it; the device refuses to reconnect to the network until I restart it.

Is there a way to force the device to reconnect in that case? Or is it better to just reboot it? Thanks!

likunqiao097304 commented 1 year ago

I've been getting this error pretty often (approx. once a day) and have no idea what to do about it; the device refuses to reconnect to the network until I restart it.

There might be an issue there; I will try to fix it. In the meantime, could you describe your environment in a bit more detail, and how you reproduce the error?

Is there a way to force the device to reconnect in that case? Or is it better to just reboot it? Thanks!

For now it is better to just reboot it.

likunqiao097304 commented 1 year ago

@inorichi Regarding silent rejoin, doing nothing on the ESP_ZB_ZDO_SIGNAL_SKIP_STARTUP signal still won't give you the behavior you want. It also goes against the specification, so such a product probably would not pass the Zigbee certification test. The book you shared may be a bit out of date and may not represent the current state of the specification. Is this feature really needed?

inorichi commented 1 year ago

Well, I still think this is a very useful feature for battery powered ESP, but you have the last word.

I don't think this goes against the specification, because it's an implementation detail and you can still always do commissioning. An example implementation of a startup signal could be:

case ESP_ZB_ZDO_SIGNAL_SKIP_STARTUP:
  if (!zb_zdo_joined()) {
    esp_zb_bdb_start_top_level_commissioning(ESP_ZB_BDB_MODE_NETWORK_STEERING);
  } else {
    // I'm commissioned to a network so I don't really need to do commissioning again
  }
  break;

Anyways, I've linked against the zboss library and right after esp_zb_start(false) everything (or most) is properly initialized:

zb_address_get_pan_id(panid_ref, extended_pan_id); // returns my network extended pan id
zb_address_get_short_pan_id(panid_ref, &short_id); // returns my network short pan id
zb_get_short_address(); // returns my device short addr
zb_nwk_get_pib_cache(); // returns a pointer to a struct containing the same values as above
zb_nwk_get_parent(); // returns the parent of the previous power cycle (the device where it should try to send data)
zb_get_long_address(&long_addr); // this one failed, but I fixed it by calling zb_set_long_address(long_addr) after esp_zb_start(false)

If the parent does not send ACKs then we should receive a "PARENT LOST" event and try to find a new parent (either by commissioning or by sending MAC scan requests). Parent lost can happen even without support of silent rejoin so nothing new here.

I'm trying to get my hands on a CC2531 so I can sniff my network and find out what other end devices are doing. In the meantime I can give you an example of a certified Aqara temperature sensor: I removed the battery, waited a minute and clicked the reset button to ensure the device was fully shut down. Then I went to the z2m logs and, after putting the battery back, I only received an attribute report with new temperature readings within a very short time; the device did not do any commissioning.

I don't know what else zboss is missing to not send the data frames (or maybe it does, I really need to get my hands on the sniffer). Sadly I can't look into the zboss implementation to find out :disappointed:

likunqiao097304 commented 1 year ago

@inorichi As for the Aqara product, I am not sure it is a Zigbee 3.0 device following the latest BDB specification; it might be a legacy Zigbee product. Under the latest BDB specification, we have to follow the rejoin procedure, which includes a rejoin request and a device announcement.

inorichi commented 1 year ago

You are right, it's a 1.2 device.

I've got in touch with the ZBOSS engineers and they told me they'll see if something can be done (I've linked them to this issue, in case they want to give a public response), so if you don't mind I'd like to keep this open for now.

likunqiao097304 commented 1 year ago

@inorichi Sure. In the meantime, I can also work with the ZBOSS engineers so we figure it out together and avoid messages going back and forth.

inorichi commented 1 year ago

Today I got a reply to an email I sent to the ZigBee Alliance last week too and this is what they told me:

It looks like there’s been recent activity on the ticket that you mentioned, this appears to be an implementation issue as the ZigBee rejoining should not start rejoining from scratch after every wakeup. Hopefully, it will be sorted out. Please let us know if you have any follow up question or comment.

So the good news is that a device implementing it should still be compliant with ZigBee 3.0

Suxsem commented 1 year ago

Any news on this one?

xieqinan commented 1 year ago

Hello @inorichi @Suxsem,

If the Zigbee device is of the router type, it can support the Silent Rejoin feature, eliminating the need for additional operations to join the Zigbee network. Please refer to the Zigbee router for details.

Furthermore, the ESP-ZIGBEE-SDK now supports light sleep, which brings the baseline current down to about 23 µA on the ESP32-H2. This feature is instrumental in minimizing power consumption.

Bosemani commented 11 months ago

Any update on this?

chshu commented 11 months ago

@Suxsem @Bosemani @inorichi Thanks for your patience on this request.

ZBOSS currently supports only light sleep, in which the digital peripherals, RAM, and CPU keep their internal state and resume operation, so the device can continue to work seamlessly afterwards. After deep sleep, however, all run-time information is lost, and the device has to go through the rejoin process as defined in the Zigbee spec (https://github.com/espressif/esp-zigbee-sdk/issues/9#issuecomment-1475772987).

We have had some discussions with the ZBOSS community, and it seems possible in theory to avoid the rejoin process after deep sleep, but it would require a huge code refactoring to restore all the run-time information from flash or RTC memory. Unfortunately there is no plan to support it in the near future; light sleep should be the ideal solution for battery-powered devices.

ESP32-H2 is the first 802.15.4 SoC from Espressif. We understand its light sleep current is not very friendly for battery-powered devices; it mainly targets line-powered devices and some use cases with a big battery. Please stay tuned for future 15.4 SoCs, which could do better on power consumption.

Bosemani commented 11 months ago

Hi @chshu, thank you. By sniffing packets while running the current light sleep example, I found that the end device sends a data request every 5 seconds. Is it possible to adjust the interval?

chshu commented 11 months ago

Hi @chshu, thank you. By sniffing packets while running the current light sleep example, I found that the end device sends a data request every 5 seconds. Is it possible to adjust the interval?

Change the keep alive configuration to adjust the interval.

Bosemani commented 11 months ago

Hi @chshu, what is the maximum value I can set? keep_alive is a uint16. Can we set it to 65535?

kelin6 commented 11 months ago

@Bosemani keep_alive should be of type uint, not uint16. We will update the API. Thank you for your feedback.

kelin6 commented 11 months ago

@Bosemani Please update the esp-zigbee-sdk version to 1.0.4. The type of the parameter keep_alive has been updated, and relevant instructions for the maximum value have been added. Please refer to the documentation for details.

Bosemani commented 11 months ago

Hi @kelin6, I updated esp-zigbee-sdk to version 1.0.4 and tested the sleepy end device example. I changed:

#define ED_KEEP_ALIVE                   8000

#define ESP_ZB_ZED_CONFIG()                                         \
    {                                                               \
        .esp_zb_role = ESP_ZB_DEVICE_TYPE_ED,                       \
        .install_code_policy = INSTALLCODE_POLICY_ENABLE,           \
        .nwk_cfg.zed_cfg = {                                        \
            .ed_timeout = ED_AGING_TIMEOUT,                         \
            .keep_alive = ED_KEEP_ALIVE,                            \
        },                                                          \
    }

I also changed the sleep threshold:

esp_zb_sleep_set_threshold(2000);

The data request interval is still 5 seconds, but it should be 8 seconds.

kelin6 commented 11 months ago

@Bosemani As the Poll Control cluster is not defined in the Zigbee sleep example, we need to modify the long poll interval using zb_zdo_pim_set_long_poll_interval. Otherwise, the long poll interval would be set through the Poll Control cluster API.

The long poll interval should be configured through zb_zdo_pim_set_long_poll_interval after the join process, because during joining it is reset to the default value ZB_PIM_default_Long_Poll_interval (5 seconds).

Therefore, it is recommended to call zb_zdo_pim_set_long_poll_interval after the connection is established. We will update the corresponding API in a future release of the esp-zigbee-sdk.

Please add the following code in the Zigbee sleep example:

#include "zboss_api.h"
...
...
...
    case ESP_ZB_BDB_SIGNAL_STEERING:
        if (err_status == ESP_OK) {
            esp_zb_ieee_addr_t extended_pan_id;
            esp_zb_get_extended_pan_id(extended_pan_id);
            ESP_LOGI(TAG, "Joined network successfully (Extended PAN ID: %02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x, PAN ID: 0x%04hx, Channel:%d, Short Address: 0x%04hx)",
                     extended_pan_id[7], extended_pan_id[6], extended_pan_id[5], extended_pan_id[4],
                     extended_pan_id[3], extended_pan_id[2], extended_pan_id[1], extended_pan_id[0],
                     esp_zb_get_pan_id(), esp_zb_get_current_channel(), esp_zb_get_short_address());

            /* Set the long poll interval only after joining: the join process resets it to the default. */
            zb_zdo_pim_set_long_poll_interval(ED_KEEP_ALIVE);

Bosemani commented 11 months ago

Hi @kelin6, I made the changes you suggested. ED_KEEP_ALIVE is now working.

Thank you!

chshu commented 10 months ago

Closing this issue. Both Zigbee light sleep and deep sleep examples are supported here: esp_zigbee_sleep.

The silent rejoin is only supported by light sleep. After deep sleep, the sleepy device needs to rejoin the network.