chemadh / zigbee2mqtt_ha

Zigbee2mqtt High-availability controller prototype

Introduction

This is a high-availability controller prototype for two zigbee2mqtt instances, each with its own USB Zigbee dongle (implemented specifically for the Sonoff ZBDongle-P). The goal is to let one zigbee2mqtt instance play the "active" role while the second stays in "stand-by" for a single Zigbee network, with an automated switchover from the active to the stand-by node when problems are detected in the active instance. Note that the Zigbee standard allows only one coordinator node per network, so active/stand-by is the only high-availability model available. Detailed features:

The overall design is shown in the following diagram:

Environment details

Some relevant details about the prototype environment are listed below. In a different environment, the provided scripts may need minor adaptations:

Common components used in the nodes previously defined:

Zigbee dongle configuration:

Scripts for synchronization of configuration files from active to stand-by zigbee2mqtt

Each zigbee2mqtt high-availability instance needs its configuration files synchronized from the active instance to the stand-by one. The following scripts, to be deployed on each zigbee2mqtt instance, are provided for this purpose:

Both files implement the same logic, with different example configuration parameters defining the local and remote zigbee2mqtt node variables. The first lines of each script contain the configuration variables to be adapted to each environment. Each parameter is explained below:

The scripts are intended to be run by cron or a timer service (e.g. every 5 minutes) so that the zigbee2mqtt configuration is synchronized automatically. Example of this type of configuration in Alpine Linux with the root user (more instructions in https://lucshelton.com/blog/cron-jobs-with-alpine-linux-and-docker/):
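
A minimal sketch of such a schedule, under stated assumptions: the helper below appends a cron entry to a crontab file only if the same line is not already present, so it can be re-run safely. The function name add_cron_entry and the script path /root/sync_zigbee2mqtt.sh are hypothetical placeholders, not names taken from this repository.

```shell
#!/bin/sh
# Append a cron entry to a crontab file unless the same line already exists.
# Usage: add_cron_entry <crontab_file> <schedule> <command>
add_cron_entry() {
    crontab_file=$1
    entry="$2 $3"
    # -x matches the whole line, -F treats the entry as a fixed string
    if ! grep -qxF -- "$entry" "$crontab_file" 2>/dev/null; then
        printf '%s\n' "$entry" >>"$crontab_file"
    fi
}

# Possible use as root (the script path is a hypothetical placeholder):
#   crontab -l > /tmp/root.cron
#   add_cron_entry /tmp/root.cron '*/5 * * * *' /root/sync_zigbee2mqtt.sh
#   crontab /tmp/root.cron
```

Working on a scratch file and loading it back with crontab avoids editing the live crontab in place, which some cron daemons do not notice until they are restarted.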

Scripts for Zigbee2mqtt High-Availability centralized control

A third node controls the active to stand-by failover between the zigbee2mqtt coordinators (whose configuration is synchronized using the scripts defined above). It makes sense for this third node to be the MQTT broker node (such as Mosquitto running in a Home Assistant environment), since this element is notified when communication with the active zigbee2mqtt instance fails. The following set of scripts is provided for this purpose:

ping_mqtts.sh

Script to check connectivity from the controller to both zigbee2mqtt instances. It is intended to verify that both nodes are reachable before making a scheduled active to stand-by switchover. The first lines of the script contain the configuration variables to be adapted to each environment. Each parameter is explained below:

The script can be directly executed from the Linux prompt. No command-line parameters are required.
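
The kind of check described here could look roughly like the sketch below. It is not the repository's ping_mqtts.sh: the function name check_nodes and the PROBE_CMD variable are hypothetical, and the only detail taken from this README is that the Home Assistant script shown later compares the script's stdout with the word success.

```shell
#!/bin/sh
# Command used to probe one host; overridable, e.g. for testing.
PROBE_CMD=${PROBE_CMD:-"ping -c 1 -W 2"}

# Print "success" only if every host given as argument answers the probe.
# Usage: check_nodes <host> [<host>...]
check_nodes() {
    for host in "$@"; do
        # PROBE_CMD is left unquoted on purpose so its options are split.
        if ! $PROBE_CMD "$host" >/dev/null 2>&1; then
            echo "error: $host unreachable"
            return 1
        fi
    done
    echo "success"
}
```

With the example addresses used in this README, the call would be check_nodes 192.168.34.123 192.168.34.124.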

stopZigbee2mqtt1.sh / stopZigbee2mqtt2.sh

A pair of scripts to remotely stop each zigbee2mqtt instance. They are used by the controller node to initiate a manual switchover between the active and stand-by nodes, which works because the service interruption in the active zigbee2mqtt node can be detected through MQTT. The first lines of each script contain the configuration variables to be adapted to each environment. Each parameter is explained below:

The script can be directly executed from the Linux prompt. No command-line parameters are required.

activeZigbee2mqtt1.sh / activeZigbee2mqtt2.sh

A pair of scripts to perform the zigbee2mqtt switchover, making the first or the second node the active one, respectively. Each script follows this sequence of activities:

As described above, both scripts implement similar logic and differ only in the details needed to apply the switchover to a different node. The first lines of each script contain the configuration variables to be adapted to each environment. Each parameter is explained below:

The scripts can be directly executed from the Linux prompt. No command-line parameters are required.

Usage of High-Availability control scripts from Home Assistant

This section assumes that the controller scripts defined previously are deployed on a Linux system where Home Assistant is also available. The two remote zigbee2mqtt nodes should also be running in the same LAN, with automatic zigbee2mqtt configuration synchronization in place.

In addition, some limitations of the Home Assistant shell script environment need to be addressed.

Environment preparation for Home Assistant installed in Docker

Running shell scripts in Home Assistant under Docker is problematic when they require additional packages, such as SNMP in our case. The reason is the Home Assistant upgrade process, which does not preserve custom packages installed by the user inside the Home Assistant Docker container. In addition, Home Assistant limits script execution time to 1 minute by design. The proposed workaround for both issues is to execute the scripts remotely, connecting from the Docker container to the host machine where the Home Assistant container is running (the control scripts defined above should be stored there).

To follow this approach, the following points should be set up inside the Home Assistant Docker container (it can be accessed from the host machine by executing sudo docker exec -it homeassistant bash). This setup is mainly needed so that SSH connections can be opened without interactive credentials:

The scripts to create inside the Home Assistant Docker container are listed below. The parameters to replace directly in the contents of these scripts are:

ping_mqtts.sh

#!/bin/bash

ssh -o UserKnownHostsFile=/config/.ssh/known_hosts -i /config/.ssh/id_rsa <user>@<ip> '<script_path>/ping_mqtts.sh'

stopZigbee2mqtt1.sh

#!/bin/bash

ssh -o UserKnownHostsFile=/config/.ssh/known_hosts -i /config/.ssh/id_rsa <user>@<ip> '<script_path>/stopZigbee2mqtt1.sh'

stopZigbee2mqtt2.sh

#!/bin/bash

ssh -o UserKnownHostsFile=/config/.ssh/known_hosts -i /config/.ssh/id_rsa <user>@<ip> '<script_path>/stopZigbee2mqtt2.sh'

activeZigbee2mqtt1.sh

This is where the 1-minute limit on Home Assistant shell script execution has to be worked around: the full failover procedure usually takes longer than that (mainly because of the time required to extract and write NVRAM dumps on the Zigbee USB dongles). Consequently, the script below runs the switchover in the background (using the nohup command). The output is stored in a file, result.log in the same script path, in case it needs to be checked.

#!/bin/bash

ssh -o UserKnownHostsFile=/config/.ssh/known_hosts -i /config/.ssh/id_rsa -f <user>@<ip> 'rm -f <script_path>/result.log; nohup <script_path>/activeZigbee2mqtt1.sh ><script_path>/result.log 2>&1 </dev/null &'

activeZigbee2mqtt2.sh

The same comment as for activeZigbee2mqtt1.sh above applies.

#!/bin/bash

ssh -o UserKnownHostsFile=/config/.ssh/known_hosts -i /config/.ssh/id_rsa -f <user>@<ip> 'rm -f <script_path>/result.log; nohup <script_path>/activeZigbee2mqtt2.sh ><script_path>/result.log 2>&1 </dev/null &'
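
The background pattern used by the two activeZigbee2mqtt wrappers can be tried locally before wiring it into SSH. The sketch below isolates it in a small helper; the name run_detached is hypothetical and the helper is only an illustration of the nohup and redirection idiom, not part of the provided scripts.

```shell
#!/bin/sh
# Start a long-running command in the background and capture its output.
# Usage: run_detached <log_file> <command> [args...]
run_detached() {
    log_file=$1
    shift
    rm -f "$log_file"    # drop the log left by a previous run
    # nohup makes the job ignore the hangup signal sent when the calling
    # session ends; redirecting stdin, stdout and stderr detaches it from
    # the session so the caller can return immediately.
    nohup "$@" >"$log_file" 2>&1 </dev/null &
}
```

For instance, run_detached /tmp/result.log sleep 90 returns at once, while the job keeps running past the 1-minute limit and leaves its output in /tmp/result.log.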

configuration.yaml additions in Home Assistant

Once the previous steps are completed, the following content can be added to Home Assistant's configuration.yaml to expose zigbee2mqtt as a sensor that can be monitored. The purposes are:

mqtt:
  binary_sensor:
    - name: "Zigbee2mqtt bridge status"
      state_topic: "zigbee2mqtt/bridge/state"
      unique_id: "zigbee2mtt bridge status"
      value_template: '{{ value_json.state}}'
      payload_on: "online"
      payload_off: "offline"
      device_class: running
  sensor:
    - name: "Zigbee2mqtt bridge info"
      unique_id: "zigbee2mtt bridge info"
      state_topic: "zigbee2mqtt/bridge/info"
      value_template: >
        {% if "192.168.34.123" in value %}
          {{ "192.168.34.123" }}
        {% endif %}
        {% if "192.168.34.124" in value %}
          {{ "192.168.34.124" }}
        {% endif %}

Please replace 192.168.34.123 and 192.168.34.124 with the IPs of the zigbee2mqtt instances in your deployment.
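
Before relying on the binary sensor, it can help to look at the zigbee2mqtt/bridge/state payload from a shell. The helper below is a minimal sketch, with the hypothetical name bridge_state, that maps a payload to online or offline. It assumes the payload is either the JSON form read by the value_template above or a plain online/offline string, as published by some zigbee2mqtt releases.

```shell
#!/bin/sh
# Print "online" or "offline" for a zigbee2mqtt/bridge/state payload.
# Accepts the JSON form {"state":"online"} and the plain-text form "online";
# anything else, including an empty payload, is reported as offline.
bridge_state() {
    case $1 in
        *'"state"'*'"online"'* | online) echo online ;;
        *) echo offline ;;
    esac
}
```

Assuming the Mosquitto command-line clients are installed, a live check could be bridge_state "$(mosquitto_sub -h <broker_ip> -t zigbee2mqtt/bridge/state -C 1 -W 5)", where <broker_ip> is a placeholder for the MQTT broker address.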

In addition, the following content should also be added to Home Assistant's configuration.yaml so that the high-availability zigbee2mqtt controller scripts (previously stored inside the Home Assistant Docker container) are available to Home Assistant automations:

shell_command:
  failover_to_mqtt1: "bash ./sh/activeZigbee2mqtt1.sh"
  failover_to_mqtt2: "bash ./sh/activeZigbee2mqtt2.sh"
  ping_mqtts: "bash ./sh/ping_mqtts.sh"
  stop_mqtt1: "bash ./sh/stopZigbee2mqtt1.sh"
  stop_mqtt2: "bash ./sh/stopZigbee2mqtt2.sh"

The script paths should be updated if a different location in Home Assistant storage is used (whether under Docker or installed directly on the OS).

Home Assistant needs to be restarted to reload configuration.yaml with the new contents. If everything goes well, the summary panel should show something similar to the following screenshot:

Home Assistant Scripts

The following extract of the Home Assistant scripts.yaml integrates the zigbee2mqtt functionality with the Home Assistant control system. The zigbee2mqtt nodes are identified by their IPs (192.168.34.123 / .124), which should be replaced with the IPs used in each deployment.

periodic_zigbe2mqtt_switchover:
  alias: periodic zigbee2mqtt switchover
  sequence:
  - variables:
      ping_nodes_result:
      mqtt_change_result:
  - service: shell_command.ping_mqtts
    data: {}
    response_variable: ping_nodes_result
    alias: Call to script ping_mqtts
  - alias: Checking pings result
    if:
    - condition: template
      value_template: '{{ ping_nodes_result[''stdout''] == ''success'' }}'
    then:
    - service: notify.persistent_notification
      data:
        message: Ping OK
        title: Ping MQTTs result
    - alias: If zigbee2mqtt node is 192.168.34.123, making switchover to .124
      if:
      - condition: template
        value_template: '{{ ''192.168.34.123'' in states(''sensor.zigbee2mqtt_bridge_info'')  }} '
      then:
      - alias: Failover to .124 instance notification
        service: notify.persistent_notification
        data:
          message: 'identified node: 192.168.34.123. switching over to 192.168.34.124'
          title: zigbee2mqtt periodic switchover
      - service: shell_command.stop_mqtt1
        data: {}
        response_variable: mqtt_change_result
    - alias: If zigbee2mqtt node is 192.168.34.124, making switchover to .123
      if:
      - condition: template
        value_template: '{{ ''192.168.34.124'' in states(''sensor.zigbee2mqtt_bridge_info'')  }} '
      then:
      - alias: Failover to .123 instance notification
        service: notify.persistent_notification
        data:
          message: 'identified node: 192.168.34.124. switching over to 192.168.34.123'
          title: zigbee2mqtt periodic switchover
      - service: shell_command.stop_mqtt2
        data: {}
        response_variable: mqtt_change_result
    else:
    - service: notify.persistent_notification
      data:
        message: Ping Error
        title: Result of Ping MQTTs
  mode: single
  icon: mdi:arrow-oscillating

Home Assistant automations

Finally, the following Home Assistant automations.yaml snippet defines the upper layer of the high-availability zigbee2mqtt control system.

- id: '1702417248508'
  alias: Weekly zigbee2mqtt switch-over
  description: ''
  trigger:
  - platform: time
    at: 03:31:00
  condition:
  - condition: time
    weekday:
    - wed
    - sun
  action:
  - service: script.periodic_zigbe2mqtt_switchover
    data: {}
  mode: single

Validation scenario of the solution

The prototype has been up and running for the last few months, applying two switchovers per week. During this time, the Zigbee network has grown from a few nodes to more than 20 devices, including sensors and actuators of different types and brands. No issues have been found so far.

It has also been tested several times by directly disconnecting the active USB dongle from the environment (leaving a reasonable time between tests to ensure a full resynchronization). This has always led to correct detection of the service interruption in Home Assistant and a switchover to the stand-by node. The caveat is that the latest NVRAM dump cannot be loaded from the active node in this case, but this does not cause a service disruption as long as the last available dump is reasonably recent, which the scheduled switchovers ensure.

To achieve a fully highly available deployment, Home Assistant itself (or whichever controller is selected) should also be deployed in a fault-tolerant manner. Since this component does not depend on hardware devices, this is easier to achieve than for zigbee2mqtt (which depends on the Zigbee coordinator hardware). For example, it can be achieved by deploying the solution on top of a redundant virtualized infrastructure, such as Proxmox Virtual Environment.

Possible further improvements