Tendrl / node-agent

A python agent local to every managed storage node in the sds cluster
GNU Lesser General Public License v2.1

Tendrl stops monitoring when glusterd is down in provisioner node #870

Closed GowthamShanmugam closed 5 years ago

GowthamShanmugam commented 5 years ago

I created a 3-node gluster cluster. After the import flow completed, I manually brought down the glusterd service on the provisioner node. After a few minutes, the volume details disappeared from the Tendrl UI.

In Grafana, the volume-related panels are also affected.

When glusterd is down, the provisioner node is unable to execute gluster commands, so it stops collecting the gluster cluster topology and monitoring metrics. None of the other nodes is able to claim the provisioner tag.

To find the provisioner node, query etcd:

  1. Read provisioner key to find node_id "indexes/tags/provisioner"
  2. Read the fqdn key from node_context: "nodes/{node_id}/NodeContext/fqdn"
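
The two lookups above can be sketched in Python roughly as follows. This is only an illustration, not code from node-agent: `find_provisioner_fqdn`, `parse_node_id` and the `read` callable are hypothetical names, and the format of the value stored under the tag key (a bare node_id or a JSON list of node_ids) is an assumption. `read` stands in for whatever etcd client call returns the string value of a key, so the sketch runs without an etcd server.

```python
import json

# Key names taken from the steps above; everything else is illustrative.
PROVISIONER_TAG_KEY = "indexes/tags/provisioner"
FQDN_KEY_TEMPLATE = "nodes/{node_id}/NodeContext/fqdn"


def parse_node_id(raw):
    """Extract a node_id from the tag value.

    Assumption: the value is either a bare node_id string or a JSON list
    of node_ids, in which case the first entry is used.
    """
    if raw is None:
        return None
    try:
        parsed = json.loads(raw)
    except ValueError:
        # Not valid JSON, so treat the value as a bare node_id.
        return raw.strip() or None
    if isinstance(parsed, list):
        return parsed[0] if parsed else None
    return str(parsed)


def find_provisioner_fqdn(read):
    """Return the fqdn of the provisioner node, or None if no node holds the tag.

    `read` is a hypothetical callable mapping an etcd key to its string
    value (or None when the key is missing).
    """
    # Step 1: read the provisioner tag to find the node_id.
    node_id = parse_node_id(read(PROVISIONER_TAG_KEY))
    if not node_id:
        return None
    # Step 2: read the fqdn from that node's NodeContext.
    return read(FQDN_KEY_TEMPLATE.format(node_id=node_id))
```

For a quick check, a plain dict can play the role of etcd: `find_provisioner_fqdn({"indexes/tags/provisioner": "node-1", "nodes/node-1/NodeContext/fqdn": "gluster1.example.com"}.get)` walks both keys and returns the fqdn (the node_id and hostname here are made up).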

GowthamShanmugam commented 5 years ago

For now, I am closing this issue.