Seagate / cortx-ha

CORTX HA (High Availability) keeps the CORTX solution available when a hardware component or software service fails. It handles the failover/failback control flow for affected services and stabilizes them across the CORTX cluster.
https://github.com/Seagate/cortx
GNU Affero General Public License v3.0

CORTX-29930: Enhance monitoring capability for control node #685

Closed mariyappanp closed 2 years ago

mariyappanp commented 2 years ago

Problem Statement

Add control node cluster cardinality and watch control node events

Design

CORTX-29930 - Enhance monitoring capability for control node

Coding

Testing

test_results.txt

Review Checklist

Documentation

Checklist for Author

mariyappanp commented 2 years ago

cortx>ha>v1>cluster_cardinality:{"num_nodes": 8, "node_list": ["d5bd40a41da44c8dbf90ce418453f711", "645fe1d98a774b0a9b1a5972c6857741", "2434adbe55f046c2a3ec219d04b720d3", "cad486c5d73d44de9413ebaf8e16c6b8", "099d7bf619e24591a3f01b00324ce18a", "4c38e91086d745cf9104798a2e96785b", "926e2fb713924aa8a5b156695206364c", "dd8d2ceb64444e6caeb5532753306263", "04539dd8be544c428c56d9b22baec91a"]}

Please check why the num_nodes count is mismatched. It should be 9.
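The mismatch can be reproduced with a quick check of the stored value. This is only a sketch: `check_cardinality` is a hypothetical helper, not part of cortx-ha, and the value is copied from the test output above rather than read from Consul.

```python
import json
from typing import Tuple

# Value of cortx>ha>v1>cluster_cardinality as posted in the test results.
CARDINALITY_VALUE = (
    '{"num_nodes": 8, "node_list": ['
    '"d5bd40a41da44c8dbf90ce418453f711", "645fe1d98a774b0a9b1a5972c6857741", '
    '"2434adbe55f046c2a3ec219d04b720d3", "cad486c5d73d44de9413ebaf8e16c6b8", '
    '"099d7bf619e24591a3f01b00324ce18a", "4c38e91086d745cf9104798a2e96785b", '
    '"926e2fb713924aa8a5b156695206364c", "dd8d2ceb64444e6caeb5532753306263", '
    '"04539dd8be544c428c56d9b22baec91a"]}'
)


def check_cardinality(raw_value: str) -> Tuple[int, int]:
    """Return (recorded num_nodes, actual length of node_list)."""
    record = json.loads(raw_value)
    return record["num_nodes"], len(record["node_list"])


recorded, actual = check_cardinality(CARDINALITY_VALUE)
print(recorded, actual)  # prints: 8 9
```

The recorded count is 8 while node_list holds 9 entries.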

It looks like the key was already present, and mini provisioning was then performed again. That triggered a Consul update (consul kv put), which I think overwrote the data. I had initially added the control node to the cardinality and tested the monitoring functionality, but the test result file was not updated afterwards. Before posting this PR, I tested thoroughly and the count was 9. I will update the test result file. @akash2144
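One way to keep the count and the list from drifting apart across repeated updates is to derive num_nodes from node_list each time a node is added. The following is a minimal sketch under that assumption: `add_node` is a hypothetical helper, not the cortx-ha implementation, and reading the record from Consul and writing it back are left out.

```python
from typing import Dict, List, Union

CardinalityRecord = Dict[str, Union[int, List[str]]]


def add_node(record: CardinalityRecord, node_id: str) -> CardinalityRecord:
    """Return a new cardinality record that includes node_id.

    num_nodes is always recomputed from node_list, so re-running the
    update (for example, a second mini provisioning) cannot leave the
    count out of step with the list.
    """
    nodes = list(record.get("node_list", []))
    if node_id not in nodes:
        nodes.append(node_id)
    return {"num_nodes": len(nodes), "node_list": nodes}


# Placeholder node IDs for illustration only.
existing = {"num_nodes": 2, "node_list": ["node-a", "node-b"]}
updated = add_node(existing, "control-node")
repeated = add_node(updated, "control-node")  # second run changes nothing
print(updated["num_nodes"], repeated["num_nodes"])  # prints: 3 3
```

Because the helper returns a new record instead of mutating its input, the caller decides when and how the result is written back to the key-value store.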