Seagate / cortx-ha

CORTX HA (High Availability) ensures that the CORTX solution remains available when a hardware component or software service fails. It handles the failover/failback control flow for affected services and stabilizes them across the CORTX cluster.
https://github.com/Seagate/cortx
GNU Affero General Public License v3.0

CORTX-27679: on new setup if data pod scale to 0 and then 1 no online event is sent to hare #654

Closed mariyappanp closed 2 years ago

mariyappanp commented 2 years ago

Problem Statement

CORTX-27679: On a new setup, if the data pod is scaled to 0 and then back to 1, no online event is sent to hare.

Design

System Health updates the health status after comparing the new event with the stored event. While processing a new event, check whether its event_type is online. If it is, change the event_type of the stored event to failed, so that the failed event is reported first and the online event follows.
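The design above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `HealthStore` class, its `process` method, and the rule "publish only when the status differs from the stored one" are all assumptions made for the example.

```python
from typing import Dict, Optional

ONLINE = "online"
FAILED = "failed"


class HealthStore:
    """Hypothetical in-memory stand-in for the stored health events."""

    def __init__(self) -> None:
        # resource_id -> last stored event_type
        self._status: Dict[str, str] = {}

    def process(self, resource_id: str, new_status: str) -> Optional[str]:
        """Return the status to publish, or None if nothing is published."""
        stored = self._status.get(resource_id)
        if new_status == ONLINE:
            # Design step: treat the stored event as failed so the
            # comparison below always sees a failed -> online transition.
            stored = FAILED
        self._status[resource_id] = new_status
        # Assumed publish rule: report only when the status changed.
        return new_status if stored != new_status else None
```

Under this assumed rule, a resource whose stored status is already online would otherwise produce no event when a new online event arrives. Forcing the stored status to failed makes the online event go out.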

Coding

Testing

kubectl scale deploy cortx-data-ssc-vm-rhev4-1667 --replicas 0

2022-03-01 20:54:09 health_monitor [7]: INFO [publish] Sending action event {'event': {'header': {'version': '1.0', 'timestamp': '1646168049', 'event_id': '1646168049a2147e3b6f394b03899cad8b6b435fac'}, 'payload': {'source': 'monitor', 'cluster_id': '266faed2854a4b758c51bfda97b27320', 'site_id': '1', 'rack_id': '1', 'storageset_id': '22b1e5a58aff44a684998c1e99b9411e', 'node_id': '22b1e5a58aff44a684998c1e99b9411e', 'resource_type': 'node', 'resource_id': '22b1e5a58aff44a684998c1e99b9411e', 'resource_status': 'failed', 'specific_info': {'generation_id': 'cortx-data-ssc-vm-rhev4-1667-9f59db655-mrmg5', 'pod_restart': 0}}}} to component hare

kubectl scale deploy cortx-data-ssc-vm-rhev4-1667 --replicas 1

2022-03-01 20:54:28 health_monitor [7]: INFO [publish] Sending action event {'event': {'header': {'version': '1.0', 'timestamp': '1646168067', 'event_id': '164616806763b9db792639441599436c82030c6ee5'}, 'payload': {'source': 'monitor', 'cluster_id': '266faed2854a4b758c51bfda97b27320', 'site_id': '1', 'rack_id': '1', 'storageset_id': '22b1e5a58aff44a684998c1e99b9411e', 'node_id': '22b1e5a58aff44a684998c1e99b9411e', 'resource_type': 'node', 'resource_id': '22b1e5a58aff44a684998c1e99b9411e', 'resource_status': 'online', 'specific_info': {'generation_id': 'cortx-data-ssc-vm-rhev4-1667-9f59db655-dzns6', 'pod_restart': 1}}}} to component hare
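To check log lines like the two above in a script, the event dictionary can be extracted and its resource_status read directly. The following is a rough sketch that assumes the line format shown in these logs; `parse_action_event` is a hypothetical helper, not part of cortx-ha, and the sample line is a trimmed copy of the second log entry.

```python
import ast
from typing import Tuple

MARKER = "Sending action event "
SUFFIX = " to component "


def parse_action_event(line: str) -> Tuple[str, str, str]:
    """Return (component, resource_id, resource_status) from a publish log line."""
    start = line.index(MARKER) + len(MARKER)
    end = line.rindex(SUFFIX)
    # The event is logged as a Python dict literal, so literal_eval can read it.
    event = ast.literal_eval(line[start:end])
    payload = event["event"]["payload"]
    component = line[end + len(SUFFIX):].strip()
    return component, payload["resource_id"], payload["resource_status"]


# Trimmed copy of the second log line above (most fields omitted for brevity).
sample = (
    "2022-03-01 20:54:28 health_monitor [7]: INFO [publish] Sending action event "
    "{'event': {'header': {'version': '1.0'}, 'payload': {'resource_type': 'node', "
    "'resource_id': '22b1e5a58aff44a684998c1e99b9411e', 'resource_status': 'online'}}} "
    "to component hare"
)
component, resource_id, status = parse_action_event(sample)
```

Running the same helper over both log lines in order should yield failed followed by online for the same resource_id, which is the sequence the test is meant to confirm.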

Review Checklist

Documentation

Checklist for Author