Seagate / cortx-ha

CORTX HA (High Availability) is responsible for keeping the CORTX Solution available when a hardware component or software service fails. It handles the failover/failback control flow for affected services and stabilizes them across the CORTX cluster.
https://github.com/Seagate/cortx
GNU Affero General Public License v3.0

CORTX-28171 : System health state machine : Implementation #656

Closed mukhtar-inamdar closed 2 years ago

mukhtar-inamdar commented 2 years ago

Problem Statement

Design

Coding

Testing

Review Checklist

Documentation

Checklist for Author

ArchanaLimaye commented 2 years ago

Regarding the point mentioned in the design, "and check if pass till system_health creation": when a state change occurs, are these events published on the message bus to health_monitor? Please confirm if this has not already been done.

2022-03-03 19:08:38 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/failed
2022-03-03 19:08:38 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/failed
2022-03-03 19:08:39 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/online
2022-03-03 19:08:39 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/online
2022-03-03 19:08:40 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/repairing
2022-03-03 19:08:40 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/repairing
2022-03-03 19:08:41 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/repaired
2022-03-03 19:08:41 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/repaired
2022-03-03 19:08:42 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/rebalancing
2022-03-03 19:08:42 health_monitor [7]: INFO [evaluate] Evaluated action [] for key action/disk/rebalancing
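
One possible reading of the log above: each disk state change is turned into a key of the form `action/<resource>/<state>`, the key is looked up in a rule store, and an empty list means no action is configured for that state. The Python sketch below only illustrates that reading. The function names, the rule table, and the `"publish"` action are hypothetical and are not taken from the cortx-ha code base.

```python
from typing import Dict, List

# Hypothetical rule table. The key format mirrors the keys seen in the log;
# the empty lists mirror the "Evaluated action []" results.
RULES: Dict[str, List[str]] = {
    "action/disk/failed": [],
    "action/disk/online": [],
    "action/disk/repairing": [],
    "action/disk/repaired": [],
    "action/disk/rebalancing": [],
}


def make_key(resource: str, state: str) -> str:
    """Build a lookup key such as 'action/disk/failed' (assumed format)."""
    return f"action/{resource}/{state}"


def evaluate(rules: Dict[str, List[str]], resource: str, state: str) -> List[str]:
    """Return the actions configured for a state change, or [] if none exist."""
    key = make_key(resource, state)
    # A missing key is treated the same as an empty rule: nothing to do.
    return list(rules.get(key, []))


if __name__ == "__main__":
    for state in ("failed", "online", "repairing", "repaired", "rebalancing"):
        actions = evaluate(RULES, "disk", state)
        print(f"Evaluated action {actions} for key {make_key('disk', state)}")
```

Under this assumption, an empty result for every key would mean that no action is triggered for these transitions, which is the case the question about message bus publishing asks the author to confirm.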