PRB0041010 CITZ - MCS EMERALD - Node MCS-EMERALD-APP-01.DMZ rebooted - root cause analysis

BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team. This includes work for: 1. Platform Experience, 2. Developer Experience, 3. Platform Operations/OCP 3.

Apache License 2.0

Describe the issue

A problem ticket was opened in response to an incident involving an unplanned reboot of a worker node in the EMERALD cluster. Investigate and coordinate with vendor support as needed to determine the root cause, if possible.

Blocked until EMERALD ESXi host maintenance is complete and no new issues arise from it. The affected node will be uncordoned at that point.

How does this benefit the users of our platform?

It ensures the root cause is addressed, or otherwise confirms that no issues remain before the affected node is returned to service for user workloads.

Definition of done

[x] Open vendor case (Red Hat) and upload initial data.
[x] Follow up with vendor to determine root cause if possible.

PRB0041010 CITZ - MCS EMERALD - Node MCS-EMERALD-APP-01.DMZ rebooted - root cause analysis #5298