BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team. This includes work for: 1. Platform Experience, 2. Developer Experience, 3. Platform Operations/OCP 3.
Apache License 2.0

SDN - Investigate vSphereOpenshiftClusterHealthFail errors from AlertManager in KLAB2 #4289

Open wmhutchison opened 12 months ago

wmhutchison commented 12 months ago

Describe the issue

Because expired AlertManager alerts disappear so quickly, it is unclear whether this is a known issue whose silence expired during the OpenShift upgrade to 4.12 or a net-new issue. It is not a high priority, since this alert covers integration that would only be required if we used the ESXi storageclass (we do not; we use NetApp). Still, it would be good to dig into this and resolve it as time permits.

wmhutchison commented 11 months ago

Link to OCP 4.12 requirements for vSphere account permissions. https://docs.openshift.com/container-platform/4.12/installing/installing_vsphere/installing-vsphere.html#installation-vsphere-installer-infra-requirements_installing-vsphere

wmhutchison commented 3 months ago

Updated link for OCP 4.14: https://docs.openshift.com/container-platform/4.14/installing/installing_vsphere/upi/upi-vsphere-installation-reqs.html#installation-vsphere-installer-infra-requirements_upi-vsphere-installation-reqs

StevenBarre commented 1 week ago

4.14 requires additional permissions for the service account that the vSphere operator uses to talk to vCenter. I added the extra permissions, and it has stopped complaining.

Also ran the fix in https://access.redhat.com/solutions/7057384

oc patch clustercsidriver --subresource=status csi.vsphere.vmware.com --type='merge' -p '{"status": {"conditions": [ {"type": "VMwareVSphereControllerDummyProgressing", "status": "False", "reason": "Forced"}]}}'

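
For anyone repeating this, here is a rough sketch of the same patch split into named parts, so the payload can be inspected before anything is applied. The variable names and the two helper functions (`apply_patch`, `show_conditions`) are made up for illustration and are not from the Red Hat solution. Nothing talks to the cluster unless a helper is called explicitly.

```shell
#!/bin/sh
# Sketch only: the same merge patch as the command above, built from named parts.
# DRIVER, CONDITION_TYPE, PATCH and both helpers are illustrative names.
DRIVER="csi.vsphere.vmware.com"
CONDITION_TYPE="VMwareVSphereControllerDummyProgressing"

# JSON merge patch that sets the condition to status "False" with reason "Forced".
PATCH=$(printf '{"status": {"conditions": [ {"type": "%s", "status": "False", "reason": "Forced"}]}}' "$CONDITION_TYPE")

# Hypothetical helper: applies the patch to the status subresource when called.
apply_patch() {
  oc patch clustercsidriver "$DRIVER" --subresource=status --type='merge' -p "$PATCH"
}

# Hypothetical helper: prints the driver's current status conditions.
show_conditions() {
  oc get clustercsidriver "$DRIVER" -o jsonpath='{.status.conditions}'
}

# Print the payload for review; call apply_patch manually once it looks right.
echo "$PATCH"
```

A JSON merge patch replaces arrays wholesale rather than merging them element by element, so it may be worth running `show_conditions` before and after `apply_patch` to compare the condition list.
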
StevenBarre commented 1 week ago

https://github.dxc.com/AdvSol/mcs-ansible-tower/pull/27 adds the vCenter CA certs to the util servers so we can run openshift-install interactively and generate the install-config.yaml with updated values.

StevenBarre commented 4 days ago

Waiting on RITM0192009

StevenBarre commented 2 days ago

Firewall change scheduled for Monday night