IBM / cloud-pak-deployer

Configuration-based installation of OpenShift and Cloud Pak for Data/Integration/Watson AIOps on various private and public cloud infrastructure providers. Deployment attempts to achieve the end-state defined in the configuration. If something fails along the way, you only need to restart the process to continue the deployment.
https://ibm.github.io/cloud-pak-deployer/
Apache License 2.0

Avoid lock-up of OLM by separating control plane from cartridges #158

Closed fketelaars closed 2 years ago

fketelaars commented 2 years ago

Some CP4D services such as WA, WD and WKS have dependencies between their CSVs and also depend on ODLM. Instead of creating all subscriptions in one go, split them between the control plane and the cartridges: first create the subscriptions for the control plane and wait until ODLM is up, then create the subscriptions for the cartridges.

fketelaars commented 2 years ago

The control plane CSVs were already created before the other services' subscriptions. It turns out that WA, WD and WKS have dependencies between their operators. If the CSVs are not created in the correct sequence, the OpenShift Operator Lifecycle Manager (OLM) may lock up, after which no other operators can be installed.

To check if CSV creation for subscriptions is stuck, run the following query:

export OPR_NS=ibm-common-services
# List every subscription, oldest first, as: name,creationTimestamp,labels,installedCSV
for i in $(oc get sub -n "$OPR_NS" --sort-by=.metadata.creationTimestamp -o name); do
  oc get "$i" -n "$OPR_NS" -o jsonpath='{.metadata.name}{","}{.metadata.creationTimestamp}{","}{.metadata.labels}{","}{.status.installedCSV}{"\n"}'
done
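To pick out the stuck subscriptions without reading every line, the output of the loop above can be piped through a small filter. This is only a sketch: `find_stuck_subs` is a hypothetical helper name, and it assumes the `name,creationTimestamp,labels,installedCSV` line format printed by the loop.

```shell
# Hypothetical helper: reads lines in the
# "name,creationTimestamp,labels,installedCSV" format from stdin and prints
# the names of subscriptions whose installedCSV field is empty.
find_stuck_subs() {
  # The labels field can itself contain commas, so only the first field
  # (name) and the last field (installedCSV) are inspected.
  awk -F',' 'NF > 1 && $NF == "" { print $1 }'
}
```

Usage: append `| find_stuck_subs` after the `done` of the loop. No output means every subscription has an installed CSV.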

If any subscription has no associated CSV, the subscriptions have likely been created in the wrong sequence. To remediate, delete the OLM catalog-operator and olm-operator pods so that they are recreated:

oc delete pods -n openshift-operator-lifecycle-manager -l app=catalog-operator
oc delete pods -n openshift-operator-lifecycle-manager -l app=olm-operator

To avoid the deployer running into this same issue, we will no longer use the preview.sh script to create subscriptions but let olm-utils create them. olm-utils creates the subscriptions one by one, waiting after each subscription until its CSV has been created.
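The "create one, then wait for its CSV" pattern described above can be illustrated with a small polling function. This is a minimal sketch and not the olm-utils implementation: `wait_for_csv`, its arguments and its defaults are hypothetical. It only relies on `oc get sub ... -o jsonpath='{.status.installedCSV}'`, the same field used in the check earlier in this thread.

```shell
# Hypothetical helper, not the olm-utils implementation.
# Polls a subscription until .status.installedCSV is set, or gives up.
# Arguments: subscription name, namespace, max attempts (default 60),
#            seconds between attempts (default 10).
wait_for_csv() {
  local sub="$1" ns="$2" attempts="${3:-60}" delay="${4:-10}"
  local csv i=0
  while [ "$i" -lt "$attempts" ]; do
    csv=$(oc get sub "$sub" -n "$ns" -o jsonpath='{.status.installedCSV}' 2>/dev/null)
    if [ -n "$csv" ]; then
      echo "$sub -> $csv"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Timed out waiting for CSV of subscription $sub" >&2
  return 1
}
```

Calling such a function after each subscription is created, and only moving on to the next subscription when it returns successfully, gives the sequential behaviour described above.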

The effect is that the overall process takes longer, but it is much more reliable when installing WA, WD, WKC and some of the other services.