Closed by fketelaars 2 years ago
The control plane CSVs were already created before the other services' subscriptions. It turns out that WA, WD and WKS have dependencies between their operators. If the CSVs are not created in the correct sequence, the OpenShift Operator Lifecycle Manager (OLM) may lock up, and no other operators can be installed.
To check if CSV creation for subscriptions is stuck, run the following query:
export OPR_NS=ibm-common-services
for i in $(oc get sub -n $OPR_NS --sort-by=.metadata.creationTimestamp -o name); do oc get $i -n $OPR_NS -o jsonpath='{.metadata.name}{","}{.metadata.creationTimestamp}{","}{.metadata.labels}{","}{.status.installedCSV}{"\n"}'; done
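To narrow the output of the query above to only the stuck subscriptions, a small filter can help. The following is a minimal sketch, not part of the deployer: `subs_without_csv` is a hypothetical helper name, and it assumes the comma-separated layout printed by the query, where the first field is the subscription name and the last field is `.status.installedCSV`. It checks only the last field because the labels column may itself contain commas.

```shell
# Hypothetical helper (not part of the deployer or olm-utils).
# Reads the comma-separated lines produced by the query above on stdin and
# prints the name (first field) of every subscription whose last field,
# .status.installedCSV, is empty.
subs_without_csv() {
  awk -F',' 'NF > 1 && $NF == "" { print $1 }'
}
```

To use it, pipe the output of the `for` loop into the function, i.e. end the loop with `done | subs_without_csv`. An empty result suggests that every subscription has an installed CSV.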
If there are any subscriptions without an associated CSV, the subscriptions have likely been created in the wrong sequence. To remediate, delete the catalog-operator and olm-operator pods so that OLM restarts:
oc delete -n openshift-operator-lifecycle-manager $(oc get pods -n openshift-operator-lifecycle-manager -lapp=catalog-operator -o name)
oc delete -n openshift-operator-lifecycle-manager $(oc get pods -n openshift-operator-lifecycle-manager -lapp=olm-operator -o name)
To keep the deployer from running into this same issue, we will no longer use the preview.sh script to create subscriptions but let olm-utils create them. olm-utils creates the subscriptions one by one, waiting after each subscription until its CSV has been created.
The effect is that the overall process takes longer but is much more reliable when installing WA, WD, WKC and some of the other services.
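The "create one subscription, then wait for its CSV" pattern can be sketched in shell as follows. This is an illustration of the idea only and not how olm-utils is actually implemented; `wait_for_csv`, its timeout and interval defaults, and the output format are assumptions made for this example.

```shell
# Hypothetical helper (illustration only, not the olm-utils implementation).
# Polls a subscription until .status.installedCSV is populated.
# Usage: wait_for_csv <subscription> <namespace> [timeout_seconds] [interval_seconds]
wait_for_csv() {
  sub=$1
  ns=$2
  timeout=${3:-600}   # assumed default: give up after 10 minutes
  interval=${4:-10}   # assumed default: poll every 10 seconds
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Empty output means OLM has not yet installed a CSV for this subscription.
    csv=$(oc get sub "$sub" -n "$ns" -o jsonpath='{.status.installedCSV}' 2>/dev/null)
    if [ -n "$csv" ]; then
      echo "$sub -> $csv"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out waiting for CSV of subscription $sub" >&2
  return 1
}
```

A caller would apply one subscription manifest, run `wait_for_csv <subscription> $OPR_NS`, and only move on to the next subscription when the function returns successfully.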
Some CP4D services such as WA, WD and WKS have dependencies between their CSVs and also depend on ODLM. Instead of creating all subscriptions in one go, split this into control plane and cartridges: first create the subscriptions for the control plane and wait until ODLM is up, then create the subscriptions for the cartridges.