IBM / cp4i-deployment-samples

Samples for deploying Cloud Pak for Integration capabilities in a pipeline
Apache License 2.0
22 stars 94 forks source link

Workaround for OLM problem causing operators to fail to install in parallel #305

Closed DanRoseus closed 1 year ago

DanRoseus commented 1 year ago

To workaround: 1) Changed the operator install to do it one at a time and wait for each operator 2) Created a fixup-olm.sh script that: a) Restarts the OLM catalog pod of the main OLM pod has restarted b) Deletes any subscriptions and recreates if they get in a bad state c) Deletes any orphaned CSVs (that don't have a referencing subscription) 3) Called the fixup-olm.sh script while installing operators 4) Called the fixup-olm.sh script while installing Nav

This test succeeded for 4/5 runs: image

For the failure I noticed there were general problems on the cluster with 1 of the 3 nodes not working. I think a cluster needs a minimum of 3 working nodes to function so I created a 4th node, left the dodgy node in place. The cluster then looked more healthy so I re-ran 1-click manually and it worked: image