IBM / cloud-pak-deployer

Configuration-based installation of OpenShift and Cloud Pak for Data/Integration/Watson AIOps on various private and public cloud infrastructure providers. Deployment attempts to achieve the end-state defined in the configuration. If something fails along the way, you only need to restart the process to continue the deployment.
https://ibm.github.io/cloud-pak-deployer/
Apache License 2.0

Existing OCP destroy fails with invalid value #745

Open ibm-nilekell opened 2 weeks ago

ibm-nilekell commented 2 weeks ago

Describe the bug: Destroying an existing OCP environment fails with the following error, although installation validation passes when global_config.cloud_platform is set to existing-ocp.

fatal: [localhost]: FAILED! => {"changed": false, "msg": "cloud_platform existing-ocp only support values ['ibm-cloud', 'aws', 'azure'] "}

To Reproduce Steps to reproduce the behavior:

  1. Use the following config YAML files:

ocp-existing-ocp.yaml

---
global_config:
  environment_name: watsonx-deployment
  cloud_platform: existing-ocp
  env_id: pluto-01
  confirm_destroy: False

openshift:
- name: "{{ env_id }}"
  ocp_version: "4.14"
  cluster_name: "{{ env_id }}"
  domain_name: azure-watsonx-onprem.ibm-rh-engineroom.co.uk
  gpu:
    install: False
  openshift_ai:
    install: False
    channel: fast
  mcg:
    install: True
    storage_type: storage-class
    storage_class: managed-nfs-storage
  openshift_storage:
  - storage_name: ocs-storage
    storage_type: ocs
    # Optional parameters if you want to override the storage class used
    # ocp_storage_class_file: nfs-client
    # ocp_storage_class_block: nfs-client

watsonx-480.yaml

---
cp4d:
- project: cpd
  openshift_cluster_name: "{{ env_id }}"
  cp4d_version: 4.8.3
  cp4d_entitlement: watsonx-ai
  cp4d_production_license: True
  accept_licenses: False
  sequential_install: False
  db2u_limited_privileges: False
  use_fs_iam: True
  operators_project: cpd-operators
  cartridges:
  - name: cp-foundation
    license_service:
      state: disabled
      threads_per_core: 2

  - name: lite

  - name: scheduler 
    state: removed

#
# All tested cartridges. To install, change the "state" property to "installed". To uninstall, change the state
# to "removed" or comment out the entire cartridge. Make sure that the "-" and properties are aligned with the lite
# cartridge; the "-" is at position 3 and the property starts at position 5.
#

  # Please note that for watsonx.ai foundation models, you need to install the
  # Node Feature Discovery and NVIDIA GPU operators. You can do so by setting the openshift.gpu.install property to True
  - name: watsonx_ai
    description: watsonx.ai
    state: installed
    models:
    - model_id: google-flan-t5-xxl
      state: installed
    - model_id: google-flan-ul2
      state: installed
    - model_id: eleutherai-gpt-neox-20b
      state: installed
    - model_id: ibm-granite-13b-chat-v1
      state: installed
    - model_id: ibm-granite-13b-instruct-v1
      state: installed
    - model_id: meta-llama-llama-2-70b-chat
      state: installed
    - model_id: ibm-mpt-7b-instruct2
      state: installed
    - model_id: bigscience-mt0-xxl
      state: installed
    - model_id: bigcode-starcoder
      state: installed
  - name: watsonx_data
    description: watsonx.data
    state: installed
  2. Run ./cp-deploy.sh env apply --accept-all-licenses
  3. View the logs with ./cp-deploy.sh env logs
  4. See the error

Expected behavior: Destroying the existing OCP environment succeeds with: ./cp-deploy.sh env destroy --confirm-destroy

Additional context: We are running our OpenShift cluster on Azure.

fketelaars commented 2 weeks ago

We will treat this as a feature request. Today, env destroy is only supported for clusters where the deployer also manages the OpenShift cluster itself.
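
For readers trying to understand why validation passes on apply but destroy is rejected, the behavior is consistent with a guard on the destroy path that checks cloud_platform against a fixed list. The following is only a minimal Python sketch of such a guard. The function name validate_destroy and the constant DESTROY_SUPPORTED_PLATFORMS are hypothetical; the list of values is copied from the error message in this issue, and the deployer's actual check is not shown in this thread.

```python
# Hypothetical illustration only; this is not the deployer's actual code.
# The supported values are copied from the error message reported in this issue.
DESTROY_SUPPORTED_PLATFORMS = ["ibm-cloud", "aws", "azure"]


def validate_destroy(global_config: dict) -> None:
    """Raise ValueError if destroy is not supported for the configured platform."""
    platform = global_config.get("cloud_platform")
    if platform not in DESTROY_SUPPORTED_PLATFORMS:
        raise ValueError(
            f"cloud_platform {platform} only support values {DESTROY_SUPPORTED_PLATFORMS}"
        )


if __name__ == "__main__":
    # Same global_config values as in the reported configuration.
    config = {"cloud_platform": "existing-ocp", "env_id": "pluto-01"}
    try:
        validate_destroy(config)
    except ValueError as err:
        print(f"destroy rejected: {err}")
```

Under this assumption, a cluster that runs on Azure but is declared as existing-ocp is still rejected, because the check looks at the configured cloud_platform value rather than at where the cluster is hosted.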