IBM / cloud-pak-deployer

Configuration-based installation of OpenShift and Cloud Pak for Data/Integration/Watson AIOps on various private and public cloud infrastructure providers. Deployment attempts to achieve the end-state defined in the configuration. If something fails along the way, you only need to restart the process to continue the deployment.
https://ibm.github.io/cloud-pak-deployer/
Apache License 2.0

Deployer fails to deploy during ROKS on IBM Cloud with NFS Storage #546

Status: Open. patcurtin opened this issue 10 months ago.

patcurtin commented 10 months ago

Describe the bug

When using the deployer to create a ROKS OpenShift cluster on IBM Cloud with NFS storage, two servers are created: a bastion server and an NFS server. During the install, both servers need Python installed and SELinux disabled; otherwise the deployer fails and exits.

To Reproduce

Steps to reproduce the behavior:

  1. Follow the normal IBM Cloud instructions from here https://ibm.github.io/cloud-pak-deployer/10-use-deployer/3-run/ibm-cloud/
  2. Use this config file (essentially a copy of the sample in the repo; any CP4D config file will do):
    
    ---
    global_config:
      environment_name: sample
      cloud_platform: ibm-cloud
      ibm_cloud_region: eu-de
      env_id: nfs-test
      confirm_destroy: False

    provider:
    resource_group:
    ssh_keys:
    security_rule:
    vpc:
    address_prefix:
    subnet:
    vsi:
    nfs_server:
    cos:
    openshift:

  3. Run the deployer.

Expected behavior

The OpenShift cluster should be created with NFS storage.

Screenshots

First error during the install:

TASK [Configure bastion servers] ***********************************************
Tuesday 03 October 2023  10:55:36 +0000 (0:00:00.055)       0:51:28.655 *******

TASK [nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node] ***
Tuesday 03 October 2023  10:55:36 +0000 (0:00:00.041)       0:51:28.696 *******
fatal: [149.81.12.75]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 149.81.12.75 closed.\r\n", "module_stdout": "/bin/sh: /usr/local/bin/python: No such file or directory\r\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}

PLAY RECAP *********************************************************************
149.81.12.75               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
localhost                  : ok=831  changed=69   unreachable=0    failed=0    skipped=326  rescued=0    ignored=0

Tuesday 03 October 2023  10:55:38 +0000 (0:00:01.533)       0:51:30.230 *******
===============================================================================
provision-terraform : Run terraform apply in Terraform directory /home/pat/cpd-status/terraform, check /home/pat/cpd-status/terraform/apply.log  2822.09s
provision-terraform : Run terraform init in Terraform directory /home/pat/cpd-status/terraform -- 45.85s
download-ibmcloud : Run ibmcloud installer ----------------------------- 36.29s
provision-terraform : Run terraform plan in Terraform directory /home/pat/cpd-status/terraform, check /home/pat/cpd-status/terraform/plan.log -- 25.22s
cloudctl-download : Download cloudctl tool ----------------------------- 11.92s
ibm-pak-download : Download ibm-pak plugin ------------------------------ 8.89s
cpd-cli-download : Unpack cpd-cli from /home/pat/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz --- 8.81s
cpd-cli-download : Download latest cpd-cli release ---------------------- 5.74s
openshift-download-client : Unpack OpenShift client from /home/pat/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 4.57s
terraform-download : Get Terraform version ------------------------------ 3.35s
openshift-download-client : Download OpenShift client "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.12/openshift-client-linux.tar.gz" --- 3.17s
ibm-pak-download : Extract ibm-pak from /home/pat/cpd-status/downloads/oc-ibm_pak-linux-amd64.tar.gz --- 2.30s
cpd-cli-download : Get current version number of cpd-cli ---------------- 2.02s
terraform-download : Download Terraform --------------------------------- 2.01s
cloudctl-download : Unpack cloudctl from /home/pat/cpd-status/downloads/cloudctl-linux-amd64.tar.gz --- 2.00s
nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node --- 1.53s
terraform-download : Unpack Terraform from /home/pat/cpd-status/downloads/terraform_linux_amd64.zip --- 1.44s
generators : Create SSH key if not already in the vault and managed ----- 1.23s
cloudctl-download : Get current version number of clouctl --------------- 1.12s
vault-set-secret : Create directory /home/pat/cpd-status/vault if not existent --- 1.09s

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press ^C.

Workaround: SSH into the bastion and install Python:

[root@nfs-test-bastion ~]# yum install python36
[root@nfs-test-bastion ~]# ln -s /usr/bin/python3.6 /usr/local/bin/python
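A minimal sketch of the symlink step as a re-runnable helper. The name `link_python_interpreter` is hypothetical and not part of the deployer. It assumes, as in this report, that installing `python36` provides `/usr/bin/python3.6`, and that Ansible tries to run the interpreter at `/usr/local/bin/python`, the path shown in the error message. It leaves an existing interpreter alone and refuses to create a dangling link.

```shell
#!/bin/sh
# link_python_interpreter SRC DEST
# Hypothetical helper (not part of cloud-pak-deployer): make DEST, the path
# Ansible tries to run (e.g. /usr/local/bin/python), a symlink to SRC.
# Safe to re-run.
link_python_interpreter() {
  src="$1"
  dest="$2"
  # Leave an existing interpreter (file or working symlink) alone.
  if [ -e "$dest" ]; then
    return 0
  fi
  # Refuse to link to an interpreter that is not installed.
  if [ ! -x "$src" ]; then
    echo "interpreter not found: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  # -f replaces a dangling symlink left at DEST, if there is one.
  ln -sf "$src" "$dest"
}

# On the bastion, after `yum install python36`:
# link_python_interpreter /usr/bin/python3.6 /usr/local/bin/python
```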

Restart the deployer

Second error:

TASK [Configure bastion servers] ***********************************************
Tuesday 03 October 2023  14:34:03 +0000 (0:00:00.071)       0:04:05.803 *******

TASK [nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node] ***
Tuesday 03 October 2023  14:34:03 +0000 (0:00:00.039)       0:04:05.842 *******
fatal: [149.81.12.75]: FAILED! => {"changed": false, "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"}

PLAY RECAP *********************************************************************
149.81.12.75               : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
localhost                  : ok=774  changed=51   unreachable=0    failed=0    skipped=323  rescued=0    ignored=0

Tuesday 03 October 2023  14:34:04 +0000 (0:00:01.597)       0:04:07.440 *******
===============================================================================
provision-terraform : Run terraform init in Terraform directory /home/pat/cpd-status/terraform -- 45.53s
provision-terraform : Run terraform plan in Terraform directory /home/pat/cpd-status/terraform, check /home/pat/cpd-status/terraform/plan.log -- 43.54s
download-ibmcloud : Run ibmcloud installer ----------------------------- 36.92s
cpd-cli-download : Unpack cpd-cli from /home/pat/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz --- 9.05s
openshift-download-client : Unpack OpenShift client from /home/pat/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 4.21s
terraform-download : Get Terraform version ------------------------------ 3.35s
ibm-pak-download : Extract ibm-pak from /home/pat/cpd-status/downloads/oc-ibm_pak-linux-amd64.tar.gz --- 2.30s
cloudctl-download : Unpack cloudctl from /home/pat/cpd-status/downloads/cloudctl-linux-amd64.tar.gz --- 2.00s
nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node --- 1.60s
ibm-pak-download : Make sure ibm-pak can be run within path ------------- 1.50s
terraform-download : Unpack Terraform from /home/pat/cpd-status/downloads/terraform_linux_amd64.zip --- 1.48s
record-deployer-state : Make sure old deployer-state.out does not exist --- 1.34s
cpd-cli-download : Check if cpdcli was already downloaded --------------- 1.03s
vault-get-secret : Check that vault file sample exists ------------------ 1.02s
record-deployer-state : Starting background task to record deployer state in /home/pat/cpd-status/log --- 0.89s
lint-config : filter the vault variables from ansible variables --------- 0.82s
merge-config : Generate config through template ------------------------- 0.82s
lint-config : Run the linter and pre-processor script for object provider --- 0.73s
generators : Generate instance of "vpc" in /home/pat/cpd-status/terraform --- 0.72s
merge-config : Get stats of /home/pat/cpd-config/config ----------------- 0.71s

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press ^C.

Workaround: SSH into the bastion, set SELINUX=disabled in /etc/selinux/config, and reboot:

[root@nfs-test-bastion ~]# vi /etc/selinux/config   # set SELINUX=disabled
[root@nfs-test-bastion ~]# reboot
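The `vi` edit can also be done non-interactively, which helps when the same change has to be repeated on the NFS server. A minimal sketch: `disable_selinux_config` is a hypothetical helper, and it takes the file path as an argument so it can be tried on a copy of `/etc/selinux/config` first. As with the manual edit, the setting is only read at boot, so a reboot is still required.

```shell
#!/bin/sh
# disable_selinux_config [FILE]
# Hypothetical helper: set SELINUX=disabled in an SELinux config file,
# keeping a FILE.bak copy. The file is read at boot, so reboot afterwards.
disable_selinux_config() {
  cfg="${1:-/etc/selinux/config}"
  if [ ! -f "$cfg" ]; then
    echo "no such file: $cfg" >&2
    return 1
  fi
  cp -p "$cfg" "$cfg.bak"
  if grep -q '^SELINUX=' "$cfg.bak"; then
    # Rewrite the SELINUX= line; SELINUXTYPE= does not match and is kept.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg.bak" > "$cfg"
  else
    # No SELINUX= line at all: append one.
    echo 'SELINUX=disabled' >> "$cfg"
  fi
}

# On the bastion (and later the NFS node):
# disable_selinux_config /etc/selinux/config && reboot
```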

Restart the deployer

Third error:

TASK [Configure NFS servers] ***************************************************
Tuesday 03 October 2023  14:58:17 +0000 (0:00:02.801)       0:04:16.094 *******

TASK [nfs-server-ibmcloud-vpc-install : Format the NFS volume] *****************
Tuesday 03 October 2023  14:58:17 +0000 (0:00:00.061)       0:04:16.156 *******
included: /cloud-pak-deployer/automation-roles/40-configure-infra/nfs-server-ibmcloud-vpc-install/tasks/prepare_xfs_volume.yaml for 10.231.0.197

TASK [nfs-server-ibmcloud-vpc-install : Get volume for specified selector of 1000G] ***
Tuesday 03 October 2023  14:58:18 +0000 (0:00:00.071)       0:04:16.227 *******
fatal: [10.231.0.197]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 10.231.0.197 closed.\r\n", "module_stdout": "/bin/sh: /usr/local/bin/python: No such file or directory\r\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}

PLAY RECAP *********************************************************************
10.231.0.197               : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
149.81.12.75               : ok=2    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
localhost                  : ok=774  changed=51   unreachable=0    failed=0    skipped=323  rescued=0    ignored=0

Tuesday 03 October 2023  14:58:19 +0000 (0:00:01.900)       0:04:18.128 *******
===============================================================================
provision-terraform : Run terraform init in Terraform directory /home/pat/cpd-status/terraform -- 46.35s
provision-terraform : Run terraform plan in Terraform directory /home/pat/cpd-status/terraform, check /home/pat/cpd-status/terraform/plan.log -- 42.98s
download-ibmcloud : Run ibmcloud installer ----------------------------- 36.56s
cpd-cli-download : Unpack cpd-cli from /home/pat/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz --- 9.65s
openshift-download-client : Unpack OpenShift client from /home/pat/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 4.34s
terraform-download : Get Terraform version ------------------------------ 3.39s
nfs-server-ibmcloud-vpc-bastion : Restart sshd service ------------------ 2.80s
ibm-pak-download : Extract ibm-pak from /home/pat/cpd-status/downloads/oc-ibm_pak-linux-amd64.tar.gz --- 2.46s
cloudctl-download : Unpack cloudctl from /home/pat/cpd-status/downloads/cloudctl-linux-amd64.tar.gz --- 2.08s
nfs-server-ibmcloud-vpc-install : Get volume for specified selector of 1000G --- 1.90s
terraform-download : Unpack Terraform from /home/pat/cpd-status/downloads/terraform_linux_amd64.zip --- 1.50s
nfs-server-ibmcloud-vpc-bastion : Enable TCP forwarding on bastion node --- 1.44s
generators : Generate instance of "provider" in /home/pat/cpd-status/terraform/provider_ibm.tf --- 1.31s
record-deployer-state : Make sure old deployer-state.out does not exist --- 1.11s
ibm-pak-download : Make sure ibm-pak can be run within path ------------- 1.11s
generators : Generate instance of "resource_group" in /home/pat/cpd-status/terraform/resource_group_default.tf --- 1.03s
generators : Generate instance of "vpc" in /home/pat/cpd-status/terraform --- 0.92s
generators : Generate instance of "vsi" in /home/pat/cpd-status/terraform/vsi_nfs-test-bastion.tf --- 0.87s
generators : Generate instance of "address_prefix" in /home/pat/cpd-status/terraform/address_prefix_nfs-test-zone.tf --- 0.81s
merge-config : Generate config through template ------------------------- 0.79s

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press ^C.

Workaround: SSH into the NFS node, install Python, and disable SELinux:

[root@nfs-test-nfs ~]# yum install python36
[root@nfs-test-nfs ~]# ln -s /usr/bin/python3.6 /usr/local/bin/python
[root@nfs-test-nfs ~]# vi /etc/selinux/config   # set SELINUX=disabled
[root@nfs-test-nfs ~]# reboot

The install completes successfully this time.

Note that Python 3.8 fails because of a python-dnf issue, so Python 3.6 was used.
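Since both servers needed the same two fixes, a quick check on each one before restarting the deployer can avoid another failed run. A hedged sketch: `check_deployer_prereqs` is a hypothetical helper whose two checks simply mirror the failures in this report (no executable at `/usr/local/bin/python`, and `/etc/selinux/config` not set to `SELINUX=disabled`). It inspects the config file, not the running SELinux mode, so it cannot tell whether the reboot has happened yet. The optional ROOT argument exists only so the function can be exercised against a scratch directory.

```shell
#!/bin/sh
# check_deployer_prereqs [ROOT]
# Hypothetical preflight check for the two workarounds in this issue.
# ROOT defaults to empty, i.e. the real filesystem root.
# Prints one line per problem; returns 0 only if no problems were found.
check_deployer_prereqs() {
  root="${1:-}"
  problems=0
  # Ansible failed with "/usr/local/bin/python: No such file or directory".
  if [ ! -x "$root/usr/local/bin/python" ]; then
    echo "missing: $root/usr/local/bin/python"
    problems=$((problems + 1))
  fi
  # A missing config file is treated as "SELinux not installed": no problem.
  cfg="$root/etc/selinux/config"
  if [ -f "$cfg" ] && ! grep -q '^SELINUX=disabled$' "$cfg"; then
    echo "SELINUX=disabled not set in $cfg (reboot after changing it)"
    problems=$((problems + 1))
  fi
  [ "$problems" -eq 0 ]
}

# Run on the bastion and on the NFS node before restarting the deployer:
# check_deployer_prereqs && echo "ready"
```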

fketelaars commented 10 months ago

@patcurtin This is more or less a catch-22. When we initially designed the deployer framework, the virtual server images on IBM Cloud came with Python pre-installed, and Ansible requires Python on the hosts it manages. We could try installing Python on the bastion and NFS server over SSH, take care of SELinux, and then continue, but I would rather spend the effort on using the VPC file server capability that is now available on IBM Cloud.

In Terraform terms, that is the ibm_is_share resource of the IBM Cloud provider:

https://registry.terraform.io/providers/IBM-Cloud/ibm/latest/docs/resources/is_share

Consequently, the deployer would no longer need a bastion server, except when deploying a private cluster. The NFS storage would also become an expandable file server with fewer restrictions.