IBM / Ansible-OpenShift-Provisioning

Automate the deployment of Red Hat OpenShift Container Platform on IBM zSystems (s390x). Automated User-Provisioned Infrastructure (UPI) setup using Kernel-based Virtual Machine (KVM).
https://ibm.github.io/Ansible-OpenShift-Provisioning/
MIT License

The OCP verification playbook waits forever trying to update the /etc/hosts file on the controller #198

Open mohammedzee1000 opened 1 year ago

mohammedzee1000 commented 1 year ago

Ansible Controller type: Mac M1

Ansible version:

ansible --version
ansible [core 2.15.3]
  config file = /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/ansible.cfg
  configured module search path = ['/Users/mohammedzeeshanahmed/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/ansible
  ansible collection location = /Users/mohammedzeeshanahmed/.ansible/collections:/usr/share/ansible/collections
  executable location = /Library/Frameworks/Python.framework/Versions/3.9/bin/ansible
  python version = 3.9.13 (v3.9.13:6de2ca5339, May 17 2022, 11:37:23) [Clang 13.0.0 (clang-1300.0.29.30)] (/Library/Frameworks/Python.framework/Versions/3.9/bin/python3.9)
  jinja version = 3.1.2
  libyaml = True

Error:

❯ ansible-playbook playbooks/7_ocp_verification.yaml
[WARNING]: Found both group and host with same name: bastion

PLAY [7 OCP verification] *********************************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************************************
ok: [bastion]

TASK [approve_certs : Cancel async 'approve_certs_task', if exists] ***************************************************************************************************************************
skipping: [bastion]

TASK [approve_certs : Approve all pending CSRs in the next 30 min (async task)] ***************************************************************************************************************
changed: [bastion]

TASK [check_nodes : Get and print nodes status] ***********************************************************************************************************************************************
included: /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/roles/common/tasks/print_ocp_node_status.yaml for bastion

TASK [check_nodes : Get OCP nodes status] *****************************************************************************************************************************************************
ok: [bastion]

TASK [check_nodes : Print OCP nodes status] ***************************************************************************************************************************************************
ok: [bastion] => {
    "oc_get_nodes.stdout_lines": [
        "NAME                     STATUS                     ROLES                  AGE     VERSION           KERNEL-VERSION                INTERNAL-IP    ",
        "ocpz-master-1            Ready                      control-plane,master   27m     v1.26.3+b404935   5.14.0-284.13.1.el9_2.s390x   192.168.122.10 ",
        "ocpz-master-2            Ready                      control-plane,master   26m     v1.26.3+b404935   5.14.0-284.13.1.el9_2.s390x   192.168.122.11 ",
        "ocpz-master-3            Ready                      control-plane,master   25m     v1.26.3+b404935   5.14.0-284.13.1.el9_2.s390x   192.168.122.12 "
    ]
}

TASK [check_nodes : Make sure control and compute nodes are 'Ready' before continuing (retry every 20s)] **************************************************************************************
changed: [bastion] => (item=ocpz-master-1)
changed: [bastion] => (item=ocpz-master-2)
changed: [bastion] => (item=ocpz-master-3)
FAILED - RETRYING: [bastion]: Make sure control and compute nodes are 'Ready' before continuing (retry every 20s) (90 retries left).
FAILED - RETRYING: [bastion]: Make sure control and compute nodes are 'Ready' before continuing (retry every 20s) (89 retries left).
FAILED - RETRYING: [bastion]: Make sure control and compute nodes are 'Ready' before continuing (retry every 20s) (88 retries left).
changed: [bastion] => (item=ocpz-compute-1)
changed: [bastion] => (item=ocpz-compute-2)

TASK [approve_certs : Cancel async 'approve_certs_task', if exists] ***************************************************************************************************************************
ok: [bastion]

TASK [approve_certs : Approve all pending CSRs in the next 30 min (async task)] ***************************************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Wait for cluster operators] ********************************************************************************************************************************
included: /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/roles/wait_for_cluster_operators/tasks/check_co.yaml for bastion => (item=First)
included: /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/roles/wait_for_cluster_operators/tasks/check_co.yaml for bastion => (item=Second)
included: /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/roles/wait_for_cluster_operators/tasks/check_co.yaml for bastion => (item=Third)
included: /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/roles/wait_for_cluster_operators/tasks/check_co.yaml for bastion => (item=Fourth)
included: /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/roles/wait_for_cluster_operators/tasks/check_co.yaml for bastion => (item=Fifth and last)

TASK [wait_for_cluster_operators : First round of checking cluster operators] *****************************************************************************************************************
changed: [bastion]

TASK [wait_for_cluster_operators : Print cluster operators which are only in 'PROGRESSING' state] *********************************************************************************************
ok: [bastion] => {
    "oc_get_co.stdout_lines": [
        "authentication                             4.13.1    False       True          True       24m     OAuthServerDeploymentAvailable: no oauth-openshift.openshift-authentication pods available on any node....",
        "console                                    4.13.1    False       True          False      9m53s   DeploymentAvailable: 0 replicas available for console deployment...",
        "dns                                        4.13.1    True        False         True       23m     DNS default is degraded",
        "etcd                                       4.13.1    True        True          False      14m     NodeInstallerProgressing: 1 nodes are at revision 7; 2 nodes are at revision 8",
        "ingress                                              False       True          True       23m     The \"default\" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)",
        "monitoring                                           False       True          True       12m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas",
        "network                                    4.13.1    True        True          False      24m     DaemonSet \"/openshift-multus/network-metrics-daemon\" is not available (awaiting 1 nodes)..."
    ]
}

TASK [wait_for_cluster_operators : First round of waiting for cluster operators. Trying 10 times before printing status again] ****************************************************************
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (10 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (9 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (8 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (7 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (6 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (5 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (4 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (3 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (2 retries left).
FAILED - RETRYING: [bastion]: First round of waiting for cluster operators. Trying 10 times before printing status again (1 retries left).
fatal: [bastion]: FAILED! => {"attempts": 10, "changed": true, "cmd": "set -o pipefail\n# Check for 'PROGRESSING' state\noc get co 2> /dev/null | awk '{print $4}'\n", "delta": "0:00:00.124020", "end": "2023-09-11 01:26:10.825538", "msg": "", "rc": 0, "start": "2023-09-11 01:26:10.701518", "stderr": "", "stderr_lines": [], "stdout": "PROGRESSING\nTrue\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nTrue\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nTrue\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse\nFalse", "stdout_lines": ["PROGRESSING", "True", "False", "False", "False", "False", "False", "False", "False", "False", "False", "False", "False", "False", "False", "True", "False", "False", "False", "False", "False", "False", "False", "True", "False", "False", "False", "False", "False", "False", "False", "False", "False", "False"]}
...ignoring

TASK [wait_for_cluster_operators : Update local variable, if required] ************************************************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Second round of checking cluster operators] ****************************************************************************************************************
changed: [bastion]

TASK [wait_for_cluster_operators : Print cluster operators which are only in 'PROGRESSING' state] *********************************************************************************************
ok: [bastion] => {
    "oc_get_co.stdout_lines": [
        "authentication                             4.13.1    False       True          False      30m     WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://192.168.122.12:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)",
        "kube-apiserver                             4.13.1    True        True          False      19m     NodeInstallerProgressing: 2 nodes are at revision 5; 1 nodes are at revision 6",
        "monitoring                                           False       True          True       40s     reconciling Telemeter client cluster monitoring view ClusterRoleBinding failed: creating ClusterRoleBinding object failed: clusterroles.rbac.authorization.k8s.io \"cluster-monitoring-view\" not found, reconciling Prometheus Alertmanager RoleBinding failed: creating RoleBinding object failed: roles.rbac.authorization.k8s.io \"monitoring-alertmanager-edit\" not found, prometheuses.monitoring.coreos.com \"k8s\" not found"
    ]
}

TASK [wait_for_cluster_operators : Second round of waiting for cluster operators. Trying 10 times before printing status again] ***************************************************************
FAILED - RETRYING: [bastion]: Second round of waiting for cluster operators. Trying 10 times before printing status again (10 retries left).
FAILED - RETRYING: [bastion]: Second round of waiting for cluster operators. Trying 10 times before printing status again (9 retries left).
FAILED - RETRYING: [bastion]: Second round of waiting for cluster operators. Trying 10 times before printing status again (8 retries left).
FAILED - RETRYING: [bastion]: Second round of waiting for cluster operators. Trying 10 times before printing status again (7 retries left).
FAILED - RETRYING: [bastion]: Second round of waiting for cluster operators. Trying 10 times before printing status again (6 retries left).
changed: [bastion]

TASK [wait_for_cluster_operators : Update local variable, if required] ************************************************************************************************************************
ok: [bastion]

TASK [wait_for_cluster_operators : Third round of checking cluster operators] *****************************************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Print cluster operators which are only in 'PROGRESSING' state] *********************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Third round of waiting for cluster operators. Trying 10 times before printing status again] ****************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Update local variable, if required] ************************************************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Fourth round of checking cluster operators] ****************************************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Print cluster operators which are only in 'PROGRESSING' state] *********************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Fourth round of waiting for cluster operators. Trying 10 times before printing status again] ***************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Update local variable, if required] ************************************************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Fifth and last round of checking cluster operators] ********************************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Print cluster operators which are only in 'PROGRESSING' state] *********************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Fifth and last round of waiting for cluster operators. Trying 10 times before printing status again] *******************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Update local variable, if required] ************************************************************************************************************************
skipping: [bastion]

TASK [wait_for_cluster_operators : Get final cluster operators] *******************************************************************************************************************************
ok: [bastion]

TASK [wait_for_cluster_operators : Print final cluster operators] *****************************************************************************************************************************
ok: [bastion] => {
    "oc_get_co.stdout_lines": [
        "NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE",
        "authentication                             4.13.1    True        False         False      32s     ",
        "baremetal                                  4.13.1    True        False         False      32m     ",
        "cloud-controller-manager                   4.13.1    True        False         False      37m     ",
        "cloud-credential                           4.13.1    True        False         False      38m     ",
        "cluster-autoscaler                         4.13.1    True        False         False      32m     ",
        "config-operator                            4.13.1    True        False         False      33m     ",
        "console                                    4.13.1    True        False         False      5m57s   ",
        "control-plane-machine-set                  4.13.1    True        False         False      32m     ",
        "csi-snapshot-controller                    4.13.1    True        False         False      33m     ",
        "dns                                        4.13.1    True        False         False      32m     ",
        "etcd                                       4.13.1    True        False         False      23m     ",
        "image-registry                             4.13.1    True        False         False      18m     ",
        "ingress                                    4.13.1    True        False         False      8m50s   ",
        "insights                                   4.13.1    True        False         False      26m     ",
        "kube-apiserver                             4.13.1    True        False         False      22m     ",
        "kube-controller-manager                    4.13.1    True        False         False      22m     ",
        "kube-scheduler                             4.13.1    True        False         False      23m     ",
        "kube-storage-version-migrator              4.13.1    True        False         False      33m     ",
        "machine-api                                4.13.1    True        False         False      32m     ",
        "machine-approver                           4.13.1    True        False         False      32m     ",
        "machine-config                             4.13.1    True        False         False      31m     ",
        "marketplace                                4.13.1    True        False         False      32m     ",
        "monitoring                                 4.13.1    True        False         False      2m33s   ",
        "network                                    4.13.1    True        False         False      33m     ",
        "node-tuning                                4.13.1    True        False         False      9m19s   ",
        "openshift-apiserver                        4.13.1    True        False         False      19m     ",
        "openshift-controller-manager               4.13.1    True        False         False      22m     ",
        "openshift-samples                          4.13.1    True        False         False      18m     ",
        "operator-lifecycle-manager                 4.13.1    True        False         False      33m     ",
        "operator-lifecycle-manager-catalog         4.13.1    True        False         False      33m     ",
        "operator-lifecycle-manager-packageserver   4.13.1    True        False         False      20m     ",
        "service-ca                                 4.13.1    True        False         False      33m     ",
        "storage                                    4.13.1    True        False         False      33m     "
    ]
}

TASK [wait_for_install_complete : Almost there! Add host info to /etc/hosts so you can login to the cluster via web browser. Ansible Controller sudo password required] ***********************

^C [ERROR]: User interrupted execution

What is happening? The final stage of the playbook, a role called wait_for_install_complete, has a task that patches the /etc/hosts file on the controller with the host information needed to access the cluster.

https://github.com/IBM/Ansible-OpenShift-Provisioning/blob/7e7c49781ef17fa3161becb0558cc005a197a147/roles/wait_for_install_complete/tasks/main.yaml#L2-L14

For reasons as yet unknown, the playbook hangs at this task and does not proceed; I have to interrupt it manually.

This could cause problems if it happens in automation, such as a Jenkins job or a Tekton pipeline.

We should add a timeout to this operation (ignoring any errors), print the host information, and ask the user to verify it and add it to their hosts file manually if it is missing.
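
For example, a rough sketch of what I mean (the variable name and block content are placeholders, not the role's actual values; the 'timeout' task keyword needs ansible-core 2.10+):

- name: Add host info to /etc/hosts so the cluster console is reachable
  become: true
  ansible.builtin.blockinfile:
    path: /etc/hosts
    block: "{{ cluster_hosts_entry }}"  # placeholder for the real host entries
  timeout: 120           # abort the task after 2 minutes instead of hanging
  register: hosts_update
  ignore_errors: true    # keep the playbook going even if this fails

- name: Ask the user to add the entry manually if the update failed or timed out
  ansible.builtin.debug:
    msg: >-
      Could not update /etc/hosts automatically. Please check that the
      following entry exists and add it manually if not:
      {{ cluster_hosts_entry }}
  when: hosts_update is failed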

jacobemery commented 1 year ago

Hi @mohammedzee1000, this task is required to access the cluster via web browser from your workstation. If you don't need that, the task can be commented out. Adding a timeout is a great idea. I'm not sure why it is hanging; I'd expect it to fail out. Have you filled in the env.controller.sudo_pass variable? Though I'd think it would fail pretty quickly if you hadn't.

Could you re-run the task with extra verbosity (-vvv) and post it here to see if we can dig in more to what's going on?

mohammedzee1000 commented 1 year ago

Hi, sure, let me post the logs with verbosity enabled. Also yes, I have set env.controller.sudo_pass; it is marked with an #x in the template, after all :D

mohammedzee1000 commented 1 year ago

all.yaml:

# Copy this file to 'all.yaml' in the same folder and add your required values there !
#
# For a comprehensive description of each variable, please see documentation here:
# https://ibm.github.io/Ansible-OpenShift-Provisioning/set-variables-group-vars/

# Section 1 - Ansible Controller
env:
  controller:
    sudo_pass: <redacted>

# Section 2 - LPAR(s)
  z:
    high_availability: False
    ip_forward: 1 
    lpar1:
      create: True
      hostname: <redacted>
      ip: <redacted>
      user: root
      pass: <redacted>
    lpar2:
      create: False
#      hostname:
#      ip:
#      user:
#      pass:
    lpar3:
      create: False
#      hostname:
#      ip:
#      user:
#      pass:

# Section 3 - File Server
  file_server:
    ip: 192.168.122.1
    user: root
pass: <redacted>
    protocol: http
    iso_mount_dir: bastioniso
    cfgs_dir: cfgs

# Section 4 - Red Hat
  redhat:
    username: <redacted>
    password: <redacted>
    # Make sure to enclose pull_secret in 'single quotes'
    pull_secret: '<redacted>'

# Section 5 - Bastion
  bastion:
    create: True
    vm_name: ocpz-bastion
    resources:
      disk_size: 30
      ram: 4096
      swap: 4096
      vcpu: 4
    networking:
      ip: 192.168.122.5
      hostname: bastion
      base_domain: ocpz.<redacted>
      subnetmask: 255.255.255.0
      gateway: 192.168.122.1
      nameserver1: 192.168.122.5
      nameserver2: 192.168.122.1
      forwarder: 1.1.1.1
      interface: enc1
    access:
      user: root
      pass: <redacted>
      root_pass: <redacted>
    options:
      dns: True
      loadbalancer:
        on_bastion: True
#        public_ip:
#        private_ip:

# Section 6 - Cluster Networking
  cluster:
    networking:
      metadata_name: ocpz
      base_domain: ocpz.<redacted>
      subnetmask: 255.255.255.0
      gateway: 192.168.122.1
      nameserver1: 192.168.122.5
#      nameserver2:
      forwarder: 1.1.1.1

# Section 7 - Bootstrap Node
    nodes:
      bootstrap:
        disk_size: 120
        ram: 16384
        vcpu: 4
        vm_name: ocpz-bootstrap
        ip: 192.168.122.6
        hostname: ocpz-bootstrap

# Section 8 - Control Nodes
      control:
        disk_size: 120
        ram: 16384
        vcpu: 4
        vm_name:
          - ocpz-master-1
          - ocpz-master-2
          - ocpz-master-3
        ip:
          - 192.168.122.10
          - 192.168.122.11
          - 192.168.122.12
        hostname:
          - ocpz-master-1
          - ocpz-master-2
          - ocpz-master-3

# Section 9 - Compute Nodes
      compute:
        disk_size: 120
        ram: 16384
        vcpu: 4
        vm_name:
          - ocpz-compute-1
          - ocpz-compute-2
        ip:
          - 192.168.122.20
          - 192.168.122.21
        hostname:
          - ocpz-compute-1
          - ocpz-compute-2

# Section 10 - Infra Nodes
#      infra:
#        disk_size: 120
#        ram: 16384
#        vcpu: 4
#        vm_name:
#          - infra-1
#          - infra-2
#        ip:
#          - 1.1.1.1
#          - 1.1.1.2
#        hostname:
#          - infra1
#          - infra2

#######################################################################################
# All variables below this point do not need to be changed for a default installation #
#######################################################################################

# Section 11 - (Optional) Packages
  pkgs:
    galaxy: [ ibm.ibm_zhmc, community.general, community.crypto, ansible.posix, community.libvirt ]
    controller: [ openssh, expect, sshuttle ]
    kvm: [ libguestfs, libvirt-client, libvirt-daemon-config-network, libvirt-daemon-kvm, cockpit-machines, libvirt-devel, virt-top, qemu-kvm, python3-lxml, cockpit, lvm2 ]
    bastion: [ haproxy, httpd, bind, bind-utils, expect, firewalld, mod_ssl, python3-policycoreutils, rsync ]
    hypershift: [ make, jq, git, virt-install ]

# Section 12 - OpenShift Settings
  install_config:
    api_version: v1
    compute:
      architecture: s390x
      hyperthreading: Enabled
    control:
      architecture: s390x
      hyperthreading: Enabled
    cluster_network:
      cidr: 10.128.0.0/14
      host_prefix: 23
      type: OVNKubernetes
    service_network: 172.30.0.0/16
    fips: 'false'

# Section 13 - (Optional) Proxy
#  proxy:
#    http:
#    https:
#    no:

# Section 14 - (Optional) Misc
  language: en_US.UTF-8
  timezone: America/New_York
  keyboard: us
  root_access: false
  ansible_key_name: ansible-ocpz
  ocp_ssh_key_comment: OpenShift key
  bridge_name: default
  network_mode: NAT

#jumphost if network mode is NAT
  jumphost:
    name: <redacted>
    ip: <redacted>
    user: root
    pass: <redacted>
    path_to_keypair: /root/.ssh/id_rsa.pub

# Section 15 - OCP and RHCOS (CoreOS)

# ocp_download_url with '/' at the end !
ocp_download_url: "https://mirror.openshift.com/pub/openshift-v4/multi/clients/ocp/4.13.1/s390x/"
# ocp client and installer filenames
ocp_client_tgz: "openshift-client-linux.tar.gz"
ocp_install_tgz: "openshift-install-linux.tar.gz"

# rhcos_download_url with '/' at the end !
rhcos_download_url: "https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.12/4.12.3/"

# For rhcos_os_variant use the OS string as defined in 'osinfo-query os -f short-id'
rhcos_os_variant: rhel8.6

# RHCOS live image filenames
rhcos_live_kernel: "rhcos-4.12.3-s390x-live-kernel-s390x"
rhcos_live_initrd: "rhcos-4.12.3-s390x-live-initramfs.s390x.img"
rhcos_live_rootfs: "rhcos-4.12.3-s390x-live-rootfs.s390x.img"

# Section 16 - Hypershift ( Optional )

hypershift:
  kvm_host:
  kvm_host_user:
  bastion_hypershift:
  bastion_hypershift_user:

  create_bastion: true
  networking_device: enc1100
  gateway:

  bastion_parms:
    interface:
    hostname:
    base_domain:
    os_variant:
    nameserver:
    gateway:
    subnet_mask:

  mgmt_cluster_nameserver:
  oc_url:

  #Hosted Control Plane Parameters

  hcp:
    clusters_namespace:
    hosted_cluster_name:
    basedomain:
    pull_secret_file: /root/ansible_workdir/auth_file
    ocp_release:
    machine_cidr: 192.168.122.0/24
    arch:
    # Make sure to enclose pull_secret in 'single quotes'
    pull_secret:

  # MultiClusterEngine Parameters
  mce:
    version:
    instance_name: engine
    delete: false

  # AgentServiceConfig Parameters

  asc:
    url_for_ocp_release_file:
    db_volume_size: "10Gi"
    fs_volume_size: "10Gi"
    ocp_version:
    iso_url:
    root_fs_url:
    mce_namespace: multicluster-engine # This is the Recommended Namespace for Multicluster Engine operator

  agents_parms:
    static_ip_parms:
      static_ip: true
      ip:     # Required only if static_ip is true
        #-
        #-
      interface: eth0
    agents_count:
    # If you want to use specific mac addresses, provide them here
    agent_mac_addr:
      #-
    disk_size: 100G
    ram: 16384
    vcpus: 4
    nameserver:

# Section 17 - (Optional) Create additional compute node in a day-2 operation

day2_compute_node:
  vm_name:
  vm_hostname:
  vm_ip:
  hostname:
  host_arch:
path_to_key_pair: <redacted>/.ssh/ansible-ocpz.pub

Log of the run with -vvv, at the point where it hangs:

TASK [wait_for_install_complete : Almost there! Add host info to /etc/hosts so you can login to the cluster via web browser. Ansible Controller sudo password required] ***********************
task path: /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/roles/wait_for_install_complete/tasks/main.yaml:2
Read vars_file '{{ inventory_dir }}/group_vars/all.yaml'
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: mohammedzeeshanahmed
<127.0.0.1> EXEC /bin/sh -c 'echo ~mohammedzeeshanahmed && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /Users/mohammedzeeshanahmed/.ansible/tmp `"&& mkdir "` echo /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192 `" && echo ansible-tmp-1694586065.7199008-6133-260583548742192="` echo /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192 `" ) && sleep 0'
Including module_utils file ansible/__init__.py
Including module_utils file ansible/module_utils/__init__.py
Including module_utils file ansible/module_utils/_text.py
Including module_utils file ansible/module_utils/basic.py
Including module_utils file ansible/module_utils/common/_json_compat.py
Including module_utils file ansible/module_utils/common/__init__.py
Including module_utils file ansible/module_utils/common/_utils.py
Including module_utils file ansible/module_utils/common/arg_spec.py
Including module_utils file ansible/module_utils/common/file.py
Including module_utils file ansible/module_utils/common/locale.py
Including module_utils file ansible/module_utils/common/parameters.py
Including module_utils file ansible/module_utils/common/collections.py
Including module_utils file ansible/module_utils/common/process.py
Including module_utils file ansible/module_utils/common/sys_info.py
Including module_utils file ansible/module_utils/common/text/converters.py
Including module_utils file ansible/module_utils/common/text/__init__.py
Including module_utils file ansible/module_utils/common/text/formatters.py
Including module_utils file ansible/module_utils/common/validation.py
Including module_utils file ansible/module_utils/common/warnings.py
Including module_utils file ansible/module_utils/compat/selectors.py
Including module_utils file ansible/module_utils/compat/__init__.py
Including module_utils file ansible/module_utils/compat/_selectors2.py
Including module_utils file ansible/module_utils/compat/selinux.py
Including module_utils file ansible/module_utils/distro/__init__.py
Including module_utils file ansible/module_utils/distro/_distro.py
Including module_utils file ansible/module_utils/errors.py
Including module_utils file ansible/module_utils/parsing/convert_bool.py
Including module_utils file ansible/module_utils/parsing/__init__.py
Including module_utils file ansible/module_utils/pycompat24.py
Including module_utils file ansible/module_utils/six/__init__.py
<bastion> Attempting python interpreter discovery
<127.0.0.1> EXEC /bin/sh -c 'echo PLATFORM; uname; echo FOUND; command -v '"'"'python3.11'"'"'; command -v '"'"'python3.10'"'"'; command -v '"'"'python3.9'"'"'; command -v '"'"'python3.8'"'"'; command -v '"'"'python3.7'"'"'; command -v '"'"'python3.6'"'"'; command -v '"'"'python3.5'"'"'; command -v '"'"'/usr/bin/python3'"'"'; command -v '"'"'/usr/libexec/platform-python'"'"'; command -v '"'"'python2.7'"'"'; command -v '"'"'/usr/bin/python'"'"'; command -v '"'"'python'"'"'; echo ENDFOUND && sleep 0'
<bastion> Python interpreter discovery fallback (unsupported platform for extended discovery: darwin)
Using module file /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/ansible/modules/blockinfile.py
<127.0.0.1> PUT /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-local-5177qxhnxjem/tmp5hgeqnms TO /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192/AnsiballZ_blockinfile.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192/ /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192/AnsiballZ_blockinfile.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'sudo -H -S -p "[sudo via ansible, key=jnbasqfpqraesiphksattiwxoxwzftsa] password:" -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-jnbasqfpqraesiphksattiwxoxwzftsa ; KUBECONFIG=/root/.kube/config /opt/homebrew/bin/python3.11 /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192/AnsiballZ_blockinfile.py'"'"' && sleep 0'

It is currently hung at this point, but to be safe I'll let it run a little longer. While I doubt anything will move forward, I will post more if it changes. Otherwise, this is it.

jacobemery commented 1 year ago

Hmm, I'm not 100% sure. Maybe because it is 'becoming' the root user, it is having problems finding either your Python or Ansible binary? I guess it should really 'become' the same user, just with elevated privileges, instead of root.

What do you think? Could you run 'which python3' as the root user and check that it returns the correct path? Or we could add a 'become_user' to the task.
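
For reference, a quick diagnostic along those lines could look like this (illustrative only, not the role's actual code):

- name: Check which python3 the root user resolves to on the controller
  become: true
  ansible.builtin.command: which python3
  register: root_python3
  changed_when: false    # purely a read-only check

- name: Print the interpreter path root would use
  ansible.builtin.debug:
    var: root_python3.stdout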