mohammedzee1000 opened this issue 1 year ago (Open)
Hi @mohammedzee1000, this task is required to access the cluster via web browser on your workstation. If you don't need that, then this task can be commented out. Adding a timeout is a great idea. I'm not sure why it is hanging; I'd think it would fail out. Have you filled in the `env.controller.sudo_pass` variable? Although I'd think it would fail pretty quickly if you hadn't.
Could you re-run the task with extra verbosity (`-vvv`) and post the output here so we can dig into what's going on?
Hi, sure, let me post the logs with verbosity enabled.
Also, yes, I have set `env.controller.sudo_pass`; it is marked with an #x in the template, after all :D
all.yaml:
```yaml
# Copy this file to 'all.yaml' in the same folder and add your required values there !
#
# For a comprehensive description of each variable, please see documentation here:
# https://ibm.github.io/Ansible-OpenShift-Provisioning/set-variables-group-vars/

# Section 1 - Ansible Controller
env:
  controller:
    sudo_pass: <redacted>

# Section 2 - LPAR(s)
  z:
    high_availability: False
    ip_forward: 1
    lpar1:
      create: True
      hostname: <redacted>
      ip: <redacted>
      user: root
      pass: <redacted>
    lpar2:
      create: False
      # hostname:
      # ip:
      # user:
      # pass:
    lpar3:
      create: False
      # hostname:
      # ip:
      # user:
      # pass:

# Section 3 - File Server
  file_server:
    ip: 192.168.122.1
    user: root
    pass: <redacted>
    protocol: http
    iso_mount_dir: bastioniso
    cfgs_dir: cfgs

# Section 4 - Red Hat
  redhat:
    username: <redacted>
    password: <redacted>
    # Make sure to enclose pull_secret in 'single quotes'
    pull_secret: '<redacted>'

# Section 5 - Bastion
  bastion:
    create: True
    vm_name: ocpz-bastion
    resources:
      disk_size: 30
      ram: 4096
      swap: 4096
      vcpu: 4
    networking:
      ip: 192.168.122.5
      hostname: bastion
      base_domain: ocpz.<redacted>
      subnetmask: 255.255.255.0
      gateway: 192.168.122.1
      nameserver1: 192.168.122.5
      nameserver2: 192.168.122.1
      forwarder: 1.1.1.1
      interface: enc1
    access:
      user: root
      pass: <redacted>
      root_pass: <redacted>
    options:
      dns: True
      loadbalancer:
        on_bastion: True
        # public_ip:
        # private_ip:

# Section 6 - Cluster Networking
  cluster:
    networking:
      metadata_name: ocpz
      base_domain: ocpz.<redacted>
      subnetmask: 255.255.255.0
      gateway: 192.168.122.1
      nameserver1: 192.168.122.5
      # nameserver2:
      forwarder: 1.1.1.1

# Section 7 - Bootstrap Node
    nodes:
      bootstrap:
        disk_size: 120
        ram: 16384
        vcpu: 4
        vm_name: ocpz-bootstrap
        ip: 192.168.122.6
        hostname: ocpz-bootstrap

# Section 8 - Control Nodes
      control:
        disk_size: 120
        ram: 16384
        vcpu: 4
        vm_name:
          - ocpz-master-1
          - ocpz-master-2
          - ocpz-master-3
        ip:
          - 192.168.122.10
          - 192.168.122.11
          - 192.168.122.12
        hostname:
          - ocpz-master-1
          - ocpz-master-2
          - ocpz-master-3

# Section 9 - Compute Nodes
      compute:
        disk_size: 120
        ram: 16384
        vcpu: 4
        vm_name:
          - ocpz-compute-1
          - ocpz-compute-2
        ip:
          - 192.168.122.20
          - 192.168.122.21
        hostname:
          - ocpz-compute-1
          - ocpz-compute-2

# Section 10 - Infra Nodes
      # infra:
      #   disk_size: 120
      #   ram: 16384
      #   vcpu: 4
      #   vm_name:
      #     - infra-1
      #     - infra-2
      #   ip:
      #     - 1.1.1.1
      #     - 1.1.1.2
      #   hostname:
      #     - infra1
      #     - infra2

#######################################################################################
# All variables below this point do not need to be changed for a default installation #
#######################################################################################

# Section 11 - (Optional) Packages
  pkgs:
    galaxy: [ ibm.ibm_zhmc, community.general, community.crypto, ansible.posix, community.libvirt ]
    controller: [ openssh, expect, sshuttle ]
    kvm: [ libguestfs, libvirt-client, libvirt-daemon-config-network, libvirt-daemon-kvm, cockpit-machines, libvirt-devel, virt-top, qemu-kvm, python3-lxml, cockpit, lvm2 ]
    bastion: [ haproxy, httpd, bind, bind-utils, expect, firewalld, mod_ssl, python3-policycoreutils, rsync ]
    hypershift: [ make, jq, git, virt-install ]

# Section 12 - OpenShift Settings
  install_config:
    api_version: v1
    compute:
      architecture: s390x
      hyperthreading: Enabled
    control:
      architecture: s390x
      hyperthreading: Enabled
    cluster_network:
      cidr: 10.128.0.0/14
      host_prefix: 23
      type: OVNKubernetes
    service_network: 172.30.0.0/16
    fips: 'false'

# Section 13 - (Optional) Proxy
  # proxy:
  #   http:
  #   https:
  #   no:

# Section 14 - (Optional) Misc
  language: en_US.UTF-8
  timezone: America/New_York
  keyboard: us
  root_access: false
  ansible_key_name: ansible-ocpz
  ocp_ssh_key_comment: OpenShift key
  bridge_name: default
  network_mode: NAT
  # jumphost if network mode is NAT
  jumphost:
    name: <redacted>
    ip: <redacted>
    user: root
    pass: <redacted>
    path_to_keypair: /root/.ssh/id_rsa.pub

# Section 15 - OCP and RHCOS (CoreOS)
# ocp_download_url with '/' at the end !
ocp_download_url: "https://mirror.openshift.com/pub/openshift-v4/multi/clients/ocp/4.13.1/s390x/"
# ocp client and installer filenames
ocp_client_tgz: "openshift-client-linux.tar.gz"
ocp_install_tgz: "openshift-install-linux.tar.gz"
# rhcos_download_url with '/' at the end !
rhcos_download_url: "https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.12/4.12.3/"
# For rhcos_os_variant use the OS string as defined in 'osinfo-query os -f short-id'
rhcos_os_variant: rhel8.6
# RHCOS live image filenames
rhcos_live_kernel: "rhcos-4.12.3-s390x-live-kernel-s390x"
rhcos_live_initrd: "rhcos-4.12.3-s390x-live-initramfs.s390x.img"
rhcos_live_rootfs: "rhcos-4.12.3-s390x-live-rootfs.s390x.img"

# Section 16 - Hypershift ( Optional )
hypershift:
  kvm_host:
  kvm_host_user:
  bastion_hypershift:
  bastion_hypershift_user:
  create_bastion: true
  networking_device: enc1100
  gateway:
  bastion_parms:
    interface:
    hostname:
    base_domain:
    os_variant:
    nameserver:
    gateway:
    subnet_mask:
  mgmt_cluster_nameserver:
  oc_url:

  # Hosted Control Plane Parameters
  hcp:
    clusters_namespace:
    hosted_cluster_name:
    basedomain:
    pull_secret_file: /root/ansible_workdir/auth_file
    ocp_release:
    machine_cidr: 192.168.122.0/24
    arch:
    # Make sure to enclose pull_secret in 'single quotes'
    pull_secret:

  # MultiClusterEngine Parameters
  mce:
    version:
    instance_name: engine
    delete: false

  # AgentServiceConfig Parameters
  asc:
    url_for_ocp_release_file:
    db_volume_size: "10Gi"
    fs_volume_size: "10Gi"
    ocp_version:
    iso_url:
    root_fs_url:
    mce_namespace: multicluster-engine # This is the Recommended Namespace for Multicluster Engine operator

  agents_parms:
    static_ip_parms:
      static_ip: true
      ip: # Required only if static_ip is true
        # -
        # -
      interface: eth0
    agents_count:
    # If you want to use specific mac addresses, provide them here
    agent_mac_addr:
      # -
    disk_size: 100G
    ram: 16384
    vcpus: 4
    nameserver:

# Section 17 - (Optional) Create additional compute node in a day-2 operation
day2_compute_node:
  vm_name:
  vm_hostname:
  vm_ip:
  hostname:
  host_arch:
path_to_key_pair: <redacted>/.ssh/ansible-ocpz.pub
```
```
TASK [wait_for_install_complete : Almost there! Add host info to /etc/hosts so you can login to the cluster via web browser. Ansible Controller sudo password required] ***********************
task path: /Users/mohammedzeeshanahmed/personal_bench/ibm/ansible-ocp-provisioner/IBM-Ansible-OpenShift-Provisioning/roles/wait_for_install_complete/tasks/main.yaml:2
Read vars_file '{{ inventory_dir }}/group_vars/all.yaml'
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: mohammedzeeshanahmed
<127.0.0.1> EXEC /bin/sh -c 'echo ~mohammedzeeshanahmed && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /Users/mohammedzeeshanahmed/.ansible/tmp `"&& mkdir "` echo /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192 `" && echo ansible-tmp-1694586065.7199008-6133-260583548742192="` echo /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192 `" ) && sleep 0'
Including module_utils file ansible/__init__.py
Including module_utils file ansible/module_utils/__init__.py
Including module_utils file ansible/module_utils/_text.py
Including module_utils file ansible/module_utils/basic.py
Including module_utils file ansible/module_utils/common/_json_compat.py
Including module_utils file ansible/module_utils/common/__init__.py
Including module_utils file ansible/module_utils/common/_utils.py
Including module_utils file ansible/module_utils/common/arg_spec.py
Including module_utils file ansible/module_utils/common/file.py
Including module_utils file ansible/module_utils/common/locale.py
Including module_utils file ansible/module_utils/common/parameters.py
Including module_utils file ansible/module_utils/common/collections.py
Including module_utils file ansible/module_utils/common/process.py
Including module_utils file ansible/module_utils/common/sys_info.py
Including module_utils file ansible/module_utils/common/text/converters.py
Including module_utils file ansible/module_utils/common/text/__init__.py
Including module_utils file ansible/module_utils/common/text/formatters.py
Including module_utils file ansible/module_utils/common/validation.py
Including module_utils file ansible/module_utils/common/warnings.py
Including module_utils file ansible/module_utils/compat/selectors.py
Including module_utils file ansible/module_utils/compat/__init__.py
Including module_utils file ansible/module_utils/compat/_selectors2.py
Including module_utils file ansible/module_utils/compat/selinux.py
Including module_utils file ansible/module_utils/distro/__init__.py
Including module_utils file ansible/module_utils/distro/_distro.py
Including module_utils file ansible/module_utils/errors.py
Including module_utils file ansible/module_utils/parsing/convert_bool.py
Including module_utils file ansible/module_utils/parsing/__init__.py
Including module_utils file ansible/module_utils/pycompat24.py
Including module_utils file ansible/module_utils/six/__init__.py
<bastion> Attempting python interpreter discovery
<127.0.0.1> EXEC /bin/sh -c 'echo PLATFORM; uname; echo FOUND; command -v '"'"'python3.11'"'"'; command -v '"'"'python3.10'"'"'; command -v '"'"'python3.9'"'"'; command -v '"'"'python3.8'"'"'; command -v '"'"'python3.7'"'"'; command -v '"'"'python3.6'"'"'; command -v '"'"'python3.5'"'"'; command -v '"'"'/usr/bin/python3'"'"'; command -v '"'"'/usr/libexec/platform-python'"'"'; command -v '"'"'python2.7'"'"'; command -v '"'"'/usr/bin/python'"'"'; command -v '"'"'python'"'"'; echo ENDFOUND && sleep 0'
<bastion> Python interpreter discovery fallback (unsupported platform for extended discovery: darwin)
Using module file /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/ansible/modules/blockinfile.py
<127.0.0.1> PUT /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-local-5177qxhnxjem/tmp5hgeqnms TO /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192/AnsiballZ_blockinfile.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192/ /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192/AnsiballZ_blockinfile.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'sudo -H -S -p "[sudo via ansible, key=jnbasqfpqraesiphksattiwxoxwzftsa] password:" -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-jnbasqfpqraesiphksattiwxoxwzftsa ; KUBECONFIG=/root/.kube/config /opt/homebrew/bin/python3.11 /Users/mohammedzeeshanahmed/.ansible/tmp/ansible-tmp-1694586065.7199008-6133-260583548742192/AnsiballZ_blockinfile.py'"'"' && sleep 0'
```
It is currently hung at this point, but to be safe I'll let it run a little bit longer. While I doubt anything will move forward, I will paste more if it changes. Otherwise, this is it.
Hmm, I'm not 100% sure. Maybe because it is 'becoming' the root user, it is having problems finding either your Python or your Ansible binary? It should really 'become' the same user, just with elevated privileges, instead of root.
What do you think? Could you run `which python3` as the root user and see whether it returns the correct path? Or we could add a `become_user` to the task.
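A quick way to compare the two interpreter paths on the controller (a minimal sketch; the `sudo` call assumes sudo is usable on your workstation, otherwise run that half interactively):

```shell
# Path your own user resolves for python3
echo "user: $(command -v python3)"

# Path root resolves for python3, roughly what 'become: root' would see.
# If the two differ (e.g. Homebrew python vs. system python on macOS),
# the become step may be invoking an unexpected interpreter.
sudo sh -c 'echo "root: $(command -v python3)"' \
  || echo "root: (sudo unavailable; run this part interactively)"
```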
Ansible Controller type: Mac M1
Ansible version:
Error:
What is happening? The final stage of the playbook has a role called `wait_for_install_complete`, which has a task that tries to patch the `/etc/hosts` file with host information needed to access the cluster: https://github.com/IBM/Ansible-OpenShift-Provisioning/blob/7e7c49781ef17fa3161becb0558cc005a197a147/roles/wait_for_install_complete/tasks/main.yaml#L2-L14
For reasons unknown as yet, the playbook pauses at this task, does not proceed further, and needs to be interrupted manually.
This could cause issues if it happens in automation such as a Jenkins job or a Tekton pipeline.
We should add a timeout for this operation while ignoring any errors, and then print this information and ask the user to validate it and add it to their hosts file manually if it's not there.
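A hedged sketch of what that could look like in the role (the per-task `timeout` keyword needs ansible-core 2.10+; the `block` content here is abbreviated, and the follow-up debug task, its wording, and the `etc_hosts_update` register name are illustrative, not the role's actual code):

```yaml
- name: Add host info to /etc/hosts so you can login to the cluster via web browser.
  become: true
  timeout: 120          # seconds; fail the task instead of hanging indefinitely
  ignore_errors: true   # let automation (Jenkins, Tekton) continue past a failure
  register: etc_hosts_update
  ansible.builtin.blockinfile:
    path: /etc/hosts
    block: "..."        # abbreviated; see the role's actual block content
    marker: "# {mark} ANSIBLE MANAGED BLOCK"

- name: Ask the user to verify /etc/hosts manually if the update failed or timed out.
  when: etc_hosts_update is failed
  ansible.builtin.debug:
    msg: >-
      Could not update /etc/hosts automatically. Please verify it contains the
      cluster host entries and add them manually if they are missing.
```

With `ignore_errors: true` the registered result still records the failure, so the `when: etc_hosts_update is failed` condition triggers the advisory message while the play keeps going.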