Closed: logeshwaris closed this issue 2 years ago.
Hi @logeshwaris from what I was able to see, the error looks like something specific to OKD, instead of the automation to get it deployed. Currently, we are deploying 4.9 but there are newer versions available, let me see if by updating the version the problem goes away.
Let's see how it goes here https://github.com/Kubeinit/kubeinit/pull/643
Hi @ccamacho, I tried using the latest and I am seeing the below error. Am I missing something?
Command:
ansible-playbook -v --user root \
  -e kubeinit_spec=okd-libvirt-1-2-1 \
  -i ./kubeinit/inventory \
  ./kubeinit/playbook.yml
Logs:
TASK [kubeinit.kubeinit.kubeinit_prepare : Create ssh config file from template] **
task path: /home/slogeshw/.ansible/collections/ansible_collections/kubeinit/kubeinit/roles/kubeinit_prepare/tasks/create_host_ssh_config.yml:53
Monday 11 April 2022 11:23:31 +0530 (0:00:00.209) 0:00:16.327 **
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: slogeshw
<127.0.0.1> EXEC /bin/sh -c 'echo ~slogeshw && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/slogeshw/.ansible/tmp `" && mkdir "` echo /home/slogeshw/.ansible/tmp/ansible-tmp-1649656411.4216487-1386120-60959960057760 `" && echo ansible-tmp-1649656411.4216487-1386120-60959960057760="` echo /home/slogeshw/.ansible/tmp/ansible-tmp-1649656411.4216487-1386120-60959960057760 `" ) && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /home/slogeshw/.ansible/tmp/ansible-tmp-1649656411.4216487-1386120-60959960057760/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/ansible/template/__init__.py", line 1100, in do_template
res = j2_concat(rf)
File "<template>", line 47, in root
File "/usr/local/lib/python3.6/site-packages/jinja2/runtime.py", line 903, in _fail_with_undefined_error
raise self._undefined_exception(self._undefined_message)
jinja2.exceptions.UndefinedError: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/ansible/plugins/action/template.py", line 146, in run
    resultant = templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
  File "/usr/local/lib/python3.6/site-packages/ansible/template/__init__.py", line 1137, in do_template
    raise AnsibleUndefinedVariable(e)
ansible.errors.AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'
fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'"
}
PLAY RECAP ****
localhost : ok=50 changed=7 unreachable=0 failed=1 skipped=23 rescued=0 ignored=0
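For reference, the failing task renders an ssh config template that reads each host's ansible_host from hostvars, so this exception means at least one host the template touches has no ansible_host defined. A minimal, self-contained reproduction of that failure mode (hypothetical host data, not KubeInit's actual template) shows both the error and a defensive default() guard:

```python
# Sketch only: mimics hostvars[host].ansible_host lookups under Ansible's
# strict-undefined templating; the host names here are illustrative.
from jinja2 import Environment, StrictUndefined
from jinja2.exceptions import UndefinedError

hostvars = {
    "hypervisor-01": {"ansible_host": "nyctea"},
    "bastion": {"target": "hypervisor-01"},  # no ansible_host defined
}

env = Environment(undefined=StrictUndefined)

# Direct lookup raises UndefinedError, analogous to the failure above.
try:
    env.from_string("{{ hostvars['bastion'].ansible_host }}").render(hostvars=hostvars)
except UndefinedError as err:
    print("failed:", err)

# A default() guard renders a fallback value instead of raising.
guarded = env.from_string(
    "{{ hostvars['bastion'].ansible_host | default('bastion') }}"
)
print("guarded:", guarded.render(hostvars=hostvars))
```

This does not change KubeInit itself; it only illustrates why a host entry that lacks ansible_host makes the template task fail on localhost.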
Inventory.yml:
=============
###
# The cluster's guest machines can be distributed across multiple hosts. By default they
# will be deployed in the first Hypervisor. These hypervisors are activated and used
# depending on how they are referenced in the kubeinit spec string.
#
# When we are running the setup-playbook, if a hypervisor host has an ssh_hostname attribute
# then a .ssh/config file will be created and an entry mapping the ansible_host to that
# ssh hostname will be created. In the first example we would associate
# the ansible_host of the first hypervisor host "nyctea" with the hostname provided, it
# can be a short or fully qualified name, but it needs to be resolvable on the host we
# are running the kubeinit setup from. The second example uses a host ip address, which
# can be useful in those cases where the host you are using doesn't have a dns name.
#
# .. code-block:: yaml
#
# hypervisor_hosts:
#   hypervisor-01:
#     ansible_host: nyctea
#     ssh_hostname: server1.example.com
#   hypervisor-02:
#     ansible_host: tyto
#     ssh_hostname: 192.168.222.202
hypervisor_hosts:
  hypervisor-01:
    ansible_host: nyctea
###
# The inventory will have one host identified as the bastion host. By default, this role will
# be assumed by the first hypervisor. The first example would set the second hypervisor to be
# the bastion host. The second example would set the bastion host to be a different host that
# is not being used as a hypervisor for the guest VMs of the clusters using this inventory.
#
# .. code-block:: yaml
#
# bastion_host:
#   bastion:
#     ansible_host: hypervisor-02
#
# .. code-block:: yaml
#
# bastion_host:
#   bastion:
#     ansible_host: bastion
bastion_host:
  bastion:
    target: hypervisor-01
###
# The inventory will have one host identified as the ovn-central host. By default, this role
# will be assumed by the first hypervisor. The example would set the second hypervisor
# to be the ovn-central host.
#
# .. code-block:: yaml
#
# ovn_central_host:
#   ovn-central:
#     target: hypervisor-02
ovn_central_host:
  ovn-central:
    target: hypervisor-01
###
#
# Setup host definition (used only with the setup-playbook.yml)
#
#
# This inventory will have one host identified as the setup host. By default, this will be
# localhost. The first example would set the first hypervisor host to be the setup host.
# The second example would set the setup host to be a different host that is not being
# used as a hypervisor in this inventory.
#
# .. code-block:: yaml
#
# setup_host:
#   kubeinit-setup:
#     ansible_host: nyctea
#
# or
#
# .. code-block:: yaml
#
# setup_host:
#   kubeinit-setup:
#     ansible_host: 192.168.222.214
setup_host:
  kubeinit-setup:
    ansible_host: nyctea
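Note that in the inventory above, the bastion and ovn-central entries define only target, while the failing template looks up ansible_host. As an experiment (this is my assumption, not confirmed KubeInit syntax), adding an explicit ansible_host to the bastion entry would rule out the undefined-variable path:

```yaml
# Hypothetical variant of the bastion entry above: 'target' is kept as-is,
# and 'ansible_host' is added so templates that read it do not hit an
# undefined variable. Whether KubeInit accepts both keys here is unverified.
bastion_host:
  bastion:
    target: hypervisor-01
    ansible_host: nyctea
```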
This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
Lot of things got broken because of podman not being consistent across the components we deploy, this should be fixed by #666 now
Hi @ccamacho,
Any idea why I am still seeing the same AnsibleUndefinedVariable: 'ansible_host' error shown above? With this I am not able to proceed with my testing on the main branch. Is there anything wrong with my inventory.yml file?
Describe the bug
Trying to deploy an OKD cluster with 1 master and 2 worker nodes. While running the Ansible playbook, the controller nodes don't reach the Ready state even after 60 tries. When I logged into the bootstrap node, I saw the error "container name ... is already in use". If I remove the container, it comes up fine. It doesn't throw this error every time; I have seen it at least 2 times out of 5.
To Reproduce
Steps to reproduce the behavior:
Clone kubeinit
Run the command
ansible-playbook -v --user root \
  -e kubeinit_spec=okd-libvirt-1-2-1 \
  -i ./kubeinit/inventory \
  ./kubeinit/playbook.yml
See error.
Bootstrap logs:
Apr 06 10:59:08 bootstrap podman[8121]: 2022-04-06 10:59:08.285934624 +0000 UTC m=+1.377781571 container cleanup 6581fb6d4ff11c4d91635217c4a27d80453>
Apr 06 10:59:18 bootstrap podman[8219]: 2022-04-06 10:59:07.13491223 +0000 UTC m=+0.082398628 image pull quay.io/openshift/okd-content@sha256:be5eb>
Apr 06 10:59:19 bootstrap podman[8219]: 2022-04-06 10:59:19.016405592 +0000 UTC m=+11.963891960 container create 92aa4efe11dc0d1e4e99c182b209b9dc6b4>
Apr 06 10:59:19 bootstrap podman[8219]: 2022-04-06 10:59:19.641219899 +0000 UTC m=+12.588706277 container init 92aa4efe11dc0d1e4e99c182b209b9dc6b468>
Apr 06 10:59:19 bootstrap podman[8219]: 2022-04-06 10:59:19.674477098 +0000 UTC m=+12.621963466 container start 92aa4efe11dc0d1e4e99c182b209b9dc6b46>
Apr 06 10:59:19 bootstrap podman[8219]: 2022-04-06 10:59:19.674690723 +0000 UTC m=+12.622177121 container attach 92aa4efe11dc0d1e4e99c182b209b9dc6b4>
Apr 06 10:59:20 bootstrap systemd[1]: Stopping Bootstrap a Kubernetes cluster...
Apr 06 10:59:20 bootstrap bootkube.sh[9514]: open pidfd: No such process
Apr 06 10:59:20 bootstrap bootkube.sh[8219]: time="2022-04-06T10:59:20Z" level=error msg="Error forwarding signal 15 to container 92aa4efe11dc0d1e4e>
Apr 06 10:59:20 bootstrap bootkube.sh[2056]: Terminated
Apr 06 10:59:20 bootstrap podman[9521]: 2022-04-06 10:59:20.28349949 +0000 UTC m=+0.040186130 container died 92aa4efe11dc0d1e4e99c182b209b9dc6b46848>
Apr 06 10:59:20 bootstrap systemd[1]: bootkube.service: Deactivated successfully.
Apr 06 10:59:20 bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Apr 06 10:59:20 bootstrap systemd[1]: bootkube.service: Consumed 33.650s CPU time.
Apr 06 10:59:20 bootstrap systemd[1]: release-image.service: Deactivated successfully.
Apr 06 10:59:20 bootstrap systemd[1]: Stopped Download the OpenShift Release Image.
Apr 06 10:59:20 bootstrap systemd[1]: release-image.service: Consumed 12.351s CPU time.
-- Boot c93e0d5bc8b44038b0d5d265ed467c93 --
Apr 06 10:59:31 bootstrap systemd[1]: Starting Download the OpenShift Release Image...
Apr 06 10:59:31 bootstrap release-image-download.sh[966]: Pulling service.okdcluster.kubeinit.local:5000/okd@sha256:7d8356245fc3a75fe11d1832ce9fef17>
Apr 06 10:59:32 bootstrap podman[1015]: 2022-04-06 10:59:32.079196063 +0000 UTC m=+0.961207467 system refresh
Apr 06 10:59:32 bootstrap release-image-download.sh[1015]: 5c93a0adf473e01f1bd88d3e539dbbe6de5bcfb74eace85038a63490f9603143
Apr 06 10:59:32 bootstrap podman[1015]: 2022-04-06 10:59:32.080829538 +0000 UTC m=+0.962840932 image pull service.okdcluster.kubeinit.local:5000/ok>
Apr 06 10:59:33 bootstrap systemd[1]: Finished Download the OpenShift Release Image.
Apr 06 10:59:41 bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
. . . . . .
Apr 06 11:35:46 bootstrap podman[308085]: 2022-04-06 11:35:46.277425197 +0000 UTC m=+0.499459314 container remove d5440565f1b94e5a176c11750c60d4d45861976990b4f5f1aa56bdace09eb412 (image=service.okdcluster.kubeinit.local:5000/okd@sha256:7d8356245fc3a75fe11d1832ce9fef17f3dd0f2ea6f38271319c95918416b9d9, name=quizzical_ellis, io.openshift.release=4.9.0-0.okd-2021-11-28-035710, io.openshift.release.base-image-digest=sha256:24a6759ce7d34123ae68ee14ee2a7c52ec3b2c7a5ae65cf87651176661e55e58)
Apr 06 11:35:46 bootstrap bootkube.sh[306030]: Rendering Kubernetes API server core manifests...
Apr 06 11:35:46 bootstrap bootkube.sh[308213]: Error: error creating container storage: the container name "kube-apiserver-render" is already in use by "92aa4efe11dc0d1e4e99c182b209b9dc6b468483438865d8a2bcef825b22c65b". You have to remove that container to be able to reuse that name.: that name is already in use
Apr 06 11:35:46 bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=125/n/a
Apr 06 11:35:46 bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Apr 06 11:35:46 bootstrap systemd[1]: bootkube.service: Consumed 4.452s CPU time.
[core@bootstrap ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                                                                                               COMMAND               CREATED         STATUS                     PORTS  NAMES
ed027262a5fa  service.okdcluster.kubeinit.local:5000/okd@sha256:7d8356245fc3a75fe11d1832ce9fef17f3dd0f2ea6f38271319c95918416b9d9  render --output-d...  38 minutes ago  Exited (0) 38 minutes ago         cvo-render
95b964e69d58  quay.io/openshift/okd-content@sha256:8c24b5ca67f5cd7763dbcb1586cfcfcff2083eae137acfea6f9b0468fcd2e8e6               /usr/bin/cluster-...  37 minutes ago  Exited (0) 37 minutes ago         etcd-render
6581fb6d4ff1  quay.io/openshift/okd-content@sha256:5a262a1ca5b05a174286494220a1f583ed1fcb2fb60114aae25f6d2670699746               /usr/bin/cluster-...  37 minutes ago  Exited (0) 37 minutes ago         config-render
92aa4efe11dc  quay.io/openshift/okd-content@sha256:be5eb9ef4a8c26ce7e5827285a4e65620aa7b31c9fb203e046c900a45b095764               /usr/bin/cluster-...  36 minutes ago  Created                           kube-apiserver-render
[core@bootstrap ~]$
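The manual workaround described above ("If I remove the container, it comes up fine") corresponds to removing the container holding the name on the bootstrap node and letting systemd restart the service. The container name and service name are taken from the logs above; treat this as a diagnostic sketch, not a supported fix:

```shell
# On the bootstrap node: remove the leftover container that still owns
# the "kube-apiserver-render" name, then restart the bootstrap service
# so bootkube.sh can recreate it.
sudo podman rm -f kube-apiserver-render
sudo systemctl restart bootkube.service
```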
Expected behavior
A running OKD cluster with 1 master and 2 worker nodes.
Infrastructure
Hypervisor OS: CentOS Stream 8
CPUs: 32 cores
Memory: 128 GB
HDD: 1 TB
Deployment command
ansible-playbook -v --user root \
  -e kubeinit_spec=okd-libvirt-1-2-1 \
  -i ./kubeinit/inventory \
  ./kubeinit/playbook.yml
Inventory file diff

diff --git a/kubeinit/inventory b/kubeinit/inventory
index bbb380d..d862b0e 100644
--- a/kubeinit/inventory
+++ b/kubeinit/inventory
@@ -72,8 +72,8 @@ kubeinit_inventory_network_name=kimgtnet0
[hypervisor_hosts]
 hypervisor-01 ansible_host=nyctea
-hypervisor-02 ansible_host=tyto
-# hypervisor-01 ansible_host=nyctea ssh_hostname=server1.example.com
+#hypervisor-02 ansible_host=tyto
+# hypervisor-01 ansible_host=nyctea
.
.
.
[controller_nodes:vars]
os={'cdk': 'ubuntu', 'eks': 'centos', 'k8s': 'centos', 'kid': 'debian', 'okd': 'coreos', 'rke': 'ubuntu'}
-disk=25G
+disk=150G
 ram=25165824
 vcpus=8
 maxvcpus=16
@@ -152,8 +152,8 @@ target_order=hypervisor-01
[compute_nodes:vars]
os={'cdk': 'ubuntu', 'eks': 'centos', 'k8s': 'centos', 'kid': 'debian', 'okd': 'coreos', 'rke': 'ubuntu'}
-disk=30G
-ram=8388608
+disk=100G
+ram=16777216
 vcpus=8
 maxvcpus=16
 type=virtual