I am trying to install DeepOps on RHEL 7.6, and the installation fails at the task below with the error shown.
TASK [kubernetes/node : Write kubelet environment config file (kubeadm)] ***
task path: /home/admin/deepops/submodules/kubespray/roles/kubernetes/node/tasks/kubelet.yml:18
<10.2.95.200> ESTABLISH SSH CONNECTION FOR USER: admin
<10.2.95.200> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=5m)(-o)(ConnectionAttempts=100)(-o)(UserKnownHostsFile=/dev/null)
<10.2.95.200> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
<10.2.95.200> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User="admin")
<10.2.95.200> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=60)
<10.2.95.200> SSH: Set ssh_common_args: ()
<10.2.95.200> SSH: Set ssh_extra_args: ()
<10.2.95.200> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath="~/.ssh/ansible-%r@%h:%p")
<10.2.95.200> SSH: EXEC sshpass -d9 ssh -vvv -o ControlMaster=auto -o ControlPersist=5m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o 'User="admin"' -o ConnectTimeout=60 -o 'ControlPath="~/.ssh/ansible-%r@%h:%p"' 10.2.95.200 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /tmp `"&& mkdir "` echo /tmp/ansible-tmp-1664645489.9844334-25691-50490429029708 `" && echo ansible-tmp-1664645489.9844334-25691-50490429029708="` echo /tmp/ansible-tmp-1664645489.9844334-25691-50490429029708 `" ) && sleep 0'"'"''
<10.2.95.200> (0, b'ansible-tmp-1664645489.9844334-25691-50490429029708=/tmp/ansible-tmp-1664645489.9844334-25691-50490429029708\n', b'OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 58: Applying options for \r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 21080\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
looking for "kubelet.env.v1beta1.j2" at "/home/admin/deepops/submodules/kubespray/roles/kubernetes/node/templates/kubelet.env.v1beta1.j2"
<10.2.95.200> ESTABLISH SSH CONNECTION FOR USER: admin
<10.2.95.200> SSH: ansible.cfg set ssh_args: (-o)(ControlMaster=auto)(-o)(ControlPersist=5m)(-o)(ConnectionAttempts=100)(-o)(UserKnownHostsFile=/dev/null)
<10.2.95.200> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
<10.2.95.200> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User="admin")
<10.2.95.200> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=60)
<10.2.95.200> SSH: Set ssh_common_args: ()
<10.2.95.200> SSH: Set ssh_extra_args: ()
<10.2.95.200> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath="~/.ssh/ansible-%r@%h:%p")
<10.2.95.200> SSH: EXEC sshpass -d9 ssh -vvv -o ControlMaster=auto -o ControlPersist=5m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o 'User="admin"' -o ConnectTimeout=60 -o 'ControlPath="~/.ssh/ansible-%r@%h:%p"' 10.2.95.200 '/bin/sh -c '"'"'rm -f -r /tmp/ansible-tmp-1664645489.9844334-25691-50490429029708/ > /dev/null 2>&1 && sleep 0'"'"''
<10.2.95.200> (0, b'', b'OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 58: Applying options for \r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 21080\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
The full traceback is:
Traceback (most recent call last):
  File "/opt/deepops/env/lib/python3.6/site-packages/ansible/template/__init__.py", line 1121, in do_template
    res = j2_concat(rf)
  File "<template>", line 133, in root
  File "/opt/deepops/env/lib/python3.6/site-packages/jinja2/runtime.py", line 747, in _fail_with_undefined_error
    raise self._undefined_exception(self._undefined_message)
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'kube_node'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/deepops/env/lib/python3.6/site-packages/ansible/plugins/action/template.py", line 146, in run
    resultant = templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
  File "/opt/deepops/env/lib/python3.6/site-packages/ansible/template/__init__.py", line 1160, in do_template
    raise AnsibleUndefinedVariable(e)
ansible.errors.AnsibleUndefinedVariable: 'dict object' has no attribute 'kube_node'
fatal: [mgmt01]: FAILED! => changed=false
  msg: 'AnsibleUndefinedVariable: ''dict object'' has no attribute ''kube_node'''
NO MORE HOSTS LEFT *****
PLAY RECAP *****
mgmt01 : ok=651 changed=119 unreachable=0 failed=1 skipped=854 rescued=0 ignored=2
The contents of the inventory file are as follows:
`#
# Server Inventory File
#
# Uncomment and change the IP addresses in this file to match your environment
# Define per-group or per-host configuration in group_vars/*.yml

# ALL NODES
# NOTE: Use existing hostnames here, DeepOps will configure server hostnames to match these values
[all]
mgmt01 ansible_host=10.2.95.200
gpu02 ansible_host=172.29.100.101
#mgmt01 ansible_host=10.0.0.1
#mgmt02 ansible_host=10.0.0.2
#mgmt03 ansible_host=10.0.0.3
#login01 ansible_host=10.0.1.1
#gpu01 ansible_host=10.0.2.1
#gpu02 ansible_host=10.0.2.2

# KUBERNETES
[kube-master]
mgmt01
#mgmt01
#mgmt02
#mgmt03

# Odd number of nodes required
[etcd]
mgmt01
#mgmt01
#mgmt02
#mgmt03

# Also add mgmt/master nodes here if they will run non-control plane jobs
[kube-node]
gpu02
#gpu01
#gpu02

[k8s-cluster:children]
kube-master
kube-node

# SLURM
[slurm-master]
#login01

[slurm-nfs]
#login01

[slurm-node]
#gpu01
#gpu02

# The following groups are used to break out individual cluster services onto
# different nodes. By default, we put all these functions on the cluster head
# node. To break these out into different nodes, replace the
# [group:children] section with [group], and list individual nodes.
[slurm-cache:children]
slurm-master

[slurm-nfs-client:children]
slurm-node

[slurm-metric:children]
slurm-master

[slurm-login:children]
slurm-master

# Single group for the whole cluster
[slurm-cluster:children]
slurm-master
slurm-node
slurm-cache
slurm-nfs
slurm-metric
slurm-login

# SSH connection configuration
[all:vars]
# SSH User
#ansible_user=ubuntu
#ansible_ssh_private_key_file='~/.ssh/id_rsa'

# SSH bastion/jumpbox
#ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -q ubuntu@10.0.0.1"'
`
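
One thing I noticed: the error names `kube_node` (underscore), while the inventory above declares `[kube-node]` (hyphen). Whether that mismatch is the cause is only my assumption; the log does not confirm it. Below is a minimal sketch, not part of DeepOps or Kubespray, that lists the group names declared in an INI-style inventory and checks for the name from the error. The `INVENTORY` excerpt and the `group_names` helper are hypothetical; the excerpt assumes the hosts shown next to each group header are the active entries.

```python
import re

# Hypothetical excerpt of the inventory above; only the group headers matter here.
INVENTORY = """\
[all]
mgmt01 ansible_host=10.2.95.200
gpu02 ansible_host=172.29.100.101

[kube-master]
mgmt01

[etcd]
mgmt01

[kube-node]
gpu02

[k8s-cluster:children]
kube-master
kube-node
"""

# Matches "[name]" or "[name:suffix]" and captures only the group name.
HEADER = re.compile(r"^\[([^\]:]+)(?::\w+)?\]\s*$")


def group_names(inventory_text):
    """Return the group names declared in INI-style section headers, in order."""
    names = []
    for line in inventory_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


groups = group_names(INVENTORY)
wanted = "kube_node"  # the name taken from the error message

print("groups declared:", groups)
print(f"{wanted!r} declared:", wanted in groups)
```

On the excerpt above this reports that `kube_node` is not among the declared groups, which matches the name the template fails to find.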