NVIDIA / cloud-native-stack

Run cloud native workloads on NVIDIA GPUs
Apache License 2.0

untar of cri-containerd-cni fails #45

Closed: rdarbha closed this issue 3 months ago.

rdarbha commented 6 months ago
fatal: [10.225.0.212]: FAILED! => {"msg": "Unable to execute ssh command line on a controller due to: [Errno 13] Permission denied: b'sshpass'"}
fatal: [10.225.0.213]: FAILED! => {"msg": "Unable to execute ssh command line on a controller due to: [Errno 13] Permission denied: b'sshpass'"}
fatal: [10.225.0.211]: FAILED! => {"msg": "Unable to execute ssh command line on a controller due to: [Errno 13] Permission denied: b'sshpass'"}
changed: [10.225.0.210]

The issue is hard to diagnose, but it could be one of two problems:

  1. The --no-overwrite-dir option seems to cause the failure even though become is used in the Ansible playbook. Removing this extra_opts section results in a successful playbook run.
  2. Running the full tar command manually with sudo works, but the Ansible task fails as though sudo were not being used, despite become being set on the task.

fatal: [10.225.0.73]: FAILED! => {"changed": false, "dest": "/", "extract_results": {"cmd": ["/usr/bin/tar", "--extract", "-C", "/", "-z", "--show-transformed-names", "--no-overwrite-dir", "-f", "/tmp/cri-containerd-cni-1.7.3-linux-amd64.tar.gz"], "err": "/usr/bin/tar: cri-containerd.DEPRECATED.txt: Cannot open: File exists\n/usr/bin/tar: etc: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: etc/cni: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: etc/cni/net.d: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: etc/cni/net.d/10-containerd-net.conflist: Cannot open: File exists\n/usr/bin/tar: etc/systemd: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: etc/systemd/system: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: etc/systemd/system/containerd.service: Cannot open: File exists\n/usr/bin/tar: etc/crictl.yaml: Cannot open: File exists\n/usr/bin/tar: usr: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: usr/local: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: usr/local/bin: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: usr/local/bin/crictl: Cannot open: File exists\n/usr/bin/tar: usr/local/bin/containerd-shim-runc-v2: Cannot open: File exists\n/usr/bin/tar: usr/local/bin/containerd-stress: Cannot open: File exists\n/usr/bin/tar: usr/local/bin/containerd-shim-runc-v1: Cannot open: File exists\n/usr/bin/tar: usr/local/bin/containerd-shim: Cannot open: File exists\n/usr/bin/tar: usr/local/bin/critest: Cannot open: File exists\n/usr/bin/tar: usr/local/bin/containerd: Cannot open: File exists\n/usr/bin/tar: usr/local/bin/ctr: Cannot open: File exists\n/usr/bin/tar: usr/local/bin/ctd-decoder: Cannot open: File exists\n/usr/bin/tar: usr/local/sbin: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: usr/local/sbin/runc: Cannot open: File exists\n/usr/bin/tar: opt: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: opt/cni: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: opt/cni/bin: Cannot change mode to rwxrwxr-x: Operation not permitted\n/usr/bin/tar: opt/cni/bin/dhcp: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/macvlan: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/host-device: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/dummy: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/loopback: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/ipvlan: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/host-local: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/vlan: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/firewall: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/ptp: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/portmap: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/vrf: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/bandwidth: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/static: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/tuning: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/bridge: Cannot open: File exists\n/usr/bin/tar: opt/cni/bin/sbr: Cannot open: File exists\n/usr/bin/tar: opt/containerd: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: opt/containerd/cluster: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: opt/containerd/cluster/gce: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: opt/containerd/cluster/gce/env: Cannot open: File exists\n/usr/bin/tar: opt/containerd/cluster/gce/configure.sh: Cannot open: File exists\n/usr/bin/tar: opt/containerd/cluster/gce/cni.template: Cannot open: File exists\n/usr/bin/tar: opt/containerd/cluster/gce/cloud-init: Cannot change mode to rwxr-xr-x: Operation not permitted\n/usr/bin/tar: opt/containerd/cluster/gce/cloud-init/node.yaml: Cannot open: File exists\n/usr/bin/tar: opt/containerd/cluster/gce/cloud-init/master.yaml: Cannot open: File exists\n/usr/bin/tar: opt/containerd/cluster/version: Cannot open: File exists\n/usr/bin/tar: Exiting with failure status due to previous errors\n", "out": "", "rc": 2}, "gid": 0, "group": "root", "handler": "TgzArchive", "mode": "0755", "msg": "failed to unpack /tmp/cri-containerd-cni-1.7.3-linux-amd64.tar.gz to /", "owner": "root", "size": 4096, "src": "/tmp/cri-containerd-cni-1.7.3-linux-amd64.tar.gz", "state": "directory", "uid": 0}
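For context, here is a minimal sketch of an Ansible task that would produce the tar command in the cmd field of the error output: extract to / (-C /) from an archive already sitting in /tmp, with --no-overwrite-dir passed through extra_opts. This is a hypothetical reconstruction under those assumptions. The task name and the remote_src setting are guesses, and the actual task in cloud-native-stack may be written differently.

```yaml
# Hypothetical reconstruction inferred from the logged tar command;
# the real task in the cloud-native-stack playbooks may differ.
- name: Unpack cri-containerd-cni into /   # task name is an assumption
  become: true                             # intended to run the extraction as root on the target
  ansible.builtin.unarchive:
    src: /tmp/cri-containerd-cni-1.7.3-linux-amd64.tar.gz
    dest: /
    remote_src: true                       # assumption: the archive is already on the target host
    extra_opts:
      - --no-overwrite-dir                 # removing this list is what made the run succeed (problem 1)
```

In terms of this sketch, problem 1 amounts to deleting the extra_opts list. Problem 2 is about whether become: true actually results in tar running as root on the target hosts.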

angudadevops commented 6 months ago

@rdarbha, is this a fresh system you're running the installation on?

I'm guessing you're trying this on a GCP instance?

We haven't tested on GCP instances, but we have tested on bare-metal systems and on AWS, and it works there without any issues.

Can you share the output of the following command:

sudo pip3 list | grep ansible 

I'd also like to understand how you've updated the hosts file.

rdarbha commented 6 months ago

@angudadevops, sorry for the late reply. We're running this on our own servers (bare-metal, on-prem). Our Ansible version is 7.0.0, as pinned in the setup.sh file: python3 -m pip install ansible==7.0.0 2>&1 >/dev/null

I'll get the pip3 ansible output soon from our k8s-admin, where we ran the commands.

angudadevops commented 5 months ago

@rdarbha, are you still seeing this issue? Please let us know.

angudadevops commented 5 months ago

@rdarbha, do you still need help? Otherwise, we will close the issue if there is no further activity.