supertetelman closed this 1 year ago
Currently blocked by the following error, if someone wants to jump in and debug:
`FAILED - RETRYING: download_container | Download image if required (1 retries left).
fatal: [virtual-gpu01-0 -> virtual-gpu01-0]: FAILED! => changed=true
  attempts: 4
  cmd:
  - /usr/local/bin/crictl
  - pull
  - quay.io/calico/node:v3.24.5
  delta: '0:00:00.039323'
  end: '2023-03-29 03:03:57.565413'
  msg: non-zero return code
  rc: 1
  start: '2023-03-29 03:03:57.526090'
  stderr: |-
    E0329 03:03:57.563464 15688 remote_image.go:222] "PullImage from image service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService" image="quay.io/calico/node:v3.24.5"
    time="2023-03-29T03:03:57Z" level=fatal msg="pulling image: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService"
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>`
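For anyone picking up the debugging, here is a rough sketch of where one might start. The gRPC `Unimplemented` status means the runtime endpoint did not serve the `runtime.v1alpha2.ImageService` API that `crictl` called. Possible causes (unverified for this node) include a `crictl`/containerd version mismatch or a disabled CRI plugin. The function names `extract_unknown_service` and `diagnose_node` are hypothetical helpers, and the socket path and config path are assumed defaults, not values taken from this deployment.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read crictl stderr on stdin and print the gRPC
# service name that the runtime reported as unknown (first match only).
extract_unknown_service() {
  sed -n 's/.*unknown service \([A-Za-z0-9_.]*\).*/\1/p' | head -n 1
}

# Checks one might run by hand on the failing node. Nothing calls this
# automatically. The default endpoint below is an assumption; pass the
# node's real runtime socket as the first argument if it differs.
diagnose_node() {
  local endpoint="${1:-unix:///run/containerd/containerd.sock}"
  crictl --version                                  # client version
  containerd --version                              # runtime version
  crictl --runtime-endpoint "$endpoint" version     # what the endpoint reports
  # Assumed default config path; look for "cri" in disabled_plugins.
  grep -n 'disabled_plugins' /etc/containerd/config.toml
}
```

Piping the failing task's stderr through `extract_unknown_service` shows which CRI API version the client requested, which can then be compared with what `diagnose_node` reports for the runtime.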
Finally got this working and tested. It looks like the last small patch fixed the issues with the monitoring and MetalLB stacks. I'd like to merge this PR and then open a new one to bump the GPU Operator and Kubespray versions again, to the versions that came out this past week.
Basic update to the newest K8s and Kubespray versions. Docker is no longer supported as a container runtime in K8s, so it had to be removed from the tests and documentation.
Please merge https://github.com/NVIDIA/deepops/pull/1250 first.