kjw3 closed this issue 2 years ago.
Rerunning shows the following:
TASK [Validating the CUDA with GPU] ************************************************************************************
fatal: [172.16.100.10]: FAILED! => {"changed": true, "cmd": "timeout 60 kubectl run cuda-vector-add --rm -t -i --restart=Never --image=k8s.gcr.io/cuda-vector-add:v0.1", "delta": "0:00:00.045969", "end": "2021-10-15 21:32:45.031626", "msg": "non-zero return code", "rc": 1, "start": "2021-10-15 21:32:44.985657", "stderr": "Error from server (AlreadyExists): pods \"cuda-vector-add\" already exists", "stderr_lines": ["Error from server (AlreadyExists): pods \"cuda-vector-add\" already exists"], "stdout": "", "stdout_lines": []}
...ignoring
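The AlreadyExists error suggests that the pod from the cancelled run was never removed. kubectl, not the cluster, performs the --rm deletion after the attached session ends, so an interrupted run likely leaves the pod behind. Below is a minimal sketch, assuming it is acceptable to delete any leftover cuda-vector-add pod before each attempt. The helper names are hypothetical and are not part of the EGX playbook. The run command mirrors the one in the log above.

```python
import subprocess
from typing import List

POD_NAME = "cuda-vector-add"
IMAGE = "k8s.gcr.io/cuda-vector-add:v0.1"


def cleanup_cmd(pod: str = POD_NAME) -> List[str]:
    # --ignore-not-found lets the delete succeed when no leftover pod exists
    return ["kubectl", "delete", "pod", pod, "--ignore-not-found"]


def run_cmd(pod: str = POD_NAME, image: str = IMAGE, timeout_s: int = 60) -> List[str]:
    # Same command as the playbook task in the log, wrapped in coreutils timeout
    return ["timeout", str(timeout_s), "kubectl", "run", pod, "--rm", "-t", "-i",
            "--restart=Never", f"--image={image}"]


def validate(pod: str = POD_NAME, image: str = IMAGE, timeout_s: int = 60) -> int:
    # Hypothetical helper: remove any pod left by an interrupted run, then rerun
    subprocess.run(cleanup_cmd(pod), check=True)
    return subprocess.run(run_cmd(pod, image, timeout_s)).returncode
```

Calling validate() on a host with kubectl configured returns the exit code of the run. Because the cleanup step runs first, a rerun after cancelling should no longer fail with AlreadyExists.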
@kjw3
We couldn't replicate this issue on our end. Please let us know if you still see it.
Thanks, Anurag G
Host OS: Ubuntu 20.04 LTS EGX-Platform 4.1 (installed via playbook)
When running "setup.sh validate", the "Validating the CUDA with GPU" task hangs.
It just hangs at this step. If I cancel and rerun, all the other validations pass, but this task fails with an error saying the cuda-vector-add pod already exists.
The logs of the cuda-vector-add pod look good.
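The thread does not establish why the task hangs even though the pod's logs look good. One possible workaround avoids the interactive attach (-t -i) entirely, on the assumption that the attach is what blocks. The sketch creates the pod detached, polls its phase until it reaches Succeeded or Failed, prints its logs, and always deletes it. The function names and the 2-second poll interval are hypothetical. Only the kubectl subcommands (run, get -o jsonpath, logs, delete) are standard.

```python
import subprocess
import time
from typing import Callable, List

TERMINAL_PHASES = {"Succeeded", "Failed"}


def create_cmd(pod: str, image: str) -> List[str]:
    # No -t/-i: the pod is created detached, so nothing waits on an attach session
    return ["kubectl", "run", pod, "--restart=Never", f"--image={image}"]


def phase_cmd(pod: str) -> List[str]:
    # Prints only the pod phase, e.g. Pending, Running, Succeeded, Failed
    return ["kubectl", "get", "pod", pod, "-o", "jsonpath={.status.phase}"]


def wait_for_phase(get_phase: Callable[[], str], timeout_s: float,
                   interval_s: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    # Poll until the pod reaches a terminal phase or the deadline passes
    deadline = clock() + timeout_s
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if clock() >= deadline:
            raise TimeoutError(f"pod still in phase {phase!r} after {timeout_s}s")
        sleep(interval_s)


def kubectl_phase(pod: str) -> str:
    out = subprocess.run(phase_cmd(pod), capture_output=True, text=True, check=True)
    return out.stdout.strip()


def validate_detached(pod: str = "cuda-vector-add",
                      image: str = "k8s.gcr.io/cuda-vector-add:v0.1",
                      timeout_s: float = 60) -> bool:
    # Hypothetical non-interactive variant of the validation task
    subprocess.run(["kubectl", "delete", "pod", pod, "--ignore-not-found"], check=True)
    subprocess.run(create_cmd(pod, image), check=True)
    try:
        phase = wait_for_phase(lambda: kubectl_phase(pod), timeout_s)
        logs = subprocess.run(["kubectl", "logs", pod], capture_output=True, text=True)
        print(logs.stdout)
        return phase == "Succeeded"
    finally:
        # Always remove the pod so the next run does not hit AlreadyExists
        subprocess.run(["kubectl", "delete", "pod", pod, "--ignore-not-found"])
```

The sleep and clock parameters exist so the polling loop can be exercised without a cluster. In real use, call validate_detached() and check the returned boolean.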