Closed rstober closed 11 months ago
Hi, I'm trying to run a container image that lives in a local Docker registry. I'm launching it with srun, but the import fails with a 400 error:
[robert@cnode001 ~]$ srun --container-image='docker://master:5000#custom-pytorch-3-1:latest' --pty --gres=gpu:t4:1 /bin/bash
pyxis: importing docker image: docker://master:5000#custom-pytorch-3-1:latest
slurmstepd: error: pyxis: child 30165 failed with error code: 1
slurmstepd: error: pyxis: failed to import docker image
slurmstepd: error: pyxis: printing enroot log file:
slurmstepd: error: pyxis: [INFO] Querying registry for permission grant
slurmstepd: error: pyxis: [ERROR] URL http://master:5000/v2/custom-pytorch-3-1/manifests/latest returned error code: 400 Bad Request
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack
srun: error: cnode001: task 0: Exited with exit code 1
Following some other threads about this, I found advice saying to try to pull the image just using enroot. This gives me the same 400 error:
[root@cnode001 ~]# enroot import --output rms-enroot-test.sqsh 'docker://master:5000#custom-pytorch-3-1:latest'
[INFO] Querying registry for permission grant
[ERROR] URL http://master:5000/v2/custom-pytorch-3-1/manifests/latest returned error code: 400 Bad Request
The image does exist in the local Docker registry. I can pull or run it just fine from there using Docker:
[robert@cnode001 ~]$ docker pull master:5000/custom-pytorch-3-1:latest
latest: Pulling from custom-pytorch-3-1
7608715873ec: Pull complete
7c8937d0a90f: Pull complete
c5b9a46f3cd0: Pull complete
. . .
And I've verified with curl that the image is actually in the local Docker registry:
[root@cnode001 ~]# curl -k -X GET https://master:5000/v2/_catalog
{"repositories":["custom-pytorch-3-1","nvaie/pytorch-3-1","nvidia/pytorch"]}
[root@cnode001 ~]# curl -k -X GET https://master:5000/v2/custom-pytorch-3-1/tags/list
{"name":"custom-pytorch-3-1","tags":["latest"]}
What am I doing wrong?
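One further check that may help narrow this down: the enroot log shows the manifest being requested over http://, while the curl checks above use https:// with -k. Whether that mismatch is the cause is not confirmed in this thread, but requesting the same manifest URL over both schemes would show if the registry answers them differently. The sketch below only builds the two URLs and prints the curl commands to run. `manifest_url` is a hypothetical helper (not part of enroot or pyxis), and it assumes the image URI has the `docker://REGISTRY#IMAGE:TAG` form used above, with an explicit tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an enroot-style URI (docker://REGISTRY#IMAGE:TAG)
# into the registry manifest URL that appears in the enroot error log.
# Assumes the URI carries an explicit tag.
manifest_url() {
    local scheme="$1" uri="$2"
    local rest="${uri#docker://}"   # master:5000#custom-pytorch-3-1:latest
    local registry="${rest%%#*}"    # master:5000
    local image="${rest#*#}"        # custom-pytorch-3-1:latest
    local name="${image%:*}"        # custom-pytorch-3-1
    local tag="${image##*:}"        # latest
    printf '%s://%s/v2/%s/manifests/%s\n' "$scheme" "$registry" "$name" "$tag"
}

uri='docker://master:5000#custom-pytorch-3-1:latest'
for scheme in http https; do
    url="$(manifest_url "$scheme" "$uri")"
    # Print (do not run) a curl command that reports only the HTTP status code.
    # -k skips certificate verification, as in the catalog checks above.
    printf '%s\n' "curl -sk -o /dev/null -w '%{http_code}\\n' '$url'"
done
```

Running the two printed commands on the compute node and comparing the status codes shows whether the 400 is specific to plain HTTP.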
I was told by @rstober that this is fixed now.