NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

Cannot access nvcr.io repository #39

Closed: BlueCloudDev closed this issue 3 years ago

BlueCloudDev commented 3 years ago

Command:

    srun -l --container-image "nvcr.io/nvidia/pytorch:20.12-py3" --output=./logs/%x%j$DATETIME.log sh -c "echo ok"

    [opc@modest-cobra-bastion examples]$ cat logs/sh267.log
    0: pyxis: importing docker image ...
    0: slurmstepd: error: pyxis: child 15175 failed with error code: 1
    0: slurmstepd: error: pyxis: failed to import docker image
    0: slurmstepd: error: pyxis: printing contents of log file ...
    0: slurmstepd: error: pyxis: [INFO] Querying registry for permission grant
    0: slurmstepd: error: pyxis: [INFO] Authenticating with user:
    0: slurmstepd: error: pyxis: [INFO] Authentication succeeded
    0: slurmstepd: error: pyxis: [INFO] Fetching image manifest list
    0: slurmstepd: error: pyxis: [INFO] Fetching image manifest
    0: slurmstepd: error: pyxis: [ERROR] URL https://registry-1.docker.io/v2/nvcr.io/nvidia/pytorch/manifests/20.12-py3 returned error code: 401 Unauthorized
    0: slurmstepd: error: pyxis: couldn't start container
    0: slurmstepd: error: pyxis: if the image has an unusual entrypoint, try using --no-container-entrypoint
    0: slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
    0: slurmstepd: error: Failed to invoke spank plugin stack

I have a login for nvcr.io, and I can pull the image with Docker after running docker login. However, I cannot find any clear documentation on how to set the authentication user and add my API key.

Assume I know nothing about pyxis or enroot. How do I set up my user account so that I authenticate properly, instead of authenticating with an empty user name as in the log above? I need step-by-step instructions that do not assume any prior knowledge.
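The URL in the error is also worth a second look: enroot sent the request to registry-1.docker.io and asked for a repository literally named nvcr.io/nvidia/pytorch, which suggests the registry part of the image name was not recognized. The enroot import documentation writes image URIs as docker://[USER@][REGISTRY#]IMAGE[:TAG], with # separating the registry from the image, and pyxis's --container-image takes the same form without the docker:// prefix (for example nvcr.io#nvidia/pytorch:20.12-py3). The function below is a hypothetical helper, not part of pyxis or enroot, that sketches this rewrite. It assumes Docker's convention that the first path component names a registry only if it contains a dot or a colon or is localhost.

```shell
#!/bin/sh
# Hypothetical helper (not part of pyxis or enroot): rewrite a Docker-style
# reference like "nvcr.io/nvidia/pytorch:20.12-py3" into the enroot URI
# form "nvcr.io#nvidia/pytorch:20.12-py3".
# Assumption: as in Docker's reference parsing, the first path component is
# a registry host only if it contains "." or ":" or is "localhost".
to_enroot_uri() {
    ref=$1
    case $ref in
        *'#'*) printf '%s\n' "$ref"; return 0 ;;  # already in enroot form
        */*) ;;                                    # has a path: inspect it below
        *) printf '%s\n' "$ref"; return 0 ;;      # bare name such as "ubuntu"
    esac
    first=${ref%%/*}   # text before the first "/"
    rest=${ref#*/}     # text after the first "/"
    case $first in
        *.*|*:*|localhost) printf '%s#%s\n' "$first" "$rest" ;;
        *) printf '%s\n' "$ref" ;;  # Docker Hub path such as "nvidia/cuda"
    esac
}

# Example:
#   srun -l --container-image "$(to_enroot_uri nvcr.io/nvidia/pytorch:20.12-py3)" sh -c "echo ok"
```

Whichever way the name is written, nvcr.io credentials can only be used once the request actually goes to nvcr.io rather than to Docker Hub.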

3XX0 commented 3 years ago

See https://github.com/NVIDIA/enroot/blob/master/doc/cmd/import.md#description

BlueCloudDev commented 3 years ago

I had found this page before, but since it lacks context, I wasn't able to get it working correctly.

To save people who find this issue through Google a few hours of frustration:

  1. Create a folder in a location that is readable on the compute nodes where the job runs (for example, a shared home directory). In that folder, create a file named exactly .credentials and fill in the values described in the enroot import documentation linked above.
  2. Point enroot at that folder by running export ENROOT_CONFIG_PATH=<path-to-folder-containing-credentials-file> in the same shell you will run srun from.
  3. Run the srun command from that shell.
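The steps above can be sketched as a small shell function plus the commands to run around it. This is a minimal sketch under stated assumptions: the function name write_enroot_credentials, the folder $HOME/enroot-config and the variable NGC_API_KEY are hypothetical, and the two credential lines follow the NVIDIA GPU Cloud example in the enroot import documentation, where the user name is the literal string $oauthtoken and the password is your NGC API key.

```shell
#!/bin/sh
# Step 1 of the list above as a hypothetical helper (not part of enroot).
# Usage: write_enroot_credentials DIR API_KEY
# Creates DIR and writes DIR/.credentials in the netrc-style format shown in
# the enroot import documentation. '$oauthtoken' is single-quoted because it
# is a literal user name, not a shell variable.
write_enroot_credentials() {
    dir=$1
    key=$2
    mkdir -p "$dir" || return 1
    (
        umask 077   # a newly created file is readable by its owner only
        {
            printf 'machine nvcr.io login %s password %s\n' '$oauthtoken' "$key"
            printf 'machine authn.nvidia.com login %s password %s\n' '$oauthtoken' "$key"
        } > "$dir/.credentials"
    ) || return 1
    chmod 600 "$dir/.credentials"   # also tighten a file that already existed
}

# Steps 2 and 3, typed in the shell you submit from (paths are examples):
#   write_enroot_credentials "$HOME/enroot-config" "$NGC_API_KEY"
#   export ENROOT_CONFIG_PATH="$HOME/enroot-config"
#   srun -l --container-image "nvcr.io#nvidia/pytorch:20.12-py3" sh -c "echo ok"
#   ('#' separates the registry from the image in enroot URIs)
```

If you prefer not to export anything, enroot's default configuration directory is ${XDG_CONFIG_HOME:-$HOME/.config}/enroot unless your administrator changed it in enroot.conf, so a .credentials file placed there should be picked up without setting ENROOT_CONFIG_PATH.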