NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

504 gateway error #21

Closed x3gluk closed 4 years ago

x3gluk commented 4 years ago

Hi,

I can't run a container from our local registry. If I run

    srun --gres=gpu:1 docker run registry.local:5050/bot/gpu-pack:latest

everything works, but if I run

    srun --gres=gpu:1 --container-image=registry.local:5050/bot/gpu-pack:latest#chatbot/gpu-pack:latest

I get:

    slurmstepd-gpu: pyxis: importing docker image ...
    slurmstepd-gpu: error: pyxis: child 465067 failed with error code: 1
    slurmstepd-gpu: error: pyxis: failed to import docker image
    slurmstepd-gpu: error: pyxis: printing contents of log file ...
    slurmstepd-gpu: error: pyxis: [INFO] Querying registry for permission grant
    slurmstepd-gpu: error: pyxis: [ERROR] URL https://registry.local:5050/v2/bot/gpu-pack/manifests/latest returned error code: 504 Gateway Time-out
    slurmstepd-gpu: pyxis: could not remove squashfs: No such file or directory
    slurmstepd-gpu: error: spank: required plugin spank_pyxis.so: task_init_privileged() failed with rc=-1
    slurmstepd-gpu: error: spank_task_init_privileged failed
    slurmstepd-gpu: error: write to unblock task 0 failed: Broken pipe

Can you help me?
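
To narrow down where the 504 comes from, it may help to request the same manifest URL directly from the compute node, outside of pyxis. The sketch below is not part of pyxis or enroot: `manifest_url` and `probe_manifest` are hypothetical helper names, and the URL layout (`/v2/<name>/manifests/<tag>`) is taken from the log above. Comparing a request that honors the environment's proxy variables with one that bypasses them rests on the assumption that a proxy might be involved, which nothing in the log confirms.

```shell
#!/bin/sh
# Build the registry manifest URL in the same form as the one in the log.
# Arguments: registry host[:port], repository name, tag.
manifest_url() {
    printf 'https://%s/v2/%s/manifests/%s\n' "$1" "$2" "$3"
}

# Print the HTTP status code returned for the manifest URL, first with the
# proxy settings found in the environment, then with proxies bypassed
# (assumption: a proxy may sit between the node and the registry).
probe_manifest() {
    url=$(manifest_url "$1" "$2" "$3")
    echo "with environment proxy settings:"
    curl -sS -o /dev/null -w '%{http_code}\n' --max-time 30 "$url"
    echo "bypassing proxies:"
    curl -sS -o /dev/null -w '%{http_code}\n' --max-time 30 --noproxy '*' "$url"
}

# Example, to be run on the compute node (for instance through srun):
# probe_manifest registry.local:5050 bot/gpu-pack latest
```

An unauthenticated request may well be answered with 401 rather than 200. For this check, any status other than 504 suggests the registry itself is reachable from the node.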