javrtg closed this issue 10 months ago.
> We believe this error may be caused by the container becoming associated with the first user who uses it, preventing subsequent users from using the same container. Could you please confirm this behavior?
It should work; that's why you share the `ENROOT_CACHE_PATH` with other members of your group.
Perhaps it's due to special permission settings on the folder or files, or perhaps a custom umask setting.
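As a quick illustration of the umask point, here is a small shell sketch (assuming bash and GNU coreutils `stat`; `mode_under_umask` is a hypothetical helper, not part of enroot or pyxis). It assumes files are created with the usual default mode of 666 so that the umask alone decides the final bits, which may not hold for every file enroot writes.

```shell
#!/usr/bin/env bash
# Sketch: show how the umask decides the mode of a newly created file.
# touch creates regular files with mode 666; the umask bits are cleared from it.
set -euo pipefail

# Hypothetical helper: print the octal mode a new file gets under umask "$1".
mode_under_umask() {
  local dir
  dir="$(mktemp -d)"
  # Subshell so the umask change does not leak into the caller.
  ( umask "$1" && touch "$dir/layer" )
  stat -c '%a' "$dir/layer"   # GNU coreutils stat: octal mode, e.g. 640
  rm -r "$dir"
}

echo "umask 077 -> $(mode_under_umask 077)"  # owner only: not shareable
echo "umask 027 -> $(mode_under_umask 027)"  # group-readable: shareable
```

With a umask of 077 the file ends up as 600 (owner only), while 027 gives 640, which lets other members of the group read it.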
What are the permissions on the layer files in this directory? They should be `640` to enable sharing layers with other users.
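To check this, a sketch along these lines lists files in the cache that lack the group-read bit. It assumes GNU `find` (for `-printf`); the function name `find_unshared_layers` is hypothetical, and falling back to `/raid/enroot-cache/group-18000` when `ENROOT_CACHE_PATH` is not set in the shell is only an assumption based on the path in the error logs.

```shell
#!/usr/bin/env bash
# Sketch: list files under an enroot cache directory that the group cannot read.
set -euo pipefail

# Hypothetical helper: print "<mode> <path>" for every regular file under "$1"
# whose group-read bit is not set (e.g. mode 600).
find_unshared_layers() {
  # -perm -g=r matches files WITH group read; "!" inverts the test.
  find "$1" -type f ! -perm -g=r -printf '%m %p\n'
}

# Assumed location: ENROOT_CACHE_PATH if exported, else the path from the logs.
cache_dir="${ENROOT_CACHE_PATH:-/raid/enroot-cache/group-18000}"
if [ -d "$cache_dir" ]; then
  find_unshared_layers "$cache_dir"
fi
```

Any line printed with mode `600` points to a layer that other group members would fail to read.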
Thank you for the prompt response!
Yes, that is indeed the issue. Our permissions are `600`. We will see if we can change them to `640` :)
Thanks again!
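One possible way to move existing files from 600 to 640 without deleting the cache is sketched below. It is not an enroot-provided tool: `share_cache_with_group` is a hypothetical helper, each user can only `chmod` files they own (otherwise an administrator must run it), and it assumes the files already belong to the shared group. Directories get group read and execute so that other members can traverse them.

```shell
#!/usr/bin/env bash
# Sketch: make an existing enroot cache readable by its owning group.
# Assumes the files are already group-owned by the group that shares the cache.
set -euo pipefail

# Hypothetical helper: grant group read on files and group read+traverse on
# directories under "$1" (600 -> 640 for files, 700 -> 750 for directories).
share_cache_with_group() {
  find "$1" -type f -exec chmod g+r {} +
  find "$1" -type d -exec chmod g+rx {} +
}

# Assumed location: ENROOT_CACHE_PATH if exported, else the path from the logs.
cache_dir="${ENROOT_CACHE_PATH:-/raid/enroot-cache/group-18000}"
if [ -d "$cache_dir" ]; then
  share_cache_with_group "$cache_dir"
fi
```

For files created later, running the import under a umask such as 027 should give 640 files and 750 directories; how the umask reaches pyxis depends on the site's Slurm setup, so that part is an assumption to verify.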
Hi,
We're encountering `Permission denied` errors when attempting to import Docker images with Pyxis through SLURM job submissions. Specifically, this happens when using containers from the NVIDIA catalog. Here is an example of the error logs:
```shell
pyxis: importing docker image ...
slurmstepd: error: pyxis: child 1976120 failed with error code: 5
slurmstepd: error: pyxis: failed to import docker image
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis: [INFO] Querying registry for permission grant
slurmstepd: error: pyxis: [INFO] Authenticating with user:
```
According to some tests we've done and based on the error logs, in particular lines like the one below:
We believe this error may be caused by the container becoming associated with the first user who uses it, preventing subsequent users from using the same container. Could you please confirm this behavior?
For context, the above error arises when running the `sbatch` SLURM command with the batch script below. This specific script requests the NVIDIA container `cuda:12.2.0-devel-ubuntu20.04`:
A workaround we have found consists of deleting the cache inside the folder `/raid/enroot-cache/group-18000` that is mentioned in the error logs. This way, different users seem to be able to use the same NVIDIA container. However, we're unsure whether this is the recommended solution. Could you provide guidance or an alternative solution?
Thank you