Closed by aboseria.
Make sure that /proc/sys/user/max_user_namespaces is set appropriately. See https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md
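For reference, here is a minimal sketch of checking the limit and persisting a new value. It assumes the distro reads drop-in files from /etc/sysctl.d/ at boot, as most current ones do. The helper names and the file name 90-max-user-namespaces.conf are hypothetical, so adapt them to your configuration management.

```shell
# Minimal sketch; helper names and the drop-in file name are hypothetical.

# Print the current per-user limit on user namespaces (0 means none can be created).
show_userns_limit() {
    cat /proc/sys/user/max_user_namespaces
}

# Write a sysctl drop-in that sets the limit persistently. The default path
# needs root; apply it without rebooting by running: sysctl --system
write_userns_conf() {
    local value="$1"
    local conf="${2:-/etc/sysctl.d/90-max-user-namespaces.conf}"
    printf 'user.max_user_namespaces = %s\n' "$value" > "$conf"
}
```

For example, as root, run write_userns_conf 65536 followed by sysctl --system, then show_userns_limit to confirm. The value 65536 is only an example.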
It was already set to 32 before this started happening.
Can you try with a value higher than 32 to verify that it's the cause of the problem?
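The sketch below shows one way to check whether the current value is above a threshold and to raise it for the current boot only. The helper name is hypothetical, and 65536 is an arbitrary example value, not a recommendation from the enroot docs.

```shell
# Hypothetical helper: print the current limit and succeed only if it is
# strictly greater than the minimum given as the first argument. The file
# argument defaults to the real kernel setting.
userns_limit_above() {
    local min="$1"
    local file="${2:-/proc/sys/user/max_user_namespaces}"
    local current
    current=$(cat "$file") || return 2
    echo "max_user_namespaces = $current"
    [ "$current" -gt "$min" ]
}

# Raise the limit for the current boot only (root required; 65536 is an
# arbitrary example value):
#   sysctl -w user.max_user_namespaces=65536
```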
Yup, it's set to a higher value now, but no luck. Is there a Unix command to clean up namespaces and kill defunct processes?
You can try using lsns and see what it reports.
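Here is a small sketch of what to look for, using hypothetical helper names. It counts the user namespaces lsns reports and lists any defunct processes. There is no separate cleanup command for namespaces. The kernel frees a namespace once nothing references it any more, meaning no member process, open /proc/<pid>/ns file descriptor or bind mount. A zombie cannot be killed directly; it can only be reaped by its parent.

```shell
# Hypothetical helpers; run lsns as root to see every process on the node,
# since an unprivileged lsns typically only sees its own processes.

# Count namespaces of one type (user, mnt, net, ...) in lsns output read from stdin.
# Expects two columns, as produced by: lsns -n -o NS,TYPE
count_ns_of_type() {
    awk -v type="$1" '$2 == type { n++ } END { print n + 0 }'
}

# List defunct (zombie) processes as "PID PPID STAT COMMAND". A zombie cannot be
# killed; it goes away once its parent reaps it, or once the parent exits and
# init (or a subreaper) reaps it instead.
list_zombies() {
    ps -eo pid=,ppid=,stat=,comm= | awk '$3 ~ /^Z/'
}
```

For example, lsns -n -o NS,TYPE | count_ns_of_type user gives a rough count to compare against /proc/sys/user/max_user_namespaces.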
(In reply to aboseria's comment of Wed, Oct 26, 2022, 09:45.)
This is the output:
NS TYPE NPROCS PID USER COMMAND
4026531835 cgroup 6 4141500 maboseri /bin/bash /cm/shared/apps/jupyter/12.2.0/bin/jupyterhub-singleuser-gw --port=44839 --SingleUserNotebookApp.default_url=/l
4026531836 pid 6 4141500 maboseri /bin/bash /cm/shared/apps/jupyter/12.2.0/bin/jupyterhub-singleuser-gw --port=44839 --SingleUserNotebookApp.default_url=/l
4026531837 user 6 4141500 maboseri /bin/bash /cm/shared/apps/jupyter/12.2.0/bin/jupyterhub-singleuser-gw --port=44839 --SingleUserNotebookApp.default_url=/l
4026531838 uts 6 4141500 maboseri /bin/bash /cm/shared/apps/jupyter/12.2.0/bin/jupyterhub-singleuser-gw --port=44839 --SingleUserNotebookApp.default_url=/l
4026531839 ipc 6 4141500 maboseri /bin/bash /cm/shared/apps/jupyter/12.2.0/bin/jupyterhub-singleuser-gw --port=44839 --SingleUserNotebookApp.default_url=/l
4026531840 mnt 6 4141500 maboseri /bin/bash /cm/shared/apps/jupyter/12.2.0/bin/jupyterhub-singleuser-gw --port=44839 --SingleUserNotebookApp.default_url=/l
4026531992 net 6 4141500 maboseri /bin/bash /cm/shared/apps/jupyter/12.2.0/bin/jupyterhub-singleuser-gw --port=44839 --SingleUserNotebookApp.default_url=/l
Any recommendations based on the output?
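One way to read that output, based on the kernel's fixed inode numbers for initial namespaces: the user row, 4026531837, is 0xEFFFFFFD, which is the number the kernel reserves for the initial (host) user namespace. The cgroup, pid, uts and ipc rows also match their reserved initial values. So lsns, run as this user, appears to report only the host namespaces and no leftover container namespaces. The hypothetical helper below performs the same check for the current shell.

```shell
# Hypothetical helper: succeed if the given namespace link (default: this shell's
# user namespace) points at the initial user namespace, whose inode number the
# kernel fixes at 0xEFFFFFFD = 4026531837.
in_initial_userns() {
    local link="${1:-/proc/self/ns/user}"
    [ "$(readlink "$link")" = "user:[4026531837]" ]
}
```

Usage: in_initial_userns && echo "host user namespace" || echo "nested user namespace"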
Not really. Maybe try a much higher value for max_user_namespaces? I'm not sure how the default is chosen on Ubuntu 22.10, but I have this:
$ cat /proc/sys/user/max_user_namespaces
126604
And on another machine:
$ cat /proc/sys/user/max_user_namespaces
8254821
Still no luck :(
That's weird. What are your distro and kernel version?
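The sketch below gathers both pieces of information at once. It reads the standard /etc/os-release file and uname -r, and the function name is hypothetical.

```shell
# Hypothetical helper: print the distro name and running kernel version.
# /etc/os-release is the standard location; the argument only exists so a copy
# can be passed in instead.
report_system() {
    local osrel="${1:-/etc/os-release}"
    # Source os-release in a subshell so its variables do not leak out.
    ( . "$osrel" && echo "Distro: ${PRETTY_NAME:-unknown}" )
    echo "Kernel: $(uname -r)"
}
```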
I would also recommend testing with just enroot, outside of a Slurm job, to simplify the situation a bit (no pyxis, no Slurm cgroups).
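Here is a minimal sketch of such a standalone test, using enroot's import, create, start and remove subcommands. The image URI docker://ubuntu and the container name pyxis-debug are arbitrary placeholders. Run it as the same user on the same node, but not inside srun or salloc.

```shell
# Hypothetical helper: import an image, create a container, run one command in
# it and clean up, all without Slurm or pyxis. The defaults are placeholders.
enroot_smoke_test() {
    local uri="${1:-docker://ubuntu}"
    local name="${2:-pyxis-debug}"
    local sqsh="${name}.sqsh"
    local rc
    enroot import --output "$sqsh" "$uri" &&
        enroot create --name "$name" "$sqsh" &&
        enroot start "$name" cat /etc/os-release
    rc=$?
    # Clean up whatever was created, even if an earlier step failed.
    enroot remove --force "$name" 2>/dev/null
    rm -f "$sqsh"
    return "$rc"
}
```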
I keep getting an error saying there is no space left on the device.
Any recommendations on how to resolve it?
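That message can have two different causes, so it is worth checking both. It may be a real disk or inode shortage in one of the directories enroot writes to. Those directories are configured via ENROOT_DATA_PATH, ENROOT_CACHE_PATH and related settings in enroot.conf. It may also come from the kernel: since Linux 4.9, unshare(2) and clone(2) fail with ENOSPC ("No space left on device") when creating a user namespace would exceed a limit such as max_user_namespaces. The sketch below checks both, using hypothetical helper names. Which directories to pass in depends on your enroot configuration.

```shell
# Hypothetical helper: percentage of blocks (or, with -i as the second argument,
# inodes) in use on the filesystem holding a path. Prints a bare number such as
# 97, or "-" when the filesystem does not track inodes.
fs_usage() {
    local path="$1" flag="${2:-}"
    # -P forces one line per filesystem, so field 5 is the Use% (or IUse%) column.
    df -P $flag "$path" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical helper: report usage for each directory given, then check whether
# this user can still create a user namespace (when a namespace limit is
# exceeded, unshare fails with ENOSPC).
diagnose_enospc() {
    local dir
    for dir in "$@"; do
        if [ ! -e "$dir" ]; then
            echo "$dir: missing"
            continue
        fi
        echo "$dir: blocks $(fs_usage "$dir")% used, inodes $(fs_usage "$dir" -i)% used"
    done
    if unshare --user true 2>/dev/null; then
        echo "user namespace creation: ok"
    else
        echo "user namespace creation: FAILED"
    fi
}
```

For example, run diagnose_enospc /tmp "$HOME", substituting the paths from your enroot.conf.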