NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

slurmstepd: error: pyxis: seccomp filter failed: Function not implemented #42

Closed: nikhleshs-hpc closed this issue 3 years ago

nikhleshs-hpc commented 3 years ago

$ SLURM_DEBUG=2 srun --container-image=/home/user/ubuntu.sqsh --container-name=ubuntu3 grep PRETTY_NAME /etc/os-release

srun: debug: launch returned msg_rc=0 err=0 type=8001
srun: Node cn03, 1 tasks started
slurmstepd: error: pyxis: seccomp filter failed: Function not implemented
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack
srun: Received task exit notification for 1 task of step 224834.3 (status=0x0100).
srun: error: cn03: task 0: Exited with exit code 1

grep SECCOMP /lib/modules/$(uname -r)/build/.config

CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y

I'm getting this issue only on a node with a stateless OS image; a node with a stateful OS image works fine.

flx42 commented 3 years ago

Hi, thanks for the bug report. Which distro are you using and with which kernel version? Thanks!

Could you also test if you see the same issue by running a container through enroot?

flx42 commented 3 years ago

Please run the enroot requirements self-test too: https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md#requirements
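For reference, the self-test ships as a self-extracting bundle; on a compute node it is typically run along these lines (the file name below matches the 3.3.0 release used later in this thread, so adjust it to whatever version you download):

$ ./enroot-check_3.3.0_x86_64.run --verify   # prints the kernel config / command line / parameter checks
$ ./enroot-check_3.3.0_x86_64.run            # extracts and runs a small test bundle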

nikhleshs-hpc commented 3 years ago

Thanks for responding, here's the output you asked for:

cat /etc/os-release

NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"

uname -r

3.10.0-693.el7.x86_64

Everything is fine with enroot; I can run images by allocating nodes with salloc. Hope I answered correctly.

nikhleshs-hpc commented 3 years ago

Kernel version:

Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017

Kernel configuration:

CONFIG_NAMESPACES : OK
CONFIG_USER_NS : OK
CONFIG_SECCOMP_FILTER : OK
CONFIG_OVERLAY_FS : OK (module)
CONFIG_X86_VSYSCALL_EMULATION : KO (required if glibc <= 2.13)
CONFIG_VSYSCALL_EMULATE : KO (required if glibc <= 2.13)
CONFIG_VSYSCALL_NATIVE : KO (required if glibc <= 2.13)

Kernel command line:

namespace.unpriv_enable=1 : OK
user_namespace.enable=1 : OK
vsyscall=native : KO (required if glibc <= 2.13)
vsyscall=emulate : OK

Kernel parameters:

user.max_user_namespaces : OK
user.max_mnt_namespaces : OK

Extra packages:

nvidia-container-cli : KO (required for GPU support)
pv : KO (optional)

flx42 commented 3 years ago

Thanks for the answers, I have a few follow-up questions.

everything is fine with enroot, can run images by allocating nodes with salloc. Hope I answered correctly.

Sorry, my bad, I forgot that this test must be done with enroot start --root to use the same code path as pyxis. So please try the following:

$ salloc -N1
$ srun --pty bash

$ enroot create /home/user/ubuntu.sqsh
$ enroot start --root ubuntu

Regarding the distro/kernel: you mentioned that some nodes are stateful and some are stateless. Are they all using CentOS 7 + kernel 3.10? In particular, are they all on the exact same kernel version? I'm not exactly familiar with how one would configure CentOS stateless, so any pointer is welcome.

The type of error you're seeing makes me think your Slurm daemon might already be running inside a seccomp filter, so pyxis could be prevented from using seccomp. From within an interactive job step, please report the following:

$ grep Seccomp /proc/$(pidof slurmd)/status
$ grep Seccomp /proc/self/status
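(For context, the Seccomp field in /proc/<pid>/status reports the seccomp mode of that process: 0 = disabled, 1 = strict mode, 2 = a BPF filter is installed. The illustrative output below would mean slurmd is not confined, while a value of 2 would point at an existing filter.)

Seccomp:        0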

Last but not least, if your system is using SELinux, please try disabling it temporarily with sudo setenforce 0 on the compute node(s), and then check if the result is any different.

nikhleshs-hpc commented 3 years ago

All the enroot steps you provided run successfully and have created an image "ubuntu":

$ enroot list
pyxis_224835_ubuntu3
pyxis_224836_ubuntu3
pyxis_224837_ubuntu3
ubuntu

Even the pyxis-based container is getting generated, as we can see above, by using:

$ srun --container-image=/home/user/ubuntu.sqsh --container-name=ubuntu3 grep PRETTY_NAME /etc/os-release

Regarding the environment, I mean that I have two test environments: one is stateful/diskful (with GRUB), and the other is stateless/diskless (no GRUB), using an xCAT netboot image for CentOS.

$ grep Seccomp /proc/$(pidof slurmd)/status
Seccomp: 0
$ grep Seccomp /proc/self/status
Seccomp: 0

$ getenforce Disabled

flx42 commented 3 years ago

Thanks, everything seems fine.

@3XX0 tipped me off to the possible source of the problem; could you try with a more recent CentOS kernel?

flx42 commented 3 years ago

@nikhleshs-hpc let me know if you were able to test with an updated version of the CentOS kernel. It was a while ago, but I think the support of seccomp in the CentOS kernel was fixed in more recent versions. If that doesn't fix it, I think I can submit a patch for this.
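In case it helps, on a stateful CentOS 7 node the kernel update itself is typically just the following (a sketch; on the stateless xCAT nodes the kernel would instead have to be updated inside the netboot image):

$ sudo yum update kernel
$ sudo reboot
$ uname -r    # should now report something newer than 3.10.0-693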

nikhleshs-hpc commented 3 years ago

[nikhlesh@login07 ~]$ SLURM_DEBUG=2 srun -w cn034 --reservation=Nikhlesh-enroot-testing --container-image=/home/nikhlesh/ubuntu.sqsh --container-name=ubuntu1 grep PRETTY /host/os-release
srun: Consumable Resources (CR) Node Selection plugin loaded with argument 20
srun: select/cons_tres loaded with argument 20
srun: Cray/Aries node selection plugin loaded
srun: Linear node selection plugin loaded with argument 20
srun: debug: switch NONE plugin loaded
srun: debug: switch generic plugin loaded
srun: debug: switch Cray/Aries plugin loaded.
srun: debug: spank: opening plugin stack /etc/slurm/plugstack.conf
srun: debug: /etc/slurm/plugstack.conf: 1: include "/etc/slurm/plugstack.conf.d/pyxis.conf"
srun: debug: spank: opening plugin stack /etc/slurm/plugstack.conf.d/pyxis.conf
srun: debug: spank: /etc/slurm/plugstack.conf.d/pyxis.conf:1: Loaded plugin spank_pyxis.so
srun: debug: SPANK: appending plugin option "container-image"
srun: debug: SPANK: appending plugin option "container-mounts"
srun: debug: SPANK: appending plugin option "container-workdir"
srun: debug: SPANK: appending plugin option "container-name"
srun: debug: SPANK: appending plugin option "container-save"
srun: debug: SPANK: appending plugin option "container-mount-home"
srun: debug: SPANK: appending plugin option "no-container-mount-home"
srun: debug: SPANK: appending plugin option "container-remap-root"
srun: debug: SPANK: appending plugin option "no-container-remap-root"
srun: launch Slurm plugin loaded
srun: debug: mpi type = none
srun: debug: propagating RLIMIT_CPU=18446744073709551615
srun: debug: propagating RLIMIT_FSIZE=18446744073709551615
srun: debug: propagating RLIMIT_DATA=18446744073709551615
srun: debug: propagating RLIMIT_STACK=8388608
srun: debug: propagating RLIMIT_CORE=0
srun: debug: propagating RLIMIT_RSS=18446744073709551615
srun: debug: propagating RLIMIT_NPROC=4096
srun: debug: propagating RLIMIT_NOFILE=500000
srun: debug: propagating RLIMIT_AS=18446744073709551615
srun: debug: propagating SLURM_PRIO_PROCESS=0
srun: debug: propagating UMASK=0002
srun: debug: Entering slurm_allocation_msg_thr_create()
srun: debug: _is_port_ok: bind() failed port 60664 sock 4 Address already in use
srun: debug: port from net_stream_listen is 60665
srun: debug: Entering _msg_thr_internal
srun: debug: _is_port_ok: bind() failed port 60664 sock 7 Address already in use
srun: debug: _is_port_ok: bind() failed port 60665 sock 7 Address already in use
srun: debug: Munge authentication plugin loaded
srun: jobid 374712: nodes(1):`cn034', cpu counts: 1(x1)
srun: debug: requesting job 374712, user 6260, nodes 1 including (cn034)
srun: debug: cpus 1, tasks 1, name grep, relative 65534
srun: debug: _is_port_ok: bind() failed port 60664 sock 7 Address already in use
srun: debug: _is_port_ok: bind() failed port 60665 sock 7 Address already in use
srun: CpuBindType=(null type)
srun: debug: Entering slurm_step_launch
srun: debug: mpi type = (null)
srun: debug: Using mpi/none
srun: debug: Entering _msg_thr_create()
srun: debug: _is_port_ok: bind() failed port 60664 sock 11 Address already in use
srun: debug: _is_port_ok: bind() failed port 60665 sock 11 Address already in use
srun: debug: _is_port_ok: bind() failed port 60666 sock 11 Address already in use
srun: debug: _is_port_ok: bind() failed port 60664 sock 14 Address already in use
srun: debug: _is_port_ok: bind() failed port 60665 sock 14 Address already in use
srun: debug: _is_port_ok: bind() failed port 60666 sock 14 Address already in use
srun: debug: _is_port_ok: bind() failed port 60667 sock 14 Address already in use
srun: debug: initialized stdio listening socket, port 60668
srun: debug: Started IO server thread (140595708045056)
srun: debug: Entering _launch_tasks
srun: launching 374712.0 on host cn034, 1 tasks: 0
srun: route default plugin loaded
srun: debug: launch returned msg_rc=0 err=0 type=8001
slurmstepd: error: pyxis: child 21409 failed with error code: 1
slurmstepd: error: pyxis: couldn't get list of existing container filesystems
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis: mkdir: cannot create directory '/run/enroot': Permission denied
slurmstepd: error: pyxis: couldn't get list of containers
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init_privileged() failed with rc=-1
slurmstepd: error: spank_task_init_privileged failed
srun: Node cn034, 1 tasks started
srun: Received task exit notification for 1 task of step 374712.0 (status=0x0100).
srun: error: cn034: task 0: Exited with exit code 1
srun: debug: task 0 done
srun: debug: IO thread exiting
srun: debug: Leaving _msg_thr_internal

nikhleshs-hpc commented 3 years ago

Sorry for the delayed reply, I was working on your last suggestion. It took a while to update the kernel; I did it on another node.

[nikhlesh@login07 ~]$ salloc -w cn034 --reservation=Nikhlesh-enroot-testing
salloc: Granted job allocation 374721
[nikhlesh@login07 ~]$ ssh cn034
Warning: your password will expire in 9 days
[nikhlesh@cn034 ~]$ ./enroot-check_3.3.0_x86_64.run --verify

Kernel version:

Linux version 3.10.0-957.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Nov 8 23:39:32 UTC 2018

Kernel configuration:

CONFIG_NAMESPACES : OK
CONFIG_USER_NS : OK
CONFIG_SECCOMP_FILTER : OK
CONFIG_OVERLAY_FS : OK (module)
CONFIG_X86_VSYSCALL_EMULATION : KO (required if glibc <= 2.13)
CONFIG_VSYSCALL_EMULATE : KO (required if glibc <= 2.13)
CONFIG_VSYSCALL_NATIVE : KO (required if glibc <= 2.13)

Kernel command line:

namespace.unpriv_enable=1 : OK
user_namespace.enable=1 : OK
vsyscall=native : KO (required if glibc <= 2.13)
vsyscall=emulate : OK

Kernel parameters:

user.max_user_namespaces : OK
user.max_mnt_namespaces : OK

Extra packages:

nvidia-container-cli : KO (required for GPU support)

[nikhlesh@cn034 ~]$ ./enroot-check_3.3.0_x86_64.run
Extracting [####################] 100%
Bundle ran successfully!

flx42 commented 3 years ago

The error I see is different, it's now:

slurmstepd: error: pyxis: mkdir: cannot create directory '/run/enroot': Permission denied
slurmstepd: error: pyxis: couldn't get list of containers

So you might need to tweak your enroot.conf. I suppose using enroot directly will also fail at this point; please verify.

We have some documentation on the wiki regarding the enroot configuration:
https://github.com/NVIDIA/pyxis/wiki/Setup#enroot-configuration-example
https://github.com/NVIDIA/pyxis/wiki/Setup#slurm-prolog
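For illustration, an enroot.conf along the lines of the wiki example points the runtime/cache/data paths at locations each user can create; the exact paths below are placeholders and must be adapted to your site:

# /etc/enroot/enroot.conf (sketch)
ENROOT_RUNTIME_PATH /run/enroot/user-$(id -u)
ENROOT_CACHE_PATH /var/cache/enroot/group-$(id -g)
ENROOT_DATA_PATH /tmp/enroot-data/user-$(id -u)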

nikhleshs-hpc commented 3 years ago

Hi Felix,

slurmstepd: error: pyxis: mkdir: cannot create directory '/run/enroot': Permission denied
slurmstepd: error: pyxis: couldn't get list of containers

This got resolved using: https://github.com/NVIDIA/enroot/issues/13#issuecomment-530937906
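(For anyone hitting the same error, a minimal sketch of one possible fix, assuming the runtime path is left at the default /run/enroot, is to pre-create that directory with permissions that let every user write their own subdirectory, e.g. from a node boot script or Slurm prolog. This is only an assumption and not necessarily what the linked comment prescribes.)

#!/bin/bash
# sketch: make the default enroot runtime root writable by all users, with the sticky bit set
mkdir -p /run/enroot
chmod 1777 /run/enroot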

Pyxis is working fine with Linux version 3.10.0-957.el7.x86_64, though I'm getting one error related to task_prolog.

[nikhlesh@login07 ~]$ srun -w cn329 --reservation=Nikhlesh-enroot-testing grep PRETTY /etc/os-release
PRETTY_NAME="CentOS Linux 7 (Core)"

[nikhlesh@login07 ~]$ srun -w cn329 --reservation=Nikhlesh-enroot-testing --container-image=/home/nikhlesh/ubuntu.sqsh grep PRETTY /etc/os-release
slurmstepd: pyxis: creating container filesystem ...
slurmstepd: pyxis: starting container ...
slurmstepd: error: Could not run slurm task_prolog [/var/share/slurm/slurm.taskprolog]: No such file or directory
PRETTY_NAME="Ubuntu 20.04.2 LTS"

I'm getting this error only when a pyxis-provided option is used.

flx42 commented 3 years ago

I think you are specifying TaskProlog in your slurm.conf. And this won't work with pyxis since the path to the task prolog is not available inside the container, because we switched the root filesystem.

If you can't remove the TaskProlog, you could test adding --container-mounts /var/share/slurm/slurm.taskprolog to your srun command. And if it works, you can always mount this file by adding a file in /etc/enroot/mounts.d. But be warned that your task prolog script might also get confused when running inside the container: it will not have access to the same filesystem as bare-metal, and it will run as (remapped) UID 0.
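Something along these lines, for example (the SRC:DST form is spelled out here for clarity, and the mounts.d file name is arbitrary):

$ srun --container-image=/home/nikhlesh/ubuntu.sqsh --container-mounts=/var/share/slurm/slurm.taskprolog:/var/share/slurm/slurm.taskprolog grep PRETTY /etc/os-release

# /etc/enroot/mounts.d/50-taskprolog.fstab (sketch of an enroot fstab-style mount entry)
/var/share/slurm/slurm.taskprolog /var/share/slurm/slurm.taskprolog none x-create=file,bind,ro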

nikhleshs-hpc commented 3 years ago

Great, thanks a lot @flx42 for the help. I will implement the above solution. :-)

flx42 commented 3 years ago

I've been told that this problem is now fixed, so closing this issue. Feel free to reopen (or file a new bug) if needed. Thanks!