NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

pyxis/enroot not working with containers that use entrypoint #22

Closed sfeltman closed 4 years ago

sfeltman commented 4 years ago

When attempting to use containers with an entrypoint, "sh" is passed to this entrypoint as an argument:

srun --container-image jrottenberg/ffmpeg:3.2-alpine which ffmpeg
slurmstepd: pyxis: importing docker image ...
slurmstepd: pyxis: creating container filesystem ...
slurmstepd: pyxis: starting container ...
slurmstepd: error: pyxis: container start failed with error code: 1
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis:     ffmpeg version 3.2.15 Copyright (c) 2000-2020 the FFmpeg developers
slurmstepd: error: pyxis:       built with gcc 6.4.0 (Alpine 6.4.0)
...
slurmstepd: error: pyxis:     Trailing options were found on the commandline.
slurmstepd: error: pyxis:     [NULL @ 0x5556dc8e3ae0] Unable to find a suitable output format for 'sh'
slurmstepd: error: pyxis:     sh: Invalid argument
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init_privileged() failed with rc=-1
slurmstepd: error: spank_task_init_privileged failed
slurmstepd: error: write to unblock task 0 failed: Broken pipe

Notice "Unable to find a suitable output format for 'sh'". This leads me to believe pyxis's attempt to call enroot start with sh -c is passing "sh" to the ffmpeg entrypoint as an argument.
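
To illustrate the suspicion above, here is a minimal sketch that runs outside any container. The function fake_entrypoint is a hypothetical stand-in for an image whose entrypoint is the application binary itself, and the sh -c wrapper is only the assumed shape of the command being passed along, not confirmed pyxis behavior:

```shell
# Hypothetical stand-in for an ENTRYPOINT that is the application binary:
# it treats everything it receives as its own arguments.
fake_entrypoint() {
    for arg in "$@"; do
        printf 'entrypoint received argument: %s\n' "$arg"
    done
}

# Assumed shape of the wrapped user command (an illustration only).
fake_entrypoint sh -c 'which ffmpeg'
```

In this sketch the first argument the entrypoint sees is "sh", which matches the 'sh' that ffmpeg complains about in the log.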

flx42 commented 4 years ago

Indeed. That's a known limitation of the SPANK plugin approach of Slurm. Thanks for raising this, I'm in the process of improving the documentation and will definitely mention this.

When using Slurm, you have to pass arguments to your job submission; you can't just do srun --container-image myimage the way you can do docker run myimage (where you're relying on the CMD from the container image). Slurm is also responsible for executing your command, and we can't really hijack the srun arguments to prepend the entrypoint.

Some entrypoints will work. Any entrypoint that ends with exec "$@" will work fine. That's a requirement similar to exec-wrapper in gdb, in case you're familiar with it. But you can't use an entrypoint that executes something else (or executes nothing). For instance, this requirement is satisfied by the official nginx container image: https://github.com/nginxinc/docker-nginx/blob/793319d7251c03eccecbf27b60e0cfbbd2d1f400/mainline/buster/docker-entrypoint.sh They have an entrypoint, but it's not the nginx binary.
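
As a rough sketch of what such a well-behaved entrypoint can look like (this is not the nginx script; the file contents and the DEMO_APP_MODE variable are made up for illustration), the script below does some initialization and then hands control to whatever command it was given:

```shell
# Write a hypothetical well-behaved entrypoint to a temporary file.
entrypoint=$(mktemp)
cat > "$entrypoint" <<'EOF'
#!/bin/sh
# Illustrative initialization: give DEMO_APP_MODE a default value.
export DEMO_APP_MODE="${DEMO_APP_MODE:-production}"
# Replace this shell with the command passed as arguments.
exec "$@"
EOF

# The supplied command runs after the initialization step.
sh "$entrypoint" sh -c 'echo "mode=$DEMO_APP_MODE"'
```

Because the script ends with exec "$@", prepending it to a command changes the environment the command runs in, but not which command runs.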

But for our clusters, we decided to disable entrypoints for all container launches. We do that with a custom enroot configuration, by replacing the container's entrypoint with a no-op script through a bind-mount:

# The bind-mount:
$ cat /etc/enroot/mounts.d/90-entrypoint.fstab 
/etc/enroot/entrypoint /etc/rc.local none x-create=file,bind,ro,nosuid,nodev,noexec,nofail,silent

# The no-op script intercepting the entrypoint.
$ cat /etc/enroot/entrypoint 
# This file will be sourced by /etc/rc.
# We call exec here and do not return control to /etc/rc.
if [ $# -gt 0 ]; then
    exec "$@"
else
    exec '/bin/bash'
fi

With this approach, if an entrypoint is required (e.g. to set environment variables or perform some initialization), you must call it manually:

$ srun --container-image myimage /entrypoint.sh mycmd args1 args2...
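
The effect of the no-op script can be simulated outside a container. In the sketch below, the rc file is a simplified, hypothetical stand-in for /etc/rc (the real enroot script may differ); it sources rc.local and would then go on to run the image entrypoint:

```shell
workdir=$(mktemp -d)

# The no-op script from above, saved as rc.local.
cat > "$workdir/rc.local" <<'EOF'
if [ $# -gt 0 ]; then
    exec "$@"
else
    exec '/bin/bash'
fi
EOF

# Hypothetical stand-in for /etc/rc: source rc.local, then reach the entrypoint.
cat > "$workdir/rc" <<EOF
. "$workdir/rc.local"
echo "image entrypoint reached with: \$*"
EOF

# rc.local sees the same arguments as rc and execs them directly,
# so the line standing in for the image entrypoint never runs.
sh "$workdir/rc" echo "user command ran"
```

Since the sourced file calls exec, control never returns to the stand-in rc script, which is why the image's entrypoint is skipped.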

For now, we made the choice to disable entrypoints from enroot and not from pyxis, since only some entrypoints cause the issue you mentioned.

Hope that helps.

sfeltman commented 4 years ago

Hi Felix,

I've been able to get your suggestion working. Thanks!

What about adding a new option which supplies a similar script to enroot to override the entrypoint? Something like --container-clear-entrypoint (or make it the default)

Thanks

flx42 commented 4 years ago

What about adding a new option which supplies a similar script to enroot to override the entrypoint?

I have mixed feelings about this :)

It would solve problems like this one, but I also consider this particular container image to be bogus. I don't think any official Docker Hub image is using this pattern (for instance nginx is well-behaved) but I might be wrong. I have clearly made this mistake myself in the past, and amusingly for the exact same app: https://gitlab.com/nvidia/container-images/samples/-/blob/master/cuda/ubuntu16.04/ffmpeg-gpu/Dockerfile#L44 But in retrospect, I think it's just weird, it breaks things like docker run -ti ffmpeg bash (you would need to add --entrypoint=), just to avoid typing a few extra characters.

Also, I don't want to start adding too many knobs to pyxis. Let me think about it a bit, and we'll discuss internally.

flx42 commented 4 years ago

By the way, you don't need to install the entrypoint script system-wide, you can do it manually:

srun --container-mounts ~/entrypoint:/etc/rc.local ...

But this is clearly opaque compared to a hypothetical command-line argument like --no-container-entrypoint.

flx42 commented 4 years ago

Well, that's an interesting rabbit hole: this Dockerfile pattern is actually documented in the "best practices": https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#entrypoint I guess if you see docker run as a command-line tool, it makes more sense. And users probably like that it avoids the repetition you need without an entrypoint: docker run flx42/ffmpeg ffmpeg bbb.mp4 ...

But, in the guidelines for official Docker Hub images, it is not recommended except if you do FROM scratch: https://github.com/docker-library/official-images#consistency

Ensure that docker run official-image bash (or sh) works too.

flx42 commented 4 years ago

After discussing with my colleagues, we'll disable entrypoints by default, but add a srun command-line argument to enable them. I'll push the code tomorrow, probably.

sfeltman commented 4 years ago

Thanks!

flx42 commented 4 years ago

@sfeltman this is now implemented. Be aware that the current branch has gone through quite a few changes and it now requires Slurm 20.02.

By default, entrypoints are now disabled, so this works:

$ srun --container-image jrottenberg/ffmpeg:3.2-alpine which ffmpeg
pyxis: importing docker image ...
pyxis: creating container filesystem ...
pyxis: starting container ...
/usr/local/bin/ffmpeg

You can go back to the previous behavior by modifying the plugstack config:

$ cat /etc/slurm/plugstack.conf.d/pyxis.conf 
required /usr/lib/x86_64-linux-gnu/slurm/spank_pyxis.so execute_entrypoint=true

But with execute_entrypoint=true, you would need to disable the entrypoint explicitly for this image:

$ srun --no-container-entrypoint --container-image jrottenberg/ffmpeg:3.2-alpine which ffmpeg

sfeltman commented 4 years ago

@flx42 thanks for the update. This will simplify my deployment and hopefully minimize support for you on the topic.

Regards