NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

Request for enhancement: provide a pyxis option to srun that can map ports from the "base os" to the container. #113

Closed rennich closed 1 year ago

rennich commented 1 year ago

Request for enhancement: provide a pyxis option to srun that can map ports from the "base os" to the container.

For example, customer code has tools which take a list of worker nodes as input. It then uses ssh-type tools to launch and manage worker processes on those worker nodes. [I am not a sysadmin, but to the best of my knowledge] On a non-containerized cluster this would work fine. The user is given a set of nodes. The user is authorized on all those nodes. The master node/process can ssh to the worker nodes as needed to start/stop processes.

Currently it appears impossible for such codes to work on clusters mandating the use of containers without re-writing all of the customer code scripts. When sbatch/srun is used to get a set of nodes, each running a container image (required since the base OS doesn't contain all the packages needed to run customer code), the master python script on the master node cannot launch any work on worker nodes. The reason is that any ssh-type connections to the worker nodes will go to the worker node's base OS. In fact, if running the container as root, the necessary credentials to use ssh aren't present. If running the container as user (with --no-container-remap-root), credentials exist, but the connection still only reaches the worker node's base OS.

There appear to be methods to redirect the ssh ports to the containers, such as the docker run command:

 docker run -p 22:2220 my-image:latest

[Disclaimer - I don't yet know for sure this will work.] My understanding is that this would permit an appropriately credentialed (i.e. user) process to ssh into the container running on the named worker node. (This may require modifications to the .ssh-config files, etc. so that standard use of ssh/similar make use of the properly remapped port numbers by default, but I expect this can be done within the container [over which I have control] since the ssh commands are coming from within the container.)

So the request is for support for port remapping in sbatch or srun similar to what's done in the docker run command above.

flx42 commented 1 year ago

In order to "publish" a port, docker relies on a network namespace, and then extra setup depending on the network mode. When publishing ports with docker run -p in the default bridge mode, docker creates iptables rules to forward traffic to the container.

The process is complex and requires elevated privileges. While pyxis could acquire those privileges at one point in the SPANK plugin lifecycle, we want to run as much as possible as an unprivileged user (the user running the Slurm job) which would make it impossible to do this elaborate setup phase of manipulating iptables, setting up a bridge, or even binding to privileged port 22.

But pyxis doesn't create a network namespace, so "publishing" a port is not needed: pyxis works like --network=host in docker. If your container starts a server that listens on port 8000, then you can access this server on port 8000 with the node IP, no extra step needed. However, in pyxis, as an unprivileged user you can't bind to privileged ports (ports < 1024).
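
To check this from inside a container, a minimal Python sketch along these lines could help. It simply tries to bind a TCP socket to a few ports and reports the result. The helper names (`is_privileged_port`, `try_bind`) are made up for illustration and are not part of pyxis or enroot, and the 1024 threshold is the usual Linux default, which a system administrator can change.

```python
import socket

# Assumption: the usual Linux default, where ports below 1024 are privileged.
PRIVILEGED_PORT_LIMIT = 1024


def is_privileged_port(port: int) -> bool:
    """Return True if the port is in the range that normally needs extra privileges."""
    return 0 < port < PRIVILEGED_PORT_LIMIT


def try_bind(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if the current process can bind a TCP socket to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError:
            # Covers both "permission denied" and "address already in use".
            return False
        return True


if __name__ == "__main__":
    for port in (22, 2220, 8000):
        kind = "privileged" if is_privileged_port(port) else "unprivileged"
        print(f"port {port} ({kind}): bind ok = {try_bind(port)}")
```

Run as the job user inside the container, this would be expected to fail for port 22 and succeed for 2220 and 8000, unless those ports are already in use on the node or the node's privileged-port settings differ from the default.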

Running an ssh server inside a container and listening on an unprivileged port (e.g. 2220) should work; ssh'ing to the node with ssh -p 2220 node0001 will land you inside the container. However, you might need to modify your existing python scripts to add a parameter that allows you to specify the ssh port to use, instead of the default port 22 (which would land outside the container).
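
As a rough illustration of that script change, here is a small Python sketch in which the launcher takes the ssh port as a parameter. The function names (`build_ssh_command`, `launch_on_workers`) and the node names are hypothetical, since the customer scripts are not shown in this issue; only the standard `ssh -p` option is assumed.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_ssh_command(node: str, remote_cmd: Sequence[str], port: int = 22) -> List[str]:
    """Build the argv for running remote_cmd on node through ssh on the given port."""
    # shlex.join quotes each argument so the remote shell sees them unchanged.
    return ["ssh", "-p", str(port), node, shlex.join(remote_cmd)]


def launch_on_workers(
    nodes: Sequence[str], remote_cmd: Sequence[str], port: int = 22
) -> List[subprocess.Popen]:
    """Start remote_cmd on every node and return the running ssh processes."""
    return [subprocess.Popen(build_ssh_command(node, remote_cmd, port)) for node in nodes]


if __name__ == "__main__":
    # Hypothetical node list; port 2220 is where the in-container sshd listens.
    for node in ("node0001", "node0002"):
        print(build_ssh_command(node, ["python3", "worker.py", "--rank", "0"], port=2220))
```

With the default port of 22 the command would reach the node's base OS, while passing port=2220 would reach the ssh server started inside the container.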

flx42 commented 1 year ago

I hope my explanations helped you understand that we should not need any additional feature in pyxis/enroot, closing for now.