ExaESM-WP4 / Batch-scheduler-Singularity-bindings

adding batch scheduler command functionality to Singularity containers
MIT License

Implement also an SSH-based approach for improved base image compatibility? #4

Open kathoef opened 3 years ago

kathoef commented 3 years ago

When implementing the configuration for the new NESH system, I was a bit "shocked" by how few base images remained "compatible" with the host system's SLURM libraries. (I think the situation is now similar for e.g. JUWELS; please note that the table in the README is a bit outdated.) One idea would be to also implement the SSH-based approach that @willirath came up with in the bind-scheduler.sh tool. A few (not well-structured) thoughts:

The only requirement on the container environment side would be a working SSH client (one could bind mount the host system's client, but that would cause the same glibc-version conflicts we are trying to avoid here). Additionally, a passwordless (for comfort reasons...) local SSH key pair would have to be present.
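To make the key pair requirement concrete, here is a minimal sketch of how a dedicated passwordless key could be created and authorized for logins to the same account. The function name `setup_container_ssh_key`, the key file name `id_ed25519_container`, and the key comment are assumptions for illustration and are not part of bind-scheduler.sh. It also assumes that the site permits key-based logins to the node in question.

```shell
# Hypothetical helper: create a dedicated passwordless key pair and
# authorize it for SSH logins to the same user account.
setup_container_ssh_key() {
  key_dir="${1:-$HOME/.ssh}"                  # sshd only reads ~/.ssh by default
  key_file="$key_dir/id_ed25519_container"    # hypothetical file name

  mkdir -p "$key_dir" && chmod 700 "$key_dir"

  # Generate the key only once; -N "" sets an empty passphrase.
  if [ ! -f "$key_file" ]; then
    ssh-keygen -q -t ed25519 -N "" -f "$key_file" -C "container-to-host"
  fi

  # Append the public key to authorized_keys unless it is already listed.
  touch "$key_dir/authorized_keys"
  chmod 600 "$key_dir/authorized_keys"
  if ! grep -qxF "$(cat "$key_file.pub")" "$key_dir/authorized_keys"; then
    cat "$key_file.pub" >> "$key_dir/authorized_keys"
  fi
}
```

Calling `setup_container_ssh_key` without arguments operates on `~/.ssh`. Because the key file name is non-default, the SSH client inside the container would need `-i ~/.ssh/id_ed25519_container` or a matching `IdentityFile` entry to pick it up.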

(Or could one rely on agent forwarding? I guess there are also credential timeout options for SSH that could be used comfortably with a local password-protected key pair. Which of these options should be preferred, and which would also work reliably from the compute nodes?)

Finally, either shell aliases of the form alias sinfo='ssh $(hostname) sinfo' (arguments typed after an alias are appended automatically, so no "$@" is needed) or the "wrapper script in a known path location" approach shown here could be used to forward batch scheduler commands from the container to the host system. (I guess the wrapper script approach is more straightforward to bind into the container environment.)
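The wrapper script variant could be sketched as follows. This is only an illustration under stated assumptions: the `WRAPPER_DIR`, `SCHEDULER_COMMANDS`, and `SSH_BIN` variables are hypothetical names, the command list is just an example for SLURM, and the wrappers assume that bash and an SSH client are available inside the container. Each generated wrapper takes its own file name as the command to run on the host and quotes its arguments, because ssh joins them into a single string that the remote shell parses again.

```shell
#!/bin/sh
# Sketch: generate one SSH-forwarding wrapper per batch scheduler command.
set -eu

WRAPPER_DIR="${WRAPPER_DIR:-$(mktemp -d)}"                                # hypothetical
SCHEDULER_COMMANDS="${SCHEDULER_COMMANDS:-sinfo squeue sbatch scancel}"   # example list

mkdir -p "$WRAPPER_DIR"
for cmd in $SCHEDULER_COMMANDS; do
  # The quoted 'EOF' keeps the wrapper body literal (no expansion here).
  cat > "$WRAPPER_DIR/$cmd" <<'EOF'
#!/usr/bin/env bash
# The wrapper's own file name selects the command to run on the host.
cmd="$(basename "$0")"
# Quote every argument so spaces and special characters survive the
# second round of parsing by the remote shell.
quoted=""
for arg in "$@"; do
  quoted+=" $(printf '%q' "$arg")"
done
# SSH_BIN is a hypothetical override for the SSH client to use.
exec "${SSH_BIN:-ssh}" "$(hostname)" "$cmd$quoted"
EOF
  chmod +x "$WRAPPER_DIR/$cmd"
done

echo "wrappers written to $WRAPPER_DIR"
```

The resulting directory could then be bind-mounted into the container and prepended to `PATH`, so that calling `sinfo` inside the container runs `sinfo` on the host. Unlike aliases, this also works in non-interactive shells and scripts.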

Any thoughts, @kthust, @willirath, and @martinclaus? (Please note that this is more about coming up with a good design choice than an attempt to come up with an SSH-based solution that works on any HPC system, which I know likely won't be possible.)

willirath commented 3 years ago

Just so we don't forget: I think @krausedfzj suggested exploring bind-mounted Unix domain sockets that forward stdin to something which then calls the batch-scheduler CLI.