NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

Logging in as another user #136

Open calvinp0 opened 2 months ago

calvinp0 commented 2 months ago

Hey, I am relatively new to enroot and pyxis, and to using srun, so apologies in advance.

I am trying to use a container based on the micromamba container from here.

I want to log in as mambauser, the user already set in the Docker image, because logging in as this user automatically activates the micromamba environment. In Docker I would usually pass -u mambauser on the command line.

I tried to understand the Pyxis documentation, but also realised that our servers are not running the latest Pyxis, as --container-env is not available. So I tried the --export option instead and received this error:

calvin.p@dgx-master:~/work$ srun -p mig -G 0      --container-image=/home/calvin.p/deepchem-cuda.sqsh      --container-mounts=/home/calvin.p/work/CMPNN_HydrogenAbstraction:/home/mambauser/deepchem      --container-save=/home/calvin.p/deepchem-cuda-modified.sqsh     --container-entrypoint --no-container-remap-root --export ENV=deepchem_cuda,USER=mambauser   --pty bash
slurmstepd: error: pyxis: child 2964013 failed with error code: 1
slurmstepd: error: pyxis: failed to create container filesystem
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis:     /usr/bin/enroot: line 44: HOME: unbound variable
slurmstepd: error: pyxis:     /usr/bin/enroot: line 44: HOME: unbound variable
slurmstepd: error: pyxis:     mkdir: cannot create directory ‘/home/mambauser’: Permission denied
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: pyxis: if the image has an unusual entrypoint, try using --no-container-entrypoint
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin st

So is there any way that I can log in as mambauser?

flx42 commented 2 months ago

The error you are facing with --export happens because --export in Slurm is confusing; see this: https://github.com/NVIDIA/pyxis/wiki/Frequently-asked-questions#why-am-i-getting-errors-when-using-the---export-argument-of-slurm
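
For context on that FAQ entry: srun's --export=VAR=value,... propagates only the listed variables (Slurm still adds its own SLURM_* variables), while --export=ALL,VAR=value,... keeps the caller's environment and adds to it. Without ALL, HOME is missing, which matches the "HOME: unbound variable" lines in the log above. The sketch below is only a local analogy that uses env(1), not srun: env -i stands in for the list without ALL, and plain env stands in for ALL. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Local analogy for srun --export semantics; no Slurm involved.
set -euo pipefail

# Make sure the outer shell has HOME, as a normal login session would.
export HOME="${HOME:-/tmp}"

# Like --export=ENV=deepchem_cuda,USER=mambauser (no ALL):
# env -i starts from an empty environment, so only the listed variables exist.
without_all() {
  env -i ENV=deepchem_cuda USER=mambauser env
}

# Like --export=ALL,ENV=deepchem_cuda,USER=mambauser:
# plain env keeps every inherited variable and adds or overrides the listed ones.
with_all() {
  env ENV=deepchem_cuda USER=mambauser env
}

# Print "yes" if an environment listing has a line starting with HOME=.
has_home() {
  case $'\n'"$1" in
    *$'\n'HOME=*) echo yes ;;
    *) echo no ;;
  esac
}

echo "HOME propagated without ALL: $(has_home "$(without_all)")"
echo "HOME propagated with ALL:    $(has_home "$(with_all)")"
```

Run as a plain bash script, it prints "no" for the first line and "yes" for the second.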

calvinp0 commented 2 months ago

Ah, I see, thanks for that. So even if I add 'ALL' in front of the variables in the --export argument, it will not behave like it does in Docker?

flx42 commented 2 months ago

It should be close to what Docker is doing. But in your case, even if you fix the export issue, it will probably still fail if the container entrypoint is attempting to do things that are not possible under pyxis/enroot.

calvinp0 commented 2 months ago

Yeah, I did try:

srun -p mig -G 0 \
     --container-image=/home/calvin.p/deepchem-cuda.sqsh \
     --container-mounts=/home/calvin.p/work/CMPNN_HydrogenAbstraction:/home/mambauser/deepchem \
     --container-save=/home/calvin.p/deepchem-cuda-modified.sqsh \
     --container-entrypoint \
     --no-container-remap-root \
     --export ALL,ENV=deepchem_cuda,USER=mambauser \
     --pty bash

But that just logged me in as my current username on the server, calvin.p, and the conda environment had not been initialized.
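
Passing USER=mambauser through --export only sets an environment variable for the job; it does not change the Unix account the process runs as, which is consistent with the shell still belonging to calvin.p. A minimal local check, again using env(1) rather than srun: the uid reported by id stays the same whatever USER says. The variable names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# id -u prints the effective uid from the process credentials, not from $USER.
real_uid="$(id -u)"

# Same command, but with USER overridden in the child's environment only.
uid_with_fake_user="$(env USER=mambauser id -u)"

# USER as seen by the child process.
user_var="$(env USER=mambauser printenv USER)"

echo "USER variable in child:  $user_var"
echo "uid without override:    $real_uid"
echo "uid with USER=mambauser: $uid_with_fake_user"
```

Docker's -u option, by contrast, sets the user the container process actually runs as, not just a variable.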

In the end, the workaround I am using is:

srun -p mig -G 0 \
     --container-image=/home/calvin.p/deepchem-cuda.sqsh \
     --container-mounts=/home/calvin.p/work/CMPNN_HydrogenAbstraction:/home/mambauser/deepchem \
     --container-save=/home/calvin.p/deepchem-cuda-modified.sqsh \
     --container-entrypoint \
     --container-remap-root \
     --export ENV=deepchem_cuda \
     --pty bash -c "su - mambauser"

This way, even though whoami shows me as root, the conda environment is initialized and the shell starts in the /home/mambauser/ directory.

Thanks for the help!