NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

Pyxis: use ENROOT_DATA_PATH and other enroot config overrides via srun #43

Closed. avolkov1 closed this issue 3 years ago

avolkov1 commented 3 years ago

I want to override ENROOT_DATA_PATH from the srun command line when using pyxis. Currently pyxis seems to ignore the ENROOT_DATA_PATH environment variable and to rely only on what is set in /etc/enroot/enroot.conf.

Example:

$ export ENROOT_DATA_PATH="/var/tmp/enroot-data/user-$(id -u)"
$ srun --export=ENROOT_DATA_PATH -N 1 --ntasks=2 --gpus-per-task=1 --cpus-per-task=16 --gres-flags=enforce-binding \
  --container-name=tf2_nvcr21.04 \
  --container-image ~/enroot_images/nvidia+tensorflow+21.04-tf2-py3.sqsh \
  --pty bash

On the node the container is unpacked to:

/tmp/enroot-data/user-150020/tf2_nvcr21.04

Instead of:

/var/tmp/enroot-data/user-150020/tf2_nvcr21.04

When I use enroot directly, without pyxis, ENROOT_DATA_PATH is honored and the container is unpacked to:

/var/tmp/enroot-data/user-150020/tf2_nvcr21.04

But not with pyxis.

Is there a way to have pyxis respect the enroot configuration environment variables (see the enroot configuration documentation)?

flx42 commented 3 years ago

Hi @avolkov1, today only ENROOT_CONFIG_PATH is passed from the user environment to enroot (through pyxis). The goal was to have pyxis always use the admin defaults for most settings. We want to avoid container import / creation issues caused by users setting those environment variables.
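The behavior described above can be pictured as an environment allowlist. The following is only a conceptual sketch in shell, not pyxis source code: `run_with_admin_defaults` is a hypothetical helper, the config path is a made-up example, and the assumption that nothing besides `PATH` and `ENROOT_CONFIG_PATH` is forwarded is a simplification for illustration.

```shell
#!/bin/sh
# Conceptual sketch only: this is NOT how pyxis is implemented.
# run_with_admin_defaults (hypothetical) starts a command with a scrubbed
# environment and forwards ENROOT_CONFIG_PATH only if the user has set it.
# PATH is kept so that the child command can still be found.
run_with_admin_defaults() {
    if [ -n "${ENROOT_CONFIG_PATH:-}" ]; then
        env -i PATH="$PATH" ENROOT_CONFIG_PATH="$ENROOT_CONFIG_PATH" "$@"
    else
        env -i PATH="$PATH" "$@"
    fi
}

# The user exports both variables, but only one survives the allowlist.
export ENROOT_DATA_PATH="/var/tmp/enroot-data/user-$(id -u)"
export ENROOT_CONFIG_PATH="/home/alice/.config/enroot"   # hypothetical path

run_with_admin_defaults sh -c \
    'echo "data=${ENROOT_DATA_PATH:-unset} config=${ENROOT_CONFIG_PATH:-unset}"'
# prints: data=unset config=/home/alice/.config/enroot
```

With this kind of filtering, a user-exported ENROOT_DATA_PATH never reaches the child process, which would be consistent with the container landing in the admin-configured location from /etc/enroot/enroot.conf.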

So, the recommendation would be to either use enroot directly, or ask the cluster admin to modify enroot.conf. Does that work for you?
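For the second option, an admin-side change might look roughly like the fragment below. This is a sketch under assumptions: it assumes enroot.conf takes whitespace-separated `KEY value` lines whose values are shell-expanded when enroot runs, and that /var/tmp is a suitable location on the compute nodes. Both points should be checked against the installed enroot version before use.

```text
# /etc/enroot/enroot.conf (edited by the cluster admin on each compute node)
# Assumed syntax: KEY value, with the value expanded per user at run time.
ENROOT_DATA_PATH /var/tmp/enroot-data/user-$(id -u)
```

Because pyxis uses the admin defaults, a change like this would apply to every user's containers on that node, not only to one job.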

avolkov1 commented 3 years ago

Thank you. Yes, that's fine.

Is it possible to override the system enroot.conf with a customized enroot.conf placed in ENROOT_CONFIG_PATH (where I would modify ENROOT_DATA_PATH)? I want ~/.config/enroot/enroot.conf to take precedence over /etc/enroot/enroot.conf.

flx42 commented 3 years ago

That's not possible, right @3XX0?

3XX0 commented 3 years ago

Unfortunately no, not right now at least.

avolkov1 commented 3 years ago

Ok, thanks.