NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

Option to make container image writable? #52

Closed kcgthb closed 3 years ago

kcgthb commented 3 years ago

Hi!

I know there are options to mount $HOME (or not), or to remap the user to root inside the container (or not), but is there a way to make the container writable through Pyxis (like with enroot start --rw)?

With the default enroot configuration, container images are read-only (and the --container-name example in the wiki doesn't work):

$ srun --container-image=debian --container-remap-root sh -c 'touch /foobar && ls -al /foobar'
pyxis: importing docker image ...
touch: cannot touch '/foobar': Read-only file system
srun: error: sh02-01n60: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=25864762.18

It seems like the way to make containers writable is to set ENROOT_ROOTFS_WRITABLE=1, either in enroot.conf (which is probably not great as that would be the default for every invocation), or, with #48, from the calling environment, but that may not be the most elegant either:

$ ENROOT_ROOTFS_WRITABLE=1 srun --container-image=debian --container-remap-root  sh -c 'touch /foobar && ls -al /foobar'
pyxis: importing docker image ...
-rw-r--r-- 1 root root 0 Jun  4 16:23 /foobar
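
If the environment-variable route ends up being used regularly, it can at least be confined to a single invocation with a small shell wrapper. This is only a sketch: srun_rw is a hypothetical helper name, and it assumes a pyxis version that includes #48, so that ENROOT_ROOTFS_WRITABLE set in the calling environment actually reaches enroot.

```shell
# Hypothetical helper (not part of pyxis or enroot): run one srun step with
# ENROOT_ROOTFS_WRITABLE=1 set only for that command, leaving the site-wide
# enroot.conf default (read-only) untouched.
srun_rw() {
    ENROOT_ROOTFS_WRITABLE=1 srun "$@"
}

# Usage (same arguments as a normal srun call):
#   srun_rw --container-image=debian --container-remap-root sh -c 'touch /foobar'
```

The variable is scoped to that one srun process, so other steps launched from the same shell keep the read-only default.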

Unless I've missed another way to do this, do you think a new --[no-]container-rw option could be possible?

Thanks!

flx42 commented 3 years ago

It would be simple to add, but you're sure you don't want to set ENROOT_ROOTFS_WRITABLE by default in enroot.conf? :)

That's what we do for all our clusters, but advanced users can still get a read-only container through enroot if they want to.

kcgthb commented 3 years ago

> but you're sure you don't want to set ENROOT_ROOTFS_WRITABLE by default in enroot.conf? :)

Oh boy, that would be... interesting. That would be a sure way to kiss goodbye to even the slightest appearance of reproducibility, and would surely create an endless source of support requests asking why the container doesn't behave like it did last week. :)

I see your point, but I think having non-writable containers by default (which is what enroot provides, if I'm not mistaken, since the default value of ENROOT_ROOTFS_WRITABLE is no) is a much safer stance, at least for our user population. Having the possibility to make containers writable would be great, but I think it would be best if it required users to pass an optional argument and make that a conscious decision.

> That's what we do for all our clusters, but advanced users can still get a read-only container through enroot if they want to.

I'm actually leaning the opposite way: keeping containers read-only for most users (who are just consumers of the containers, most of the time), but letting advanced users run read-write containers when they need to modify them.

flx42 commented 3 years ago

> Oh boy, that would be... interesting. That would be a sure way to kiss goodbye to even the slightest appearance of reproducibility, and sure make an endless source of support requests asking why the container doesn't behave like it did last week. :)

I'm not sure I understand this point. Why would it impact reproducibility? The container filesystem is usually ephemeral (tied to the lifetime of the Slurm job), so a new Slurm job next week will start from the same initial state. Or are you planning to keep container state across jobs?

kcgthb commented 3 years ago

I guess that if containers are writable by default, users will keep modifying them, maybe even unknowingly, and end up with containers that change over time.

> The container filesystem is usually ephemeral (tied to the lifetime of the Slurm job), so a new Slurm job next week will start from the same initial state.

I was under the impression that the default Pyxis configuration was to keep containers across jobs, wasn't it? From https://github.com/NVIDIA/pyxis/wiki/Setup#slurm-plugstack-configuration:

> container_scope, controls whether named containers persist across Slurm jobs. When set to the value job, pyxis will automatically cleanup named containers in the job epilog. When set to global (default), named containers can be reused by future jobs, unless they are manually removed by a custom Slurm epilog script.

flx42 commented 3 years ago

> I was under the impression that the default Pyxis configuration was to keep containers across jobs, wasn't it?

Yes, it is. But in our case we do the cleanup in a Slurm epilog. We had this epilog before the container_scope option was added, and the goal of this option was to remove the need for admins to add a cleanup epilog script like ours. The default of container_scope is global so that introducing the option did not change the existing behavior. I think we could change the default to job in a future release.

Anyway, the request makes sense, so I'll add new command-line options. But I do recommend starting with container_scope=job (or a Slurm epilog equivalent), and then switching to container_scope=global only if you have a compelling use case for reusing the container filesystem across jobs.
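
For reference, the per-job scope is passed as a plugin argument in the Slurm plugstack configuration. The sketch below just writes such an entry to a local file for inspection. The plugin path /usr/local/lib/slurm/spank_pyxis.so, the PYXIS_PLUGSTACK_CONF variable and the output file name are assumptions; they depend on how pyxis was installed on the cluster.

```shell
# Sketch only: generate a plugstack entry that makes pyxis clean up named
# containers in the job epilog (container_scope=job).
# Assumptions: the plugin path below and the output file name; adjust both.
conf="${PYXIS_PLUGSTACK_CONF:-./pyxis.conf}"
cat > "$conf" <<'EOF'
required /usr/local/lib/slurm/spank_pyxis.so container_scope=job
EOF
```

The generated line would then go wherever the site keeps its plugstack configuration; switching it to container_scope=global later is a one-word change.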

flx42 commented 3 years ago

Done!

We have enough material to release v0.11.0 now; I'll just give it a bit of soak time before tagging the new release.

kcgthb commented 3 years ago

Thanks @flx42! That looks perfect.