SteVwonder opened 4 years ago
What is the use case for user prolog/epilog? (i.e. is there a better way to do what the user's need?)
That's a great question. TBH, I missed the full use-case. Something related to tools cleanup. Maybe @dongahn can summarize the use case better than me.
Olaf Faaland wants to make use of prologue and epilogue scripts on Elmerfudd to:
(1) Run a script to clean up /dev/shm after a job, so that a user who writes data there doesn't reduce the amount available to the next user.
(2) Drop caches after a job, e.g. `echo 3 > /proc/sys/vm/drop_caches`
My interest in this is primarily to ensure that data and metadata written to a remote file system such as Lustre is flushed to disk before the node is made available to other users. This is partially so that we find out about a problem as early as possible and minimize damage done, and partially so that one user can't hurt the following user's performance.
(3) Run a script to set up and destroy a local ephemeral file system for use by the user.
One example is connecting to a remote NVMe device via NVMe-over-Fabrics, formatting the connected device with a file system such as XFS, and setting permissions so that the user can write to it; and then undoing all of that after the job completes.
(4) Run a script to set up and destroy a shared ephemeral file system for use by the user.
Another example is setting up and destroying a shared GFS2 file system. Unlike the local file system setup/destroy case, this would likely need to know the set of nodes participating in the job.
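For use cases (1) and (2), an epilogue could be as small as the sketch below. This is purely illustrative, not a Flux-provided interface; the function names are made up, and it assumes the script is run as root (e.g. via the IMP) after all of the job's processes have exited.

```shell
#!/bin/sh
# Hypothetical epilogue sketch; function names are illustrative only.
# Assumes root privileges and that the job's processes have already exited.

# (1) Remove everything inside a tmpfs directory (e.g. /dev/shm) without
# removing the directory itself, so the next user gets full capacity.
clean_shm_dir() {
    find "$1" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}

# (2) Flush dirty data and metadata (e.g. to Lustre) before the node is
# handed to the next user, then drop the page/dentry/inode caches.
# Writing to drop_caches requires root, so skip it when not privileged.
flush_and_drop_caches() {
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches
    fi
}

# A real epilogue would run, as root:
#   clean_shm_dir /dev/shm
#   flush_and_drop_caches
```

Surfacing a sync/flush failure here (rather than swallowing it) would also serve the goal of catching file system problems as early as possible.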
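Use case (3) might pair a prologue and epilogue like the following sketch. Every NQN, transport, address, device path, and mount point here is a placeholder I invented for illustration; `JOB_USER` is likewise an assumed variable, not something Flux defines. Setting `RUN=echo` gives a dry run; a real prologue/epilogue would run these commands as root.

```shell
#!/bin/sh
# Hypothetical local-ephemeral-filesystem prologue/epilogue pair.
# All identifiers (NQN, address, device, mount point, JOB_USER) are
# illustrative placeholders. RUN=echo enables a dry run.
RUN="${RUN:-}"

# Prologue: attach the remote NVMe namespace, format it, mount it,
# and give the job's user write access.
setup_local_fs() {
    $RUN nvme connect -t rdma -n nqn.2024-01.example:store -a 192.0.2.10 -s 4420
    $RUN mkfs.xfs -f /dev/nvme1n1
    $RUN mkdir -p /scratch/job
    $RUN mount /dev/nvme1n1 /scratch/job
    $RUN chown "$JOB_USER" /scratch/job
}

# Epilogue: undo the above after the job completes.
teardown_local_fs() {
    $RUN umount /scratch/job
    $RUN nvme disconnect -n nqn.2024-01.example:store
}
```

Use case (4), the shared GFS2 variant, would follow the same shape but additionally needs the set of nodes participating in the job, which is why it can't be a purely node-local script.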
I'm absolutely open to other/better ways to accomplish those tasks.
Unfortunately a job-shell plugin won't work for any of these use cases since it runs as the user of the job, not a privileged process.
We do have support in the IMP (setuid helper) for job prolog and epilog which run as root, but the exec system doesn't have support for invoking the prolog/epilog yet, since that was waiting until the Big Rewrite™ #3346.
If this is high priority, it might just be a couple days to a week of work to support prolog/epilog in the current job-exec module.
We could perhaps work around the "job-shell plugin runs as the user" issue with some creative sudoers configuration and scripting, but I wonder whether prolog/epilog support isn't needed for other testing anyway.
Addressing those use cases somehow is definitely required for us to use elmerfudd with flux.
@ofaaland @jameshcorbett - I moved the discussion over to #2205 since these use cases require full prolog/epilog support and this issue is about a "user prolog/epilog"
Per a conversation on Slack with @grondo and @dongahn: