grondo opened this issue 12 months ago

It would be useful to be able to launch a subinstance in a system instance that is also capable of running multiuser jobs. This feature, in combination with user-based access controls, could offer one way to implement dedicated access time (DAT).

This is a tracking issue to discuss the implementation and track any bugs that need to be fixed to get a basic implementation working.
Some notes offline from @garlick:
Other notes found experimentally:
- XDG_RUNTIME_DIR=/run/user/$UID and DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus may need to be exported to the job; then chmod +x $(flux getattr rundir) on all ranks will be necessary.
- If (non-critical) nodes crash during a DAT, the DAT instance continues, but we don't have a way to re-add them. #5184 (FLUB bootstrap) might be one piece of a solution...
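As a rough sketch of the first note (assuming systemd's per-user runtime directory layout, and using the --env option of flux alloc to forward the variables), the export might look like:

$ flux alloc -N2 \
    --env=XDG_RUNTIME_DIR=/run/user/$UID \
    --env=DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus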
This is a minimal proof-of-concept of launching a multi-user capable subinstance:
As user flux:
$ flux alloc -N2 --conf=access.allow-guest-user=true --conf=exec.imp=$(flux config get exec.imp)
$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)'
Then as another valid user on the system:
$ flux jobs -A
JOBID USER NAME ST NTASKS NNODES TIME INFO
ƒMDKKUWCR4B flux flux R 2 2 1.517m pi[3-4]
$ flux uptime
13:38:57 run 2.9m, owner flux, depth 1, size 2
$ flux run -N2 id
uid=1001(grondo) gid=1001(grondo) groups=1001(grondo),100(users)
uid=1001(grondo) gid=1001(grondo) groups=1001(grondo),100(users)
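(One step is implicit above: before flux uptime and flux run will target the subinstance, the guest attaches to it with flux proxy, using the jobid reported by flux jobs -A:

$ flux proxy ƒMDKKUWCR4B

as shown explicitly in the fuller walkthrough later in this issue.)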
For testing purposes, more complex configuration could be placed in a test.toml provided to --conf, e.g. to support the job prolog on a multi-user subinstance:
[job-manager]
plugins = [ { load = "perilog.so" } ]
[job-manager.prolog]
command = [
"flux",
"perilog-run",
"prolog",
"--timeout=10s",
"--with-imp",
"-e",
"prolog"
]
Then just add --conf=test.toml to the command line above.
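For example, assuming --conf may be repeated on the command line, the earlier proof of concept becomes:

$ flux alloc -N2 \
    --conf=access.allow-guest-user=true \
    --conf=exec.imp=$(flux config get exec.imp) \
    --conf=test.toml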
One potential issue is that if the subinstance were to use sdexec or housekeeping, I think there is a potential for name collision in the units at different instance levels.
BTW, there was a question about whether users would have access to the URI for child instances launched by the flux user. Obviously, they do (because the above example just works). This is because the URI is set via a memo event in the job's eventlog, which is in turn added to the job's user annotations available from the job-list module. (This is also how jobs that are subinstances of flux appear in blue in the output of flux jobs, when color is available.)
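For example, a guest can resolve the URI from that memo directly with flux uri, using the jobid from the transcript above:

$ flux uri jobid:ƒMDKKUWCR4B

which prints the instance URI (flux uri also has a --remote option for the ssh form); flux proxy performs this same resolution internally.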
This one is a bit of a head-scratcher. Seen in the prolog of a multiuser instance:
[flux@fluke2:flux]$ flux alloc -N2 --requires=host:fluke[131-132] --conf=/var/flux/hobbs.toml
[flux@fluke131:tmp]$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)' && chmod a+rx /tmp/flux/flux-x6McZ6/ && flux exec sh -c 'chmod uo+x /tmp/flux'
[flux@fluke131:tmp]$ flux alloc -N1 hostname
Aug 28 10:28:29.863952 PDT job-manager.err[0]: fBXJFMS7: prolog: stderr: fluke132 (rank 1): flux-job: Operation not permitted
fluke132
The error was triggered by calling flux job info $FLUX_JOB_ID jobspec.

For the record, id in the prolog shows:
2024-08-28T17:28:37.025461Z job-manager.info[0]: fBXJFMS7: epilog: stdout: uid=0(root) gid=755(flux) groups=755(flux),3172(iseepids)
Maybe the gid also needs to be root to run this operation?
This is similar to what we do in the system-instance prolog, except I'm not running --sdexec under flux perilog-run.

Oh, you might need to ensure that access.allow-root-owner is set to true, since the prolog runs as root.
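(For a quick test without a config file, that key can also be set inline alongside the others, e.g.:

$ flux alloc -N2 \
    --conf=access.allow-guest-user=true \
    --conf=access.allow-root-owner=true \
    --conf=exec.imp=$(flux config get exec.imp)

though the summary below folds everything into one file.)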
That did it, thanks @grondo!
For a final summary, here is how to start a multiuser instance as a job under the instance owner of a system instance:
[flux@fluke2:~]$ cat /var/flux/conf.toml
# Test configuration file for launching multiuser Flux
# instance as a job
[access]
allow-guest-user = true
allow-root-owner = true
[job-manager]
plugins = [ { load = "perilog.so" } ]
[ingest.validator]
plugins = [ "jobspec" ]
[exec]
imp = "/usr/libexec/flux/flux-imp"
# Note you could add a job-manager.prolog and epilog here.
# This will require a separate imp configuration in a specific /etc
# directory and will overwrite previous imp configs for that node/
# system. Proceed with caution.
[flux@fluke2:~]$ flux alloc -N2 --conf=/var/flux/conf.toml
[flux@fluke131:~]$ flux resource list
STATE PROPERTIES NNODES NCORES NGPUS NODELIST
free batch 2 4 0 fluke[131-132]
allocated 0 0 0
down 0 0 0
Note the need for some directory permission mangling:
[flux@fluke131:~]$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)' && \
> chmod a+rx /tmp/flux/flux-*/ && flux exec sh -c \
> 'chmod uo+x /tmp/flux'
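A quick sanity check that the permissions took effect on every rank (output will vary with the actual rundir paths):

$ flux exec sh -c 'ls -ld $(flux getattr rundir)'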
And then, as another user on the system:
(s=130,d=0) fluke2 ~ $ whoami
hobbs17
(s=130,d=0) fluke2 ~ $ flux jobs -u flux
JOBID QUEUE USER NAME ST NTASKS NNODES TIME INFO
fAMC368J3a7 batch flux flux R 2 2 54.42s fluke[131-132]
(s=130,d=0) fluke2 ~ $ flux proxy fAMC368J3a7
(s=2,d=1) fluke2 ~ $ flux run -N2 hostname
fluke131
fluke132
Correction: "will overwrite" in that comment is a bit of a misstatement.
The final TOML table that is read in alphabetical order from /etc/flux/imp/conf.d
can overwrite previous tables if there is a conflict, i.e. the run
table is defined in both a-imp.toml
and b-imp.toml
(b-imp.toml
's run
table would prevail). Even if the actual keys in the run
table in both config files do not have conflicting names.
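A minimal illustration of the caveat, with hypothetical file contents (key names here follow flux-security's IMP run configuration, but the specifics don't matter for the point):

# /etc/flux/imp/conf.d/a-imp.toml
[run.prolog]
path = "/etc/flux/system/prolog"

# /etc/flux/imp/conf.d/b-imp.toml
[run.prolog]
allowed-users = [ "flux" ]

# Result: b-imp.toml's [run.prolog] table prevails in its entirety, so
# "path" is lost even though the individual keys never conflicted.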