grondo opened this issue 12 months ago

It would be useful to be able to launch a subinstance in a system instance that is also capable of running multiuser jobs. This feature, in combination with user-based access controls, could offer one way to implement dedicated access time (DAT).

This is a tracking issue to discuss the implementation and track any bugs that need to be fixed to get a basic implementation working.
Some notes offline from @garlick:
Other notes found experimentally:
- XDG_RUNTIME_DIR=/run/user/$UID and DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus may need to be exported to the job; then chmod +x $(flux getattr rundir) on all ranks will be necessary.
- If (non-critical) nodes crash during a DAT, the DAT instance continues, but we don't have a way to re-add them. #5184 (FLUB bootstrap) might be one piece of a solution...
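As a rough sketch of the first note (assuming systemd's per-user runtime directory layout, and using the --env option of flux alloc to forward the variables), the export might look like:

$ flux alloc -N2 \
    --env=XDG_RUNTIME_DIR=/run/user/$UID \
    --env=DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus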
This is a minimal proof-of-concept of launching a multi-user capable subinstance:
As user flux:
$ flux alloc -N2 --conf=access.allow-guest-user=true --conf=exec.imp=$(flux config get exec.imp)
$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)'
Then as another valid user on the system:
$ flux jobs -A
JOBID USER NAME ST NTASKS NNODES TIME INFO
ƒMDKKUWCR4B flux flux R 2 2 1.517m pi[3-4]
$ flux uptime
13:38:57 run 2.9m, owner flux, depth 1, size 2
$ flux run -N2 id
uid=1001(grondo) gid=1001(grondo) groups=1001(grondo),100(users)
uid=1001(grondo) gid=1001(grondo) groups=1001(grondo),100(users)
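(One step is implicit above: before flux uptime and flux run will target the subinstance, the guest attaches to it with flux proxy, using the jobid reported by flux jobs -A:

$ flux proxy ƒMDKKUWCR4B

as shown explicitly in the fuller walkthrough later in this issue.)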
For testing purposes, more complex configuration could be placed in a test.toml provided to --conf, e.g. to support the job prolog on a multi-user subinstance:
[job-manager]
plugins = [ { load = "perilog.so" } ]
[job-manager.prolog]
command = [
"flux",
"perilog-run",
"prolog",
"--timeout=10s",
"--with-imp",
"-e",
"prolog"
]
Then just add --conf=test.toml to the command line above.
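For example, assuming --conf may be repeated on the command line, the earlier proof of concept becomes:

$ flux alloc -N2 \
    --conf=access.allow-guest-user=true \
    --conf=exec.imp=$(flux config get exec.imp) \
    --conf=test.toml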
One potential issue is that if the subinstance were to use sdexec or housekeeping, I think there is a potential for name collision in the units at different instance levels.
BTW, there was a question about whether users would have access to the URI for child instances launched by the flux user. Obviously, they do (because the above example just works). This is because the URI is set via a memo event in the job's eventlog, which is in turn added to the job's user annotations available from the job-list module. (This is also how jobs that are subinstances of flux appear in blue in the output of flux jobs, when color is available.)
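For example, a guest can resolve the URI from that memo directly with flux uri, using the jobid from the transcript above:

$ flux uri jobid:ƒMDKKUWCR4B

which prints the instance URI (flux uri also has a --remote option for the ssh form); flux proxy performs this same resolution internally.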
This one is a bit of a head-scratcher. Seen in the prolog of a multiuser instance:
[flux@fluke2:flux]$ flux alloc -N2 --requires=host:fluke[131-132] --conf=/var/flux/hobbs.toml
[flux@fluke131:tmp]$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)' && chmod a+rx /tmp/flux/flux-x6McZ6/ && flux exec sh -c 'chmod uo+x /tmp/flux'
[flux@fluke131:tmp]$ flux alloc -N1 hostname
Aug 28 10:28:29.863952 PDT job-manager.err[0]: fBXJFMS7: prolog: stderr: fluke132 (rank 1): flux-job: Operation not permitted
fluke132
The error was triggered by calling flux job info $FLUX_JOB_ID jobspec.

For the record, id in the prolog shows:
2024-08-28T17:28:37.025461Z job-manager.info[0]: fBXJFMS7: epilog: stdout: uid=0(root) gid=755(flux) groups=755(flux),3172(iseepids)
Maybe the gid also needs to be root to run this operation?
This is similar to what we do in the system-instance prolog, except I'm not running --sdexec under flux perilog-run.

Oh, you might need to ensure that access.allow-root-owner is set to true, since the prolog runs as root.
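(For a quick test without a config file, that key can also be set inline alongside the others, e.g.:

$ flux alloc -N2 \
    --conf=access.allow-guest-user=true \
    --conf=access.allow-root-owner=true \
    --conf=exec.imp=$(flux config get exec.imp)

though the summary below folds everything into one file.)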
That did it, thanks @grondo!
For a final summary, here is how to start a multiuser instance as a job under the instance owner of a system instance:
[flux@fluke2:~]$ cat /var/flux/conf.toml
# Test configuration file for launching multiuser Flux
# instance as a job
[access]
allow-guest-user = true
allow-root-owner = true
[job-manager]
plugins = [ { load = "perilog.so" } ]
[ingest.validator]
plugins = [ "jobspec" ]
[exec]
imp = "/usr/libexec/flux/flux-imp"
# Note you could add a job-manager.prolog and epilog here.
# This will require a separate imp configuration in a specific /etc
# directory and will overwrite previous imp configs for that node/
# system. Proceed with caution.
[flux@fluke2:~]$ flux alloc -N2 --conf=/var/flux/conf.toml
[flux@fluke131:~]$ flux resource list
STATE PROPERTIES NNODES NCORES NGPUS NODELIST
free batch 2 4 0 fluke[131-132]
allocated 0 0 0
down 0 0 0
Note the need for some directory permission mangling:
[flux@fluke131:~]$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)' && \
> chmod a+rx /tmp/flux/flux-*/ && flux exec sh -c \
> 'chmod uo+x /tmp/flux'
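A quick sanity check that the permissions took effect on every rank (output will vary with the actual rundir paths):

$ flux exec sh -c 'ls -ld $(flux getattr rundir)'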
And then, as another user on the system:
(s=130,d=0) fluke2 ~ $ whoami
hobbs17
(s=130,d=0) fluke2 ~ $ flux jobs -u flux
JOBID QUEUE USER NAME ST NTASKS NNODES TIME INFO
fAMC368J3a7 batch flux flux R 2 2 54.42s fluke[131-132]
(s=130,d=0) fluke2 ~ $ flux proxy fAMC368J3a7
(s=2,d=1) fluke2 ~ $ flux run -N2 hostname
fluke131
fluke132
Correction: "will overwrite" in that comment is a bit of a misstatement.
The final TOML table that is read in alphabetical order from /etc/flux/imp/conf.d
can overwrite previous tables if there is a conflict, i.e. the run
table is defined in both a-imp.toml
and b-imp.toml
(b-imp.toml
's run
table would prevail). Even if the actual keys in the run
table in both config files do not have conflicting names.
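A minimal illustration of the caveat, with hypothetical file contents (key names here follow flux-security's IMP run configuration, but the specifics don't matter for the point):

# /etc/flux/imp/conf.d/a-imp.toml
[run.prolog]
path = "/etc/flux/system/prolog"

# /etc/flux/imp/conf.d/b-imp.toml
[run.prolog]
allowed-users = [ "flux" ]

# Result: b-imp.toml's [run.prolog] table prevails in its entirety, so
# "path" is lost even though the individual keys never conflicted.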