Closed garlick closed 7 months ago
A couple more thoughts:

- A `flux_future_t`-based interface is desirable to enable reactive programming.

I hate to even mention this because I really like the idea of having this capability in Flux, but I think this problem could also be solved with mpifileutils or similar with "alloc bypass" from #3740, copying R from the target job. Two advantages of that approach are 1) it uses RDMA, and 2) portability.
Following up on coffee call. It doesn't look like mpifileutils dbcast works like I thought it did. It appears to read stripes of a file from all ranks of a parallel job, not one rank. This is what @jameshcorbett was saying and I just wasn't getting it. Sorry about that!
Just something I was thinking about: Slurm's sbcast works on both job IDs and step IDs. The proposed implementation, as a job shell plugin, would work only on jobs and wouldn't have a way to broadcast a file to every node in a Flux instance. But you could go up a level in the Flux hierarchy and broadcast the file at that level, to the job that is the sub-instance. There wouldn't be a way to broadcast a file across a top-level Flux instance but I can't think of any use-cases for that.
To replicate the sbcast example in Flux:

```
$ cat my.job
#!/bin/bash
sbcast my.prog /tmp/my.prog
srun /tmp/my.prog
$ sbatch --nodes=8 my.job
srun: jobid 12345 submitted
```
You would need to be able to get the job ID of the encapsulating instance and the URI of the system instance. A little awkward, maybe...
> The proposed implementation, as a job shell plugin, would work only on jobs and wouldn't have a way to broadcast a file to every node in a Flux instance
Most Flux instances are also jobs, but I think I understand what you are saying here: You can't broadcast a file to all nodes of your single-user enclosing instance (i.e. in most cases, your batch job) from within the instance.
> You would need to be able to get the job ID of the encapsulating instance and the URI of the system instance. A little awkward, maybe...
Actually, this may not be too bad. Within an instance started under Flux, the environment variable `FLUX_JOB_ID` will be set to the jobid of the current instance. The `flux(1)` command driver also has a `--parent` option which uses the URI of the parent instead of the current instance. So, if `flux bcast` is the command to broadcast a file to all nodes of a job, your batch script could use:

```
flux --parent bcast $FLUX_JOB_ID /tmp/my.prog
```

Better yet, if a JOBID isn't provided with `flux bcast`, maybe the utility could assume it is meant to run against the current job and will automatically use the current `FLUX_JOB_ID` and grab the `parent-uri` from the enclosing instance so it will work similarly to `sbcast`:

```
flux bcast /tmp/my.prog
```
> Most Flux instances are also jobs, but I think I understand what you are saying here.
The problem with all this infinitely hierarchical stuff is that it makes everything hard to talk about :(
> Actually, this may not be too bad.
Great, I figured that there would be good ways of talking to the parent instance, but I didn't know what they were (or if they had already been implemented). I also really like your idea of the missing JOBID assumption.
> Better yet, if a JOBID isn't provided with `flux bcast`, maybe the utility could assume it is meant to run against the current job and will automatically use the current `FLUX_JOB_ID` and grab the `parent-uri` from the enclosing instance.
I like the idea of looking for `FLUX_JOB_ID`, but grabbing the parent-uri does have one drawback. If you executed something like `flux mini run ... bash -c "flux bcast /tmp/my.prog; /tmp/my.prog"`, `FLUX_JOB_ID` would be set to the `bash` job (rather than the job ID of the current instance), so the combination of `parent-uri` and `FLUX_JOB_ID` would be all wrong.
Just a potential trade-off to be aware of. Other than that, I can't think of any confusion that would arise from letting the job ID be implicit.
> If you executed something like `flux mini run ... bash -c "flux bcast /tmp/my.prog; /tmp/my.prog"`, `FLUX_JOB_ID` would be set to the `bash` job (rather than the job ID of the current instance)
Good point!
Though running `flux bcast` in this way should perhaps be avoided because:

- if `flux mini run` specifies multiple tasks, you'll be running `flux bcast` multiple times simultaneously
- if `flux mini run` only specifies one task, then you are running `flux bcast` to copy a file to itself on the local node

It would be nice if we had a way to detect this situation and issue a meaningful error. :thinking:
If you wanted to run `flux bcast` as a job, e.g. to use it as part of a workflow, then you could use the `FLUX_JOB_ID` from the environment at the time of submission, and specifically use `--parent`, though that isn't so user-friendly:

```
flux mini submit flux --parent bcast $FLUX_JOB_ID
```
Sounds like it would be good if we had a way to determine whether the current process is an initial program (i.e. a batch script) or part of a job. In the second case, you could maybe issue an error if a JOBID isn't provided.
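For what it's worth, such a check could be a one-liner on the environment. The sketch below is purely illustrative: the function name `flux_context` is made up, and treating `FLUX_TASK_RANK` (set by the job shell for tasks it launches) as the distinguishing variable is an assumption, not a documented guarantee:

```python
import os

def flux_context(environ=None):
    """Guess whether we are an initial program (batch script) or a job task.

    Heuristic sketch only (assumption): the job shell exports FLUX_TASK_RANK
    to the tasks it launches, while an initial program sees FLUX_JOB_ID
    without FLUX_TASK_RANK. Not a supported Flux interface.
    """
    env = os.environ if environ is None else environ
    if "FLUX_TASK_RANK" in env:
        return "task"             # launched by the job shell: require explicit JOBID
    if "FLUX_JOB_ID" in env:
        return "initial-program"  # e.g. a batch script: implicit JOBID is safe
    return "unknown"
```

A hypothetical `flux bcast` could then refuse an implicit JOBID whenever the context comes back as `"task"`.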
@JaeseungYeom do you think you could leverage some of your DYAD work for this?
Closing this. We can open issues against `flux-archive(1)` if there are still things missing.
Following up on a coffee time discussion with @jameshcorbett:

There is a need for a C API for copying file(s) to a job. (What other requirements are there?)

Some notes from the discussion:

- (`flux mini` command line)

Additional thoughts:

- `src/common/libutil/kary.h` provides some helper functions for determining virtual TBON peers etc.
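For reference, the arithmetic behind that kind of k-ary tree helper is small. The following is a Python sketch of the parent/child computations for a complete k-ary tree rooted at rank 0; the function names are made up for illustration and do not mirror the actual `kary.h` API:

```python
def kary_parent(k, rank):
    """Parent of `rank` in a complete k-ary tree rooted at rank 0.

    Returns None for the root. With level-order numbering, the parent
    of rank i (i > 0) is (i - 1) // k.
    """
    return None if rank == 0 else (rank - 1) // k

def kary_children(k, size, rank):
    """Child ranks of `rank` in a k-ary tree with `size` nodes total."""
    first = k * rank + 1
    return [c for c in range(first, first + k) if c < size]
```

For example, in a binary (k=2) overlay of 7 brokers, rank 1's children are ranks 3 and 4, and rank 3 is a leaf.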