trws opened this issue 5 years ago
I suppose the question is, should this be valid?
(forgive the pseudo-jobspec)
```yaml
type: node
count: 15
with:
  - type: slot
    label: task
    with: GPU + 1 core
  - type: slot
    label: task
    with: CORES
```
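Spelled out in something closer to canonical jobspec form, the heterogeneous request might look like the sketch below. The counts, the distinct labels, and "36 remaining cores" are all illustrative assumptions, not anything the current jobspec is known to accept:

```yaml
# Hypothetical sketch only; counts and labels are assumptions.
version: 1
resources:
  - type: node
    count: 15
    with:
      - type: slot
        count: 4            # one slot per GPU on the node
        label: gpu_task
        with:
          - type: gpu
            count: 1
          - type: core
            count: 1
      - type: slot
        count: 1
        label: cpu_task
        with:
          - type: core
            count: 36       # "the rest of the cores", assumed here
```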
From the graph-matching scheduling point of view, this will certainly make the algorithm much more complex. (I am not even sure it is possible with the one-pass algorithm we use; this is something where one needs to sit down and actually try to code it.)
If the per-node requirement is given in aggregate, matching can of course be simpler, and the matching service can generate an R that meets the overall requirement (with no fine-grained slot info).
```yaml
type: node
count: 15
with:
  - type: slot
    label: tasks
    with: GPU + 1 core + CORES
```
I don't know if such an R can be further divvied up by the execution system to do the final mapping/binding...
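To make the "divvying up" concrete, here is a speculative sketch of how an execution system might split an aggregate per-node allocation (say 4 GPUs and 40 cores, with no slot info in R) into the two desired task shapes. The function name, the dict shapes, and the per-node counts are all assumptions for illustration, not anything Flux actually implements:

```python
# Speculative sketch: divide an aggregate per-node allocation into
# one GPU-owning task per GPU plus one CPU-only task for the rest.
def divvy(node_cores, node_gpus, cores_per_gpu_task=1):
    """Return per-task bindings: one task per GPU with a core each,
    plus one CPU-only task holding all remaining cores."""
    cores = list(range(node_cores))
    tasks = []
    for gpu in range(node_gpus):
        # Peel off cores_per_gpu_task cores for each GPU task.
        taken, cores = cores[:cores_per_gpu_task], cores[cores_per_gpu_task:]
        tasks.append({"gpus": [gpu], "cores": taken})
    tasks.append({"gpus": [], "cores": cores})  # the CPU-only rank
    return tasks

tasks = divvy(node_cores=40, node_gpus=4)
```

The open question from the thread is whether the execution system has enough information in R to do this, given that the slot structure was flattened away.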
Certainly a use case to think about.
Do apps people already have existing examples? Or is this theoretical? This probably requires them to write heterogeneous MPI communication patterns using groups and such...
The question came from Erik Draeger, and apparently they’re running this way with jsrun right now (though with a great deal of pain to specify it). They aren’t doing MPI groups or anything else complicated, just doing load-balancing such that the rank using the CPU cores gets an appropriate amount of work. In terms of what sched actually sees, I suppose what it would get is basically that right? Does sched do anything special with the “slot” nodes in the jobspec or just ignore them?
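To illustrate the kind of load balancing described (no MPI groups, just giving the CPU-only rank a work share matching its throughput), here is a minimal sketch. The weights and the proportional-split strategy are assumptions; a real application would calibrate them empirically:

```python
# Hypothetical sketch: static load balancing across heterogeneous ranks.
def partition_work(n_items, weights):
    """Split n_items across ranks proportionally to each rank's weight,
    handing leftover items to the highest-weight ranks first."""
    total = sum(weights)
    shares = [n_items * w // total for w in weights]
    leftover = n_items - sum(shares)
    # Distribute the remainder in descending weight order (stable ties).
    for i in sorted(range(len(weights)), key=lambda i: -weights[i])[:leftover]:
        shares[i] += 1
    return shares

# Example: 3 GPU ranks (assumed weight 10 each) and 1 CPU-only rank (weight 4).
print(partition_work(1000, [10, 10, 10, 4]))  # -> [295, 294, 294, 117]
```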
> Does sched do anything special with the “slot” nodes in the jobspec or just ignore them?
It makes use of it. After doing a subtree walk, if the visiting vertex type is slot, the scheduler divides the (sub)resources matched from the subtree walk into equal chunks and sees if slot[num] is satisfied.
So far, what @SteVwonder and @grondo wanted was for sched to encode this slot info into R (not RV1, but a later version). If slot is not embedded in the resource section, it will actually make my life a lot easier, but that may kick the can (complexity) down to another layer -- the execution system.
s/visiting vertex type/vertex type in jobspec/
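The slot check described above can be sketched as follows. This is not fluxion's actual code; the function name and data shapes are illustrative assumptions for the "divide the matched resources into equal chunks and test slot[num]" idea:

```python
# Sketch (not the scheduler's real implementation): test whether the
# resources matched under a slot vertex can be split into slot_count
# identical chunks, each meeting the per-slot request.
from collections import Counter

def slot_satisfied(matched, slot_count, per_slot):
    """matched: Counter of resource type -> total count from the subtree
    walk; per_slot: dict of resource type -> count each slot needs."""
    for rtype, need in per_slot.items():
        # Every one of the slot_count chunks must get `need` of this type.
        if matched.get(rtype, 0) < need * slot_count:
            return False
    return True

matched = Counter(core=40, gpu=4)
print(slot_satisfied(matched, 4, {"gpu": 1, "core": 1}))  # True
print(slot_satisfied(matched, 5, {"gpu": 1, "core": 1}))  # False: only 4 GPUs
```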
I guess that makes sense. It feels like there may be a pretty way to fit support for something like (if not exactly) this into the existing traversals; might be good if we could chat at a whiteboard sometime and try to work it out.
@trws: sounds good to me.
> might be good if we could chat at a whiteboard sometime and try to work it out.
Definitely! Sounds like a good 2pm coffee discussion 😄 ☕️
An interesting jobspec question came up today: can we support running a single application on multiple slot types at the same time? I don't see why not, necessarily, but I don't think the jobspec currently supports this. The specific use case was the desire to have one rank per GPU (with a core each) and an MPI rank that uses the rest of the cores on each node but no GPU.