flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

Single application with multiple slots #2141

Open trws opened 5 years ago

trws commented 5 years ago

An interesting jobspec question came up today: can we support running a single application on multiple slot types at the same time? I don't see why not, necessarily, but I don't think the jobspec currently supports this. The specific use case was the desire to have one rank per GPU, with a core each, plus an MPI rank that uses the rest of the cores on each node but no GPU.

trws commented 5 years ago

I suppose the question is, should this be valid?

(forgive the pseudo-jobspec)

type: node
count: 15
with:
  - type: slot
    label: task
    with: GPU + 1 core
  - type: slot
    label: task
    with: CORES

dongahn commented 5 years ago

From the graph-matching scheduling point of view, this will certainly make the algorithm much more complex. (I am not even sure it is possible to do this with the one-pass algorithm we use. This is something that one needs to sit down and actually try to code.)

If the per-node requirement is given in aggregate, matching can of course be simpler, and the matching service can generate an R that meets the overall requirement (with no fine-grained slot info).

type: node
count: 15
with:
  - type: slot
    label: tasks
    with: GPU + 1 core + CORES

I don't know if such an R can be further divvied up by the execution system to do the final mapping/binding...
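As an illustration of what that downstream split could look like for this use case, here is a minimal Python sketch. It is not Flux code: the `Task` type, the `split_node` helper, and the plain lists of core and GPU IDs are all hypothetical stand-ins for whatever the execution system would actually receive in R. It assumes the policy described above: one task per GPU with one core each, plus one CPU-only task that takes the remaining cores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical per-task binding: which cores and GPUs one rank gets."""
    label: str
    cores: List[int]
    gpus: List[int]


def split_node(core_ids: List[int], gpu_ids: List[int]) -> List[Task]:
    """Split one node's aggregate allocation into per-rank bindings.

    Assumed policy: one task per GPU with a single core each, then one
    CPU-only task that owns every core that is left over.
    """
    if len(core_ids) <= len(gpu_ids):
        raise ValueError("need one core per GPU plus at least one spare core")

    # Pair the first len(gpu_ids) cores with the GPUs, one of each per task.
    tasks = [Task("gpu", [core], [gpu]) for core, gpu in zip(core_ids, gpu_ids)]

    # Everything not handed to a GPU task goes to the single CPU-only task.
    tasks.append(Task("cpu", core_ids[len(gpu_ids):], []))
    return tasks
```

For a node with 8 cores and 2 GPUs, `split_node(list(range(8)), [0, 1])` yields two GPU tasks bound to cores 0 and 1, and one CPU task bound to cores 2 through 7. A real implementation would also have to consider locality (which cores sit near which GPU), which this sketch ignores.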

Certainly a use case to think about.

Do apps people already have existing examples? Or is this theoretical? This probably requires them to write heterogeneous MPI communication patterns using groups and such...

trws commented 5 years ago

The question came from Erik Draeger, and apparently they’re running this way with jsrun right now (though with a great deal of pain to specify it). They aren’t doing MPI groups or anything else complicated, just doing load-balancing such that the rank using the CPU cores gets an appropriate amount of work. In terms of what sched actually sees, I suppose what it would get is basically that, right? Does sched do anything special with the “slot” nodes in the jobspec or just ignore them?

dongahn commented 5 years ago

Does sched do anything special with the “slot” nodes in the jobspec or just ignore them?

It makes use of it. After doing a subtree walk, if the visiting vertex type is slot, the scheduler divides the (sub)resources matched from the subtree walk into equal chunks and checks whether slot[num] is satisfied.
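One way to read that check, as a rough Python sketch rather than the actual sched code: `slots_satisfied`, `matched`, `per_slot`, and `num_slots` are hypothetical names, and resources are reduced to simple per-type counts. The sketch counts how many equal chunks of the per-slot shape fit into what the subtree walk matched, then compares that with the requested slot count.

```python
from typing import Dict


def slots_satisfied(matched: Dict[str, int],
                    per_slot: Dict[str, int],
                    num_slots: int) -> bool:
    """Check whether matched resources can form `num_slots` equal chunks.

    matched:   resource type -> count matched under the slot's subtree
    per_slot:  resource type -> count that a single slot requires
    """
    if not per_slot or any(n <= 0 for n in per_slot.values()):
        raise ValueError("per_slot must list positive per-type counts")

    # The scarcest resource type limits how many equal chunks can be formed.
    chunks = min(matched.get(rtype, 0) // need
                 for rtype, need in per_slot.items())
    return chunks >= num_slots
```

With a single slot shape, every chunk is identical, so one pass over the counts is enough. Two different slot shapes under the same node (GPU + 1 core versus the remaining cores) would compete for the same cores, which is where the extra complexity mentioned earlier in the thread comes from.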

So far, what @SteVwonder and @grondo wanted was for sched to encode this slot info into R. (Not RV1 but later). If slot is not embedded into the resource section, it will actually make my life a lot easier. But then this may kick the can (complexity) down to another layer -- the execution system.

dongahn commented 5 years ago

s/visiting vertex type/vertex type in jobspec/

trws commented 5 years ago

I guess that makes sense. It feels like there may be a pretty way to fit support for something like (if not exactly) this into the existing traversals, might be good if we could chat at a whiteboard sometime and try to work it out.

dongahn commented 5 years ago

@trws: sounds good to me.

SteVwonder commented 5 years ago

might be good if we could chat at a whiteboard sometime and try to work it out.

Definitely! Sounds like a good 2pm coffee discussion 😄 ☕️