camsas / firmament

The Firmament cluster scheduling platform
Apache License 2.0

Resource allocation of firmament #47

Open cxxly opened 8 years ago

cxxly commented 8 years ago

Hi:

As far as I know, there are two ways to allocate resources:

  1. Coarse-grained: partition each machine into fixed-size slots, each of which can run one task, as Hadoop does.
  2. Fine-grained resource allocation, as in Borg (Borg users request CPU in units of milli-cores, and memory and disk space in bytes).

I have seen that both your work and Quincy use a constant integer K to represent the capacity of a machine, as in coarse-grained allocation. However, the cost models also use some fine-grained resource information.

I would like to know:

  1. How does Firmament represent the resources requested by a task and the resources owned by a machine?
  2. What is the physical meaning of the capacity, and how do you obtain the value of K?

ms705 commented 8 years ago

Hi @cxxly,

Sorry for the delayed response -- I'm currently travelling. I'll respond in more detail a bit later.

The bottom line is this: Firmament does use "slots" in the sense that each running task "uses" a leaf of the resource topology (= a PU/CPU core). This makes it easy to implement slot-based allocation policies, but does not mean that you must use slots.

Instead, you can see the leaves as an upper limit on the number of tasks that can run on a machine, which can be greater than the number of CPU cores (just add another level, or make up some "fake" cores, or increase the per-leaf capacity K). A multi-dimensional resource fit model can then be implemented by connecting tasks appropriately to places where they can fit -- as done in CoCo.
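To make the "connect tasks to places where they can fit" idea concrete, here is a minimal sketch of a multi-dimensional fit check. It is not Firmament's or CoCo's actual code: the `ResourceVector` class, its two fields, and `candidate_machines` are hypothetical names made up for illustration, and a real cost model would consider more dimensions and would attach arcs to nodes in the flow network rather than return a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceVector:
    """Hypothetical two-dimensional resource vector (illustration only)."""
    cpu_millicores: int
    ram_bytes: int

    def fits_within(self, other: "ResourceVector") -> bool:
        # A request fits only if it is covered in every dimension.
        return (self.cpu_millicores <= other.cpu_millicores
                and self.ram_bytes <= other.ram_bytes)


def candidate_machines(request, available):
    """Return the names of machines whose free resources cover the request.

    In a flow-based scheduler, each returned name would correspond to an
    arc from the task to a place where it can fit.
    """
    return [name for name, free in available.items()
            if request.fits_within(free)]


# Free resources per machine (made-up numbers).
available = {
    "m1": ResourceVector(cpu_millicores=4000, ram_bytes=8 * 2**30),
    "m2": ResourceVector(cpu_millicores=500, ram_bytes=16 * 2**30),
}
task = ResourceVector(cpu_millicores=1000, ram_bytes=2 * 2**30)
arcs = candidate_machines(task, available)
```

Here `m2` has plenty of memory but too little CPU, so only `m1` is a candidate. The leaves of the resource topology still cap how many tasks can run at once; the fit check only decides where a task may be connected.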

To address your questions quickly:

  1. Firmament's job submission protobuf contains a resource reservation vector (see here), so it is currently user-specified. @joshbambrick did some excellent work to make Firmament automatically estimate resource requirements (using machine learning techniques to predict the initial reservation, and dynamic adaptation to tighten it), but this work isn't yet upstreamed.
  2. See above -- the number of slots (K) sets an upper limit on the number of tasks per machine (since K * num_cores gives the aggregate outgoing flow capacity for the machine). We use K = 1 in most cost models, but you can set it higher if you want to allow time-sharing of CPU cores.
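As a rough illustration of point 2, the arithmetic behind the per-machine limit can be sketched as follows. The function names `machine_task_limit` and `admit` are hypothetical and not part of Firmament; the sketch only shows how K * num_cores bounds the number of concurrently running tasks, not how the flow solver picks them.

```python
def machine_task_limit(num_cores: int, k: int = 1) -> int:
    """Upper bound on concurrent tasks: K units of capacity per core (leaf)."""
    if num_cores < 0 or k < 0:
        raise ValueError("num_cores and k must be non-negative")
    return k * num_cores


def admit(tasks, num_cores: int, k: int = 1):
    """Split tasks into (running, waiting) under the K * num_cores limit.

    Tasks are taken in list order purely for illustration; a real
    scheduler would choose them by solving the flow network.
    """
    limit = machine_task_limit(num_cores, k)
    return tasks[:limit], tasks[limit:]


# With K = 1, an 8-core machine runs at most 8 tasks at a time.
limit_k1 = machine_task_limit(8)
# With K = 2, cores are time-shared and the limit doubles.
limit_k2 = machine_task_limit(8, k=2)
running, waiting = admit(list(range(10)), num_cores=4, k=2)
```

With 4 cores and K = 2, eight of the ten tasks can run and two have to wait.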

Hope that makes sense!

cxxly commented 8 years ago

Thanks! @ms705

I'm interested in @joshbambrick's work. Is there a published paper I can read?

I also have some questions about admission control in the CoCo cost model; I will open a new issue for them.

ms705 commented 8 years ago

Hi @cxxly,

We're going to have a blog post on @joshbambrick's work soon; if you're interested in a longer writeup, his BA dissertation is available here.

Did you end up opening a new issue about the CoCo admission control questions? I don't see any, but I may have missed it while travelling.

cxxly commented 7 years ago

Hi @ms705

I'm really sorry for the delayed response. I have been very busy with my graduation.

I will open a new issue later.