ArchieGertsman / spark-sched-sim

A Gymnasium environment for simulating job scheduling in Apache Spark
MIT License
20 stars 2 forks source link

What is the meaning of "commitment" int the code ? #5

Closed oximi123 closed 7 months ago

oximi123 commented 7 months ago

It's really a wonderful work! May I ask what's the meaning of the term "commitment" ? I see lots of expressions related to it in executor_tracker.py. I am new to Spark and dag scheduling.

ArchieGertsman commented 7 months ago

Thanks for stopping by! I have adopted the term "commitment" from the original Decima implementation. We say that an executor is committed to a stage if there is a plan to attach it to the stage in the future. In the current moment, the executor might be busy or the stage's dependencies may not all be complete, so the executor cannot be attached immediately - but it can be committed. Once the attachment becomes possible, the commitment gets realized, without having to consult the scheduler again. The scheduler's decisions are commitments - it first decides on a stage, then decides how many executors to commit to that stage.

Not all commitments get fulfilled. For example, imagine that an executor which was previously committed to some stage becomes available, and suppose that the stage has already completed. That executor is no longer needed for that stage, so the environment tries to find a backup stage.

oximi123 commented 7 months ago

Thanks for your reply. I guess I've understood this process.