ArchieGertsman / spark-sched-sim

A Gymnasium environment for simulating job scheduling in Apache Spark
MIT License

spark-sched-sim

An Apache Spark job scheduling simulator, implemented as a Gymnasium environment.

Two Gantt charts comparing the behavior of different job scheduling algorithms. In these experiments, 50 jobs are identified by unique colors and processed in parallel by 10 identical executors (stacked vertically). Decima achieves better resource packing and lower average job completion time than Spark's fair scheduler.

What is job scheduling in Spark?

Why this simulator?


This repository is a PyTorch Geometric implementation of the Decima codebase, adhering to the Gymnasium interface. It also includes enhancements to the reinforcement learning algorithm and model design, along with a basic PyGame renderer that generates the above charts in real time.
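For readers new to the Gymnasium interface, the sketch below shows the reset/step calling convention a scheduler interacts with. It is only an illustration under stated assumptions: `ToySchedEnv`, `run_episode`, `fifo`, and `shortest_first` are hypothetical names, the toy has a single executor and no job DAGs, and none of it is this repository's actual environment or API. The two policies are simple stand-ins, not Spark's fair scheduler or Decima.

```python
import random


class ToySchedEnv:
    """Hypothetical single-executor scheduling toy (NOT this repo's API).

    It mirrors the Gymnasium calling convention: reset() returns (obs, info)
    and step() returns (obs, reward, terminated, truncated, info).
    """

    def __init__(self, num_jobs=5):
        self.num_jobs = num_jobs
        self.remaining = []

    def reset(self, seed=None, options=None):
        rng = random.Random(seed)
        # Assumption: each job needs between 1 and 10 time units of work.
        self.remaining = [rng.randint(1, 10) for _ in range(self.num_jobs)]
        return list(self.remaining), {}

    def step(self, action):
        # Run the chosen job to completion. Every unfinished job, including
        # the chosen one, stays in the system for that long, so the summed
        # reward equals minus the total job completion time.
        duration = self.remaining[action]
        reward = -duration * len(self.remaining)
        self.remaining.pop(action)
        terminated = not self.remaining
        return list(self.remaining), reward, terminated, False, {}


def run_episode(env, policy, seed=0):
    """Roll out one episode and return the summed reward."""
    obs, info = env.reset(seed=seed)
    total, done = 0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


def fifo(obs):
    # Always run the job at the head of the queue.
    return 0


def shortest_first(obs):
    # Run the job with the least remaining work.
    return obs.index(min(obs))


if __name__ == "__main__":
    env = ToySchedEnv()
    print("FIFO return:          ", run_episode(env, fifo, seed=0))
    print("Shortest-first return:", run_episode(env, shortest_first, seed=0))
```

Because the reward is the negative total completion time, a higher return means a lower average job completion time, which is the metric the charts above compare.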

Enhancements include:


After cloning this repo, please run pip install -r requirements.txt to install the project's dependencies.

To get started, run the examples with examples.py --sched [fair|decima], choosing either Spark's fair scheduler or Decima. To train Decima from scratch, edit the provided config file config/decima_tpch.yaml as needed, then pass it to train.py -f CFG_FILE.