apache-spark-on-k8s / spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

SparkJob scheduler for Kubernetes #157

Open · foxish opened this issue 7 years ago

foxish commented 7 years ago

Extending the thoughts in https://github.com/apache-spark-on-k8s/spark/issues/133#issuecomment-282371564:

Currently, we schedule individual pods in Kubernetes, but in the eventual state we want to reach, we should be able to reserve resources and schedule entire SparkJobs (where the resource requirement of a SparkJob is the sum of the minimum resources requested by its individual components). The first step in exploring this idea would be to write a custom scheduler for SparkJobs that exhibits this behavior. The scheduler can fit in under Kubernetes' multi-scheduler paradigm.
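To make the "sum of the minimum requests" idea concrete, here is a minimal Scala sketch of how a job-level scheduler might compute the reservation for a whole SparkJob. This is not code from this repository; `ResourceRequest`, `SparkJobSpec`, and `minimumReservation` are illustrative names only, assuming a SparkJob is described by a driver request plus a minimum executor count and a per-executor request.

```scala
// Hypothetical sketch: illustrative names, not an existing API in this repo.
case class ResourceRequest(cpuMillis: Long, memoryMiB: Long) {
  def +(other: ResourceRequest): ResourceRequest =
    ResourceRequest(cpuMillis + other.cpuMillis, memoryMiB + other.memoryMiB)
  def *(n: Int): ResourceRequest =
    ResourceRequest(cpuMillis * n, memoryMiB * n)
}

case class SparkJobSpec(
    driver: ResourceRequest,
    executor: ResourceRequest,
    minExecutors: Int)

object SparkJobReservation {
  // The minimum reservation for the whole job is the sum of the minimum
  // requests of its components: one driver plus `minExecutors` executors.
  def minimumReservation(job: SparkJobSpec): ResourceRequest =
    job.driver + (job.executor * job.minExecutors)

  def main(args: Array[String]): Unit = {
    val job = SparkJobSpec(
      driver = ResourceRequest(cpuMillis = 1000, memoryMiB = 1024),
      executor = ResourceRequest(cpuMillis = 2000, memoryMiB = 4096),
      minExecutors = 3)
    // Prints ResourceRequest(7000,13312): the amount a job-level scheduler
    // would try to reserve before admitting any of the job's pods.
    println(minimumReservation(job))
  }
}
```

The point of reserving at this granularity is that either the whole job fits and all of its pods can be admitted, or none are, avoiding partially scheduled jobs that hold resources without making progress.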

iyanuobidele commented 7 years ago

I like where this is going. Luckily, I have a working prototype of this that I wrote a couple of months ago. It runs as a pod and essentially uses the special scheduler pod annotation to pick up pods and assign them to best-fit nodes (based on some heuristics); a sketch of that kind of heuristic is below. We can customize this even further. If this is something we might like to explore, I'll talk to the right channels and see whether it's something we can share with the group.
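For readers unfamiliar with the heuristic being referenced, the following is a minimal Scala sketch of a best-fit placement rule, not the prototype itself: given each node's free capacity and a pod's request, choose the feasible node that leaves the least capacity behind. `PodRequest`, `NodeCapacity`, and `bestFit` are assumed names; a real scheduler would read capacities from the Kubernetes API and bind the chosen pod to the chosen node.

```scala
// Illustrative sketch only -- not the prototype described above.
case class PodRequest(cpuMillis: Long, memoryMiB: Long)

case class NodeCapacity(name: String, freeCpuMillis: Long, freeMemoryMiB: Long) {
  def fits(pod: PodRequest): Boolean =
    freeCpuMillis >= pod.cpuMillis && freeMemoryMiB >= pod.memoryMiB

  // Simple "leftover" score: the less capacity left behind, the tighter the fit.
  def leftoverAfter(pod: PodRequest): Long =
    (freeCpuMillis - pod.cpuMillis) + (freeMemoryMiB - pod.memoryMiB)
}

object BestFit {
  // Pick the feasible node with the smallest leftover capacity (best fit),
  // or None if no node can hold the pod.
  def bestFit(pod: PodRequest, nodes: Seq[NodeCapacity]): Option[NodeCapacity] = {
    val feasible = nodes.filter(_.fits(pod))
    if (feasible.isEmpty) None else Some(feasible.minBy(_.leftoverAfter(pod)))
  }
}
```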

iyanuobidele commented 7 years ago

/cc @ccarrizo @khrisrichardson

mccheah commented 7 years ago

How would this work with dynamic allocation, where we would prefer that the job is as elastic as possible? I suppose in the dynamic allocation case we would like to be able to also scale the reserved amount. Or, in this case, are we only reserving the minimum resource requirement?
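One way to picture the interaction being asked about, purely as an illustration and not a design decision from this thread: hard-reserve only the minimum executor count, and treat growth beyond it as best-effort, bounded by the job's maximum and by whatever capacity is actually free. The Scala sketch below uses hypothetical names (`ExecutorReservation`, `grantElastic`); only the notion of min/max executor bounds corresponds to Spark's existing dynamic allocation settings.

```scala
// Hypothetical sketch of how a job-level reservation might interact with
// dynamic allocation: the minimum is guaranteed, growth beyond it is elastic.
case class ExecutorReservation(minExecutors: Int, maxExecutors: Int) {
  // The guaranteed slice: what the scheduler reserves up front.
  def guaranteed: Int = minExecutors

  // The elastic slice: how many extra executors the job may be granted,
  // subject to whatever capacity is actually free at the time.
  def grantElastic(requested: Int, freeSlots: Int): Int = {
    val desired = math.min(requested, maxExecutors - minExecutors)
    math.max(0, math.min(desired, freeSlots))
  }
}
```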