apache-spark-on-k8s / spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Mount emptyDir volumes for temporary directories on executors in static allocation mode (rebased) #522

Status: Closed (mccheah closed this 7 years ago)

mccheah commented 7 years ago

Rebased version of #486.

Closes #439.

Mounting emptyDir volumes for the executors' temporary directories is extremely important for performance, especially in shuffle-heavy computations where the executors perform a large amount of disk I/O. We only provision these volumes in static allocation mode without the shuffle service, because using a shuffle service requires mounting hostPath volumes instead.
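
For illustration only, here is a rough sketch of the kind of executor pod this produces. It is not the Scala code in this PR. It just builds a plain Kubernetes pod manifest as a Python dict, with one emptyDir volume per Spark local directory. The function name, volume naming scheme, pod name, and image are hypothetical. The `emptyDir`, `volumeMounts`, and `SPARK_LOCAL_DIRS` fields are standard Kubernetes and Spark names.

```python
import json


def executor_pod_with_local_dirs(local_dirs, executor_image="spark-executor:latest"):
    """Build a pod manifest that backs each Spark local dir with an emptyDir volume.

    Hypothetical helper for illustration; the image and naming are assumptions.
    """
    volumes = []
    mounts = []
    for index, path in enumerate(local_dirs):
        # One emptyDir volume per local directory, mounted at that directory's path.
        name = f"spark-local-dir-{index}"
        volumes.append({"name": name, "emptyDir": {}})
        mounts.append({"name": name, "mountPath": path})

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "spark-executor-example"},
        "spec": {
            "containers": [
                {
                    "name": "executor",
                    "image": executor_image,
                    # Spark reads its scratch directories from this comma-separated list.
                    "env": [{"name": "SPARK_LOCAL_DIRS", "value": ",".join(local_dirs)}],
                    "volumeMounts": mounts,
                }
            ],
            "volumes": volumes,
        },
    }


if __name__ == "__main__":
    pod = executor_pod_with_local_dirs(["/tmp/spark-local-1", "/tmp/spark-local-2"])
    print(json.dumps(pod, indent=2))
```

With a shuffle service the same directories would have to be hostPath volumes instead, so that the service can read the shuffle files, which is why the emptyDir path is limited to static allocation without the shuffle service.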

mccheah commented 7 years ago

I created this PR so that I didn't have to overwrite the original branch from #486. The rebase was pretty tricky so I want to keep the old history around just in case.

mccheah commented 7 years ago

Tests are broken because ExecutorPodFactory doesn't consider the mounted empty dirs. I'll work on fixing that.

ash211 commented 7 years ago

@mccheah ready to review? Any particular parts to focus on?

mccheah commented 7 years ago

Not yet, I still haven't fixed the tests.

mccheah commented 7 years ago

@ash211 @kimoonkim ready for review again. I changed the architecture a bit to make testing easier.

ash211 commented 7 years ago

(killed the integration test run on the intermediate commit to save time)

amadav commented 6 years ago

Are there plans to merge this into Spark master?