Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens in https://github.com/apache/spark/
I'm seeing the default value of 0.10 fail for even reasonably sized shuffle
jobs, so expect this value to require some tuning to succeed reliably.
We copied this default value from YARN, but Kubernetes appears to enforce
container memory limits more strictly than YARN does: I have two identically
configured clusters of five AWS r3.4xlarge instances, one running YARN and
the other running Kubernetes, with identical driver/executor settings,
running identical jobs, and the YARN job succeeds whereas the k8s job fails
because the pod exceeds its memory limit.
To be consistent with YARN, maybe we should expose an absolute memoryOverhead setting instead. A memory factor makes the computed value depend on another argument, driverMemory, which isn't ideal IMO.
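The overhead computation under discussion can be sketched as follows. This is a hedged reconstruction, not the actual Spark source: the 0.10 factor and the 384 MiB floor match YARN's long-standing defaults, but the exact formula in the Kubernetes back-end may differ.

```python
# Sketch (assumed, not the real Spark code) of how the container memory
# limit is derived from the executor heap size and an overhead factor:
#   overhead = max(factor * memory, 384 MiB)
#   limit    = memory + overhead

MEMORY_OVERHEAD_MIN_MIB = 384  # assumed floor, mirroring YARN's default


def memory_overhead_mib(executor_memory_mib: int, factor: float = 0.10) -> int:
    """Overhead added on top of the executor heap, in MiB."""
    return max(int(factor * executor_memory_mib), MEMORY_OVERHEAD_MIN_MIB)


def container_limit_mib(executor_memory_mib: int, factor: float = 0.10) -> int:
    """Total memory the container/pod may use: heap plus overhead."""
    return executor_memory_mib + memory_overhead_mib(executor_memory_mib, factor)


# With the 0.10 default, a 10 GiB executor gets a 1 GiB overhead (11 GiB limit);
# raising the factor to 0.20 raises the limit to 12 GiB.
print(container_limit_mib(10240))               # 11264
print(container_limit_mib(10240, factor=0.20))  # 12288
```

This also shows why an absolute memoryOverhead is attractive: with a factor, the pod limit silently changes whenever driverMemory/executorMemory changes, whereas an absolute setting is independent of the heap size.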