Closed (bplein closed this issue 6 years ago)
Support for mounting PVCs and other types of volumes is being worked on in https://github.com/apache/spark/pull/21260. The title of the PR says hostPath, but the implementation is generic and works with other types of volumes too. Note that we moved development into upstream.
Thanks, I'll close this and follow the action upstream.
@liyinan926, is there a config example for a PV or PVC? I downloaded and compiled the commits in apache #2600 and want to try this in my environment.
According to the JIRA, it's:
spark.kubernetes.executor.volumes=hostPath:containerPath[:ro|rw]
for hostPath, but what about PV?
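For reference, the volume support that eventually shipped in upstream Spark (2.4 and later, as documented in "Running on Kubernetes") uses per-volume properties of the form spark.kubernetes.[driver|executor].volumes.[VolumeType].[VolumeName].*, not the single-value format quoted from the JIRA above. The following is a minimal sketch in Python that builds those properties for a PVC. It assumes that later scheme, not necessarily what the PR contained at the time of this thread. The helper names, the volume name "scratch", the claim name "spark-scratch-claim", and the mount path "/scratch" are all hypothetical.

```python
def pvc_volume_conf(role, volume_name, claim_name, mount_path, read_only=False):
    """Build Spark conf entries that mount an existing PVC into driver or executor pods.

    Assumes the property scheme documented for upstream Spark 2.4+:
    spark.kubernetes.<role>.volumes.persistentVolumeClaim.<name>.{mount,options}.*
    """
    if role not in ("driver", "executor"):
        raise ValueError("role must be 'driver' or 'executor'")
    prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
    return {
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": str(read_only).lower(),
        f"{prefix}.options.claimName": claim_name,
    }


def to_submit_args(conf):
    """Flatten a conf dict into spark-submit style '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical names: a volume called "scratch" backed by a claim
# called "spark-scratch-claim", mounted at /scratch in each executor.
conf = pvc_volume_conf("executor", "scratch", "spark-scratch-claim", "/scratch")
print(" ".join(to_submit_args(conf)))
```

The printed arguments can be appended to a spark-submit command line. The claim itself has to exist in the target namespace already, since these properties only reference it by name.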
I work for Diamanti, a company that makes a Kubernetes appliance featuring high-performance storage and networking, both offering QoS that is accessible to the developer via the K8s pod spec.
In attempting to help a customer use Spark on Kubernetes, we're running into the issue of not being able to control the type of storage used by Spark jobs.
I've reviewed several past (open and closed) issues reported in this repository, and they all seem to be single-use-case examples (and fixes), such as for emptyDir or hostPath support.
Shouldn't Spark-on-K8s generically support PVs/PVCs and other K8s mechanisms for temporary or persistent storage, instead of one-off fixes for hostPath, emptyDir, etc.?
Hosts in general, and K8s clusters in particular, have multiple classes of storage available to them. K8s clusters have a plethora of persistent storage options, including FlexVolume (today) and CSI (to replace FlexVolume going forward).
With regard to Spark, wouldn't it be ideal if temporary files could be directed to the lowest-latency, highest-throughput storage available to the cluster?
I would like to see whether we could use spark-submit or some other method to describe K8s volumes or PVCs, so that users of Spark-on-K8s could use the storage best suited to the performance and capacity needs of their applications.