MemVerge / splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Apache License 2.0
127 stars 29 forks source link

Add support for HDFS compliant file systems #64

Open drnushooz opened 4 years ago

drnushooz commented 4 years ago

In cloud environments, it is a common requirement to be able to persist shuffle data outside of the node on which a Spark task is running. Since many workloads run on top of file systems which implement HDFS semantics (FileContext and FileSystem specifically), a storage plugin for these systems will be used to provide within the code base. This will also allow users of Spark 2.4 releases to use external shuffle storage which is HDFS compliant.