In cloud environments, it is a common requirement to be able to persist shuffle data outside of the node on which a Spark task is running. Since many workloads run on top of file systems which implement HDFS semantics (FileContext and FileSystem specifically), a storage plugin for these systems will be used to provide within the code base. This will also allow users of Spark 2.4 releases to use external shuffle storage which is HDFS compliant.
In cloud environments, it is a common requirement to be able to persist shuffle data outside of the node on which a Spark task is running. Since many workloads run on top of file systems which implement HDFS semantics (
FileContext
andFileSystem
specifically), a storage plugin for these systems will be used to provide within the code base. This will also allow users of Spark 2.4 releases to use external shuffle storage which is HDFS compliant.