Closed: Tagar closed this 4 years ago
@Tagar We tried to merge the persistent-memory-based shuffle manager upstream; the patch is here: https://github.com/apache/spark/pull/24322. Since there's no way to build native code in Spark, we'll maintain Spark-PMoF as an external package for Spark.
@tanghaodong25 understood - thanks.
Perhaps the native code could be published as a separate package/dependency outside of Spark.
For example, core PySpark nowadays has a hard dependency on pyarrow
https://github.com/apache/spark/blob/master/python/setup.py#L221
which itself ships native libraries.
Another precedent is the Intel MKL library, gfortran, etc., which can be installed to speed up Spark ML with native libraries.
Not sure if this was discussed, but is it possible to merge this work into upstream Spark? Or is the plan to continue maintaining Spark-PMoF as a separate project? Thank you.