gaffer01 closed this issue 2 years ago
Spark 2.3 introduced a `repartitionByRange` method on DataFrames. This could be used to make `SortFullGroup` in the Parquet store more efficient, possibly by avoiding the need to use RDDs, which could give a significant performance improvement.
This requires #1902 to be merged first.
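To make the idea in the description concrete, here is a minimal plain-Java model of the technique behind it: split the keys into contiguous ranges, sort each range independently, and read the ranges in order to get a globally sorted result. It is only a sketch under stated assumptions, not Gaffer or Spark code. The class name `RangePartitionSketch`, its methods, and the sample keys are all hypothetical, and the split points are computed from the full data, whereas Spark estimates them by sampling. With the DataFrame API, the equivalent would presumably be a call along the lines of `df.repartitionByRange(n, col("key")).sortWithinPartitions("key")`, where `n` and the column name `key` are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java model of "range partition, then sort within partitions".
// Hypothetical illustration only: this is not Gaffer or Spark code.
final class RangePartitionSketch {

    // Choose (numPartitions - 1) split points from a sorted copy of the keys.
    // Simplification: Spark estimates its boundaries by sampling instead.
    static int[] boundaries(List<Integer> keys, int numPartitions) {
        if (keys.isEmpty() || numPartitions < 2) {
            return new int[0]; // everything falls into a single partition
        }
        List<Integer> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        int[] bounds = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            bounds[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return bounds;
    }

    // Assign each key to the first partition whose upper bound exceeds it.
    static List<List<Integer>> partition(List<Integer> keys, int[] bounds) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i <= bounds.length; i++) {
            parts.add(new ArrayList<>());
        }
        for (int key : keys) {
            int p = 0;
            while (p < bounds.length && key >= bounds[p]) {
                p++;
            }
            parts.get(p).add(key);
        }
        return parts;
    }

    // Sort each partition independently, then read the partitions in order.
    // Because the ranges do not overlap, the concatenation is globally sorted.
    static List<Integer> sortWithinAndConcat(List<List<Integer>> parts) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : parts) {
            Collections.sort(part);
            out.addAll(part);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(42, 7, 19, 3, 88, 61, 7, 25);
        int[] bounds = boundaries(keys, 3);
        System.out.println(sortWithinAndConcat(partition(keys, bounds)));
    }
}
```

The point of the sketch is that no global sort is needed once the ranges are disjoint: each partition can be sorted on its own, which is the property that would let a DataFrame-based version avoid the RDD-level sort.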
We are not supporting Parquet in v2.0, so this will not be done.