The default spark.gluten.sql.columnar.shuffle.sort.partitions.threshold value is so large that even when we increase the number of partitions, hash-based shuffle is still used and may cause OOM.
Description

The default spark.gluten.sql.columnar.shuffle.sort.partitions.threshold value is so large that even when we increase the number of partitions, hash-based shuffle is still used and may cause OOM.

https://github.com/apache/incubator-gluten/blob/55671cbf7c8bcd945b97869110872bf949589438/shims/common/src/main/scala/org/apache/gluten/GlutenConfig.scala#L1007-L1013

Error:

Job aborted due to stage failure: Task 603 in stage 6.0 failed 4 times, most recent failure: Lost task 603.3 in stage 6.0 (TID 14913) (node71-129-11-bdxs.qiyi.hadoop executor 487): ExecutorLostFailure (executor 487 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding physical memory limits. 4.7 GB of 4 GB physical memory used. Consider boosting spark.executor.memoryOverhead.
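As a workaround until the default is lowered, the threshold can be set explicitly so that sort-based columnar shuffle kicks in at a lower partition count. A minimal sketch via spark-defaults.conf; the value 4000 is purely illustrative (not a recommended default) and should be tuned per workload:

```properties
# spark-defaults.conf
# Illustrative value: partition counts above this threshold switch the
# columnar shuffle from hash-based to sort-based, reducing memory pressure.
spark.gluten.sql.columnar.shuffle.sort.partitions.threshold  4000
```

The same key can also be passed per job with `--conf` on `spark-submit` or via `SparkSession.builder.config(...)`.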