apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.2k stars 434 forks source link

[VL] Columnar shuffle write byte increase as the number of partitions increases #7696

Open j7nhai opened 1 week ago

j7nhai commented 1 week ago

Backend

VL (Velox)

Bug description

When run ssb-q4.2 with scale 100T and enable columnar shuffle writes, we found that shuffle write byte added up of all stages increase as the number of partitions increases. However, when disable gluten, the growth trend of vanilla spark is not so obvious.

spark.sql.adaptive.coalescePartitions.initialPartitionNum=k

The following table shows the shuffle write bytes sum by all stages.

k=1000 k=2000 k=4000 k=8000 k=16000
enable gluten (with columnar shuffle) 25569016307 27297986268 29890895841 37058299593 40684916742
disable gluten 30355717816 30463886189 30595400121 31204187443 32796457022

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

FelixYBW commented 1 week ago

Did you enable sort based shuffle? Hash shuffle has this issue.

j7nhai commented 1 week ago

Did you enable sort based shuffle? Hash shuffle has this issue.

Is it a bug and will be fixed in the future? May I know the reason why the disk is rising?

I didn't see any config to force enable sort based shuffle to avoid hash shuffle. Could I just decrease the value of spark.gluten.sql.columnar.shuffle.sort.columns.threshold

FelixYBW commented 1 week ago

It's the design of hash shuffle and one reason we implemented the sort shuffle.

You may decrease the two, first one is the threshold of reducer#, second one is the threshold of column#.

spark.gluten.sql.columnar.shuffle.sort.partitions.threshold
spark.gluten.sql.columnar.shuffle.sort.columns.threshold