[VL] Columnar shuffle write byte increase as the number of partitions increases

j7nhai commented 1 week ago

Backend

VL (Velox)

Bug description

When run ssb-q4.2 with scale 100T and enable columnar shuffle writes, we found that shuffle write byte added up of all stages increase as the number of partitions increases. However, when disable gluten, the growth trend of vanilla spark is not so obvious.

spark.sql.adaptive.coalescePartitions.initialPartitionNum=k

The following table shows the shuffle write bytes sum by all stages.

	k=1000	k=2000	k=4000	k=8000	k=16000
enable gluten (with columnar shuffle)	25569016307	27297986268	29890895841	37058299593	40684916742
disable gluten	30355717816	30463886189	30595400121	31204187443	32796457022

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

FelixYBW commented 1 week ago

Did you enable sort based shuffle? Hash shuffle has this issue.

j7nhai commented 1 week ago

Did you enable sort based shuffle? Hash shuffle has this issue.

Is it a bug and will be fixed in the future? May I know the reason why the disk is rising?

I didn't see any config to force enable sort based shuffle to avoid hash shuffle. Could I just decrease the value of spark.gluten.sql.columnar.shuffle.sort.columns.threshold

FelixYBW commented 1 week ago

It's the design of hash shuffle and one reason we implemented the sort shuffle.

You may decrease the two, first one is the threshold of reducer#, second one is the threshold of column#.

spark.gluten.sql.columnar.shuffle.sort.partitions.threshold
spark.gluten.sql.columnar.shuffle.sort.columns.threshold

apache / incubator-gluten