apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] counterintuitive spills of join Operator #4598

Open fuuuyuuu opened 9 months ago

fuuuyuuu commented 9 months ago

Backend

VL (Velox)

Bug description

Hello, I am evaluating the performance of the Gluten + Velox backend on the TPC-DS 3000 benchmark and noticed that q95 behaves unexpectedly.

When I set spark.memory.offHeap.size to 16g, q95 takes 227.9 s end to end and very little data is spilled (part of the DAG is shown below).

image

When I change spark.memory.offHeap.size to 18g and keep all other configurations unchanged, the execution time of q95 degrades to 375.3 s and many more spills are triggered (several times more than in the 16g case), which is counterintuitive. In theory, allocating more memory to each task should trigger fewer spills.

When I disable join spilling by setting spark.gluten.sql.columnar.backend.velox.joinSpillEnabled to false, the execution times of the two runs are the same. I am sure that the number of executors is the same in both runs.
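For reference, a rough sketch of the per-task arithmetic behind that expectation. It assumes the off-heap pool is divided evenly among the 14 task slots of an executor, which is only an approximation: Spark's execution memory pool lets each of N active tasks hold between 1/(2N) and 1/N of the pool, and Gluten/Velox add their own accounting on top. The helper name is hypothetical.

```python
def per_task_offheap_gib(offheap_gib: float, executor_cores: int, task_cpus: int = 1) -> float:
    """Approximate off-heap memory per concurrently running task, in GiB.

    Assumption: the pool is split evenly across task slots; the real
    share of a task varies at runtime.
    """
    slots = executor_cores // task_cpus
    if slots <= 0:
        raise ValueError("executor_cores must be at least task_cpus")
    return offheap_gib / slots


# The two settings compared in this report, with spark.executor.cores = 14.
for size_gib in (16, 18):
    share = per_task_offheap_gib(size_gib, executor_cores=14)
    print(f"offHeap={size_gib}g -> ~{share:.2f} GiB per task slot")
```

Under this even-split assumption the share grows from about 1.14 GiB to about 1.29 GiB per task, so the 18g run would naively be expected to spill less, not more.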

Any suggestions?

Spark version

Spark-3.3.x

Spark configurations

Here are the configs:

spark.master yarn
spark.deploy-mode client
spark.sql.cbo.enabled true
spark.executor.cores 14
spark.executor.memory 5g
spark.executor.memoryOverhead 944m
spark.memory.offHeap.size 18g
spark.memory.offHeap.enabled true
spark.gluten.enabled true
spark.plugins io.glutenproject.GlutenPlugin
spark.gluten.sql.columnar.backend.lib velox
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.loadLibFromJar true
spark.sql.shuffle.partitions 208
spark.gluten.sql.columnar.backend.velox.maxSpillFileSize 1073741824
spark.default.parallelism 167
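One way to make sure the two runs really differ in a single setting is to generate both sets of `--conf` flags from the same base. This is a minimal sketch that uses a subset of the values listed above; the helper names are hypothetical, and the script only prints flags, it does not launch anything.

```python
import shlex

# Subset of the settings from this report; extend with the rest as needed.
BASE_CONF = {
    "spark.executor.cores": "14",
    "spark.executor.memory": "5g",
    "spark.executor.memoryOverhead": "944m",
    "spark.memory.offHeap.enabled": "true",
    "spark.sql.shuffle.partitions": "208",
}


def with_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of base with the overrides applied."""
    merged = dict(base)
    merged.update(overrides)
    return merged


def conf_flags(conf: dict) -> list:
    """Turn a config dict into a flat list of --conf key=value arguments."""
    flags = []
    for key in sorted(conf):
        flags += ["--conf", f"{key}={conf[key]}"]
    return flags


def changed_keys(a: dict, b: dict) -> list:
    """List the keys whose values differ between two config dicts."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


run_16g = with_overrides(BASE_CONF, {"spark.memory.offHeap.size": "16g"})
run_18g = with_overrides(BASE_CONF, {"spark.memory.offHeap.size": "18g"})

# Sanity check: only the off-heap size should differ between the runs.
print(changed_keys(run_16g, run_18g))
for name, conf in (("16g", run_16g), ("18g", run_18g)):
    print(name, shlex.join(conf_flags(conf)))
```

Printing the changed keys before each benchmark run makes it easy to spot an accidental second difference between the configurations.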

System information

No response

Relevant logs

No response

ulysses-you commented 9 months ago

@fuuuyuuu is it possible to provide all Spark configs?

zhztheplayer commented 9 months ago

I could imagine that the extra 2g (18g - 16g) let an upstream operator spill less data, so the join had less spare memory during hash build.

Would you also share the complete query DAG somehow?
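The hypothesis above can be illustrated with a deliberately simplified toy model. Everything in it is an assumption made for illustration: the function name, the all-or-nothing spill rule for the upstream operator, and the numbers, which are in arbitrary units and are not measurements from q95. It does not model how Velox actually arbitrates memory.

```python
def join_spill(pool: float, upstream_need: float, join_need: float) -> float:
    """Toy model: amount the join's hash build must spill.

    Assumptions (hypothetical, not Velox behavior):
    - an upstream operator and the join share one memory pool;
    - the upstream operator keeps its data in memory if it fits,
      otherwise it spills all of it and releases its memory;
    - the join build then uses whatever memory is left over.
    """
    upstream_held = upstream_need if upstream_need <= pool else 0.0
    spare = pool - upstream_held
    return max(0.0, join_need - spare)


# Hypothetical workload: upstream needs 17 units, join build needs 10.
for pool in (16.0, 18.0):
    spilled = join_spill(pool, upstream_need=17.0, join_need=10.0)
    print(f"pool={pool:g}: join spills {spilled:g} units")
```

With these made-up numbers the smaller pool forces the upstream operator to spill, which leaves the whole pool to the join, while the larger pool lets the upstream operator stay in memory and leaves the join only 1 unit, so the join spills 9 units. Checking this against the real DAGs would require the per-operator spill metrics.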

fuuuyuuu commented 9 months ago

I could imagine that the extra 2g (18g - 16g) let an upstream operator spill less data, so the join had less spare memory during hash build.

Would you also share the complete query DAG somehow?

Thank you for your reply. Here are the complete DAGs:

16g: DAG_of_sql95_16g

18g: DAG_of_sql95_18g

fuuuyuuu commented 9 months ago

@fuuuyuuu is it possible to provide all Spark configs?

It seems that the key configs have been shown above. Are there other configs that truly matter?