fuuuyuuu opened this issue 9 months ago
@fuuuyuuu, is it possible to provide all of the Spark configs?
I could imagine that the extra 2g (18g - 16g) made an upstream operator spill less data, so the join had less spare memory during the hash build.
Could you also share the complete query DAG?
Thank you for your reply. These are the complete DAGs:
16g: [DAG image]
18g: [DAG image]
I believe the key configs are already listed above. Are there any other configs that really matter here?
Backend
VL (Velox)
Bug description
Hello, I am evaluating the performance of the Gluten + Velox backend on the TPC-DS 3000 benchmark and noticed that q95 behaves strangely.
When I set spark.memory.offHeap.size to 16g, the total execution time of q95 is 227.9 s and very little data is spilled (part of the DAG is shown below).
When I change spark.memory.offHeap.size to 18g and keep all other configurations unchanged, the total execution time of q95 degrades to 375.3 s and many more spills are triggered (several times as many as in the 16g case), which is counter-intuitive. In theory, allocating more memory to each task should trigger fewer spills.
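A rough estimate of how much extra memory each task should get, as a sketch under simplifying assumptions: the whole off-heap pool is treated as execution memory, all 14 task slots (spark.executor.cores 14) are busy, and any Gluten-specific per-task limits are ignored. Spark's execution memory pool lets each of N active tasks hold between 1/(2N) and 1/N of the pool; the helper name per_task_bounds is hypothetical.

```python
GIB = 1024 ** 3


def per_task_bounds(offheap_bytes: int, active_tasks: int) -> tuple[float, float]:
    """Return (guaranteed, maximum) per-task execution memory in GiB.

    Simplified model (assumption): the entire off-heap pool is execution
    memory, and each of N active tasks may hold between 1/(2N) and 1/N of it.
    """
    if active_tasks <= 0:
        raise ValueError("active_tasks must be positive")
    guaranteed = offheap_bytes / (2 * active_tasks) / GIB
    maximum = offheap_bytes / active_tasks / GIB
    return guaranteed, maximum


if __name__ == "__main__":
    cores = 14  # spark.executor.cores from the reported configs
    for size_gib in (16, 18):  # the two spark.memory.offHeap.size values tested
        lo, hi = per_task_bounds(size_gib * GIB, cores)
        print(f"{size_gib}g: {lo:.2f} to {hi:.2f} GiB per task")
    # prints:
    # 16g: 0.57 to 1.14 GiB per task
    # 18g: 0.64 to 1.29 GiB per task
```

Under this model the 18g run gives each task about 0.14 GiB more headroom at most, which is why a sharp increase in spilling is unexpected.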
When I disable join spilling by setting
spark.gluten.sql.columnar.backend.velox.joinSpillEnabled false
both runs have the same execution time. I have confirmed that the number of executors is the same in both runs. Any suggestions?
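To make sure only the settings under test differ, the runs (16g and 18g, each with join spilling on and off) can be generated from one base config. This is a minimal sketch: BASE_CONF holds only a subset of the reported configs, and experiment_matrix and to_conf_args are hypothetical helpers that render spark-submit style --conf arguments.

```python
from itertools import product

# Subset of the configs reported in this issue; these keys stay fixed.
BASE_CONF = {
    "spark.executor.cores": "14",
    "spark.executor.memory": "5g",
    "spark.memory.offHeap.enabled": "true",
    "spark.plugins": "io.glutenproject.GlutenPlugin",
    "spark.sql.shuffle.partitions": "208",
}

OFFHEAP_KEY = "spark.memory.offHeap.size"
JOIN_SPILL_KEY = "spark.gluten.sql.columnar.backend.velox.joinSpillEnabled"


def experiment_matrix(offheap_sizes, join_spill_values):
    """Yield (label, conf) pairs that differ only in the two keys under test."""
    for size, spill in product(offheap_sizes, join_spill_values):
        conf = dict(BASE_CONF)  # copy so runs never share state
        conf[OFFHEAP_KEY] = size
        conf[JOIN_SPILL_KEY] = spill
        yield f"offheap={size},joinSpill={spill}", conf


def to_conf_args(conf):
    """Render a config dict as spark-submit style --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    for label, conf in experiment_matrix(["16g", "18g"], ["true", "false"]):
        print(label, " ".join(to_conf_args(conf)))
```

Keeping every other key in a single shared dict is what guarantees the runs differ only in the two settings being compared.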
Spark version
Spark-3.3.x
Spark configurations
Here are the configs:
spark.master yarn
spark.deploy-mode client
spark.sql.cbo.enabled true
spark.executor.cores 14
spark.executor.memory 5g
spark.executor.memoryOverhead 944m
spark.memory.offHeap.size 18g
spark.memory.offHeap.enabled true
spark.gluten.enabled true
spark.plugins io.glutenproject.GlutenPlugin
spark.gluten.sql.columnar.backend.lib velox
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.loadLibFromJar true
spark.sql.shuffle.partitions 208
spark.gluten.sql.columnar.backend.velox.maxSpillFileSize 1073741824
spark.default.parallelism 167
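The configs above are whitespace-separated key/value pairs, so a small parser makes it easy to diff the settings of two runs and confirm that only spark.memory.offHeap.size differs. A minimal sketch, assuming no key or value contains whitespace (true for the pairs above); parse_flat_conf and diff_conf are hypothetical helpers.

```python
def parse_flat_conf(text: str) -> dict[str, str]:
    """Parse 'key value key value ...' (any whitespace) into a dict.

    Assumes keys and values never contain whitespace.
    """
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of tokens (key/value pairs)")
    return dict(zip(tokens[0::2], tokens[1::2]))


def diff_conf(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ.

    A key missing from one side is reported with None for that side.
    """
    keys = a.keys() | b.keys()
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    run_18g = parse_flat_conf(
        "spark.executor.cores 14 spark.memory.offHeap.size 18g "
        "spark.sql.shuffle.partitions 208"
    )
    run_16g = dict(run_18g, **{"spark.memory.offHeap.size": "16g"})
    print(diff_conf(run_16g, run_18g))
    # prints {'spark.memory.offHeap.size': ('16g', '18g')}
```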
System information
No response
Relevant logs
No response