apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

1TB TPCDS benchmark over Vanilla Spark, suffer performance slowdown #588

Open DamonZhao-sfu opened 1 week ago

DamonZhao-sfu commented 1 week ago

What is the problem the feature request solves?

I'm running the 1TB TPC-DS benchmark on Comet and on vanilla Spark, using a machine with 48 cores and 186 GB of RAM. Here's my config:

/localhdd/hza214/spark-3.4/spark-3.4.2-bin-hadoop3/bin/spark-shell \
    --jars $COMET_JAR \
    --conf spark.driver.extraClassPath=$COMET_JAR \
    --conf spark.executor.extraClassPath=$COMET_JAR \
    --conf spark.comet.batchSize=8192 \
    --conf spark.sql.autoBroadcastJoinThreshold=-1 \
    --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
    --conf spark.comet.enabled=true \
    --conf spark.comet.exec.enabled=true \
    --conf spark.comet.exec.all.enabled=true \
    --conf spark.comet.cast.allowIncompatible=true \
    --conf spark.comet.explainFallback.enabled=true \
    --conf spark.comet.parquet.io.enabled=false \
    --conf spark.memory.offHeap.enabled=true \
    --conf spark.memory.offHeap.size=50g \
    --conf spark.shuffle.file.buffer=128k \
    --conf spark.local.dir=/mnt/smartssd_0n/hza214/sparktmp \
    --num-executors 48 \
    --executor-cores 48 \
    --driver-memory 20g \
    --executor-memory 140g
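
With this many `--conf` flags it is easy to pass the same key twice, which makes it unclear which value was intended. A minimal sketch for spotting repeated keys in a launch command follows; `duplicate_conf_keys` is a hypothetical helper name, and the sketch assumes the command text is supplied on standard input.

```shell
#!/usr/bin/env bash
# Print every Spark conf key that appears in more than one "--conf key=value" flag.
# Assumption: the launch command (possibly spanning several lines) arrives on stdin.
duplicate_conf_keys() {
    # 1. Extract each "--conf <key>=" occurrence.
    # 2. Reduce it to the bare key.
    # 3. Sort so identical keys are adjacent, then keep only repeated ones.
    grep -oE -- '--conf[[:space:]]+[^=[:space:]]+=' \
        | sed -E 's/^--conf[[:space:]]+//; s/=$//' \
        | sort \
        | uniq -d
}
```

It could be run as `duplicate_conf_keys < launch.sh`, where `launch.sh` is a hypothetical file holding the command. An empty result means no key was repeated.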

image

Has the open-source community run the TPC-DS benchmark on Comet before? I think the project is still at a relatively early stage, so many native operators are not supported yet. Is this result in line with expectations?

viirya commented 1 week ago

Please follow the benchmark guide https://datafusion.apache.org/comet/contributor-guide/benchmarking.html to set up proper Comet configs. With your configs, most native operators wouldn't be enabled.

viirya commented 1 week ago

@andygrove created a repo, https://github.com/apache/datafusion-benchmarks, that includes the scripts used to benchmark Comet. You can also follow the steps there.

andygrove commented 1 week ago

Hi @DamonZhao-sfu we are working on some items in the 0.1.0 milestone that will likely help, particularly https://github.com/apache/datafusion-comet/pull/591 and https://github.com/apache/datafusion-comet/issues/387

I am not personally planning on spending much time on TPC-DS until we have some of these issues resolved.

DamonZhao-sfu commented 1 week ago

> Hi @DamonZhao-sfu we are working on some items in the 0.1.0 milestone that will likely help, particularly #591 and #387
>
> I am not personally planning on spending much time on TPC-DS until we have some of these issues resolved.

Thank you for your reply. I will try again once these features are merged. When I follow the benchmark guide at https://datafusion.apache.org/comet/contributor-guide/benchmarking.html and use the same configuration, I still find that many aggregate/join operators are not supported natively, and they run much slower than vanilla Spark. I'm currently writing a paper benchmarking different Spark native engines. It seems that the Comet community is currently focusing on TPC-H optimization. Will more of the native operators needed for TPC-DS be supported in the future?
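
One way to quantify how much of a query still runs on the Spark side is to count operators in the physical plan text. The sketch below is only an illustration and not part of Comet. It assumes the plan printed by `explain()` has been saved as text, that each operator sits on its own line, and that native operators are the ones whose names start with `Comet`. `summarize_plan` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count operators in a saved Spark physical plan and how many of them are Comet operators.
# Assumptions: plan text on stdin, one operator per line, native operator names
# begin with "Comet".
summarize_plan() {
    awk '
        {
            line = $0
            sub(/^[ :+-]*/, "", line)         # drop tree-drawing prefix such as "   +- "
            sub(/^\*\([0-9]+\) */, "", line)  # drop whole-stage-codegen marker such as "*(3) "
            if (match(line, /^[A-Za-z]+/)) {  # leading word is taken as the operator name
                name = substr(line, RSTART, RLENGTH)
                total++
                if (name ~ /^Comet/) native++
            }
        }
        END { printf "native=%d total=%d\n", native, total }
    '
}
```

For example, `summarize_plan < plan.txt`, where `plan.txt` is a hypothetical file containing the plan of one TPC-DS query. Header lines such as `== Physical Plan ==` are skipped, while wrapper nodes like `AdaptiveSparkPlan` are counted as non-native, so the ratio is only a rough indicator.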

andygrove commented 6 days ago

> Thank you for your reply. I will try again once these features are merged. When I follow the benchmark guide at https://datafusion.apache.org/comet/contributor-guide/benchmarking.html and use the same configuration, I still find that many aggregate/join operators are not supported natively, and they run much slower than vanilla Spark. I'm currently writing a paper benchmarking different Spark native engines. It seems that the Comet community is currently focusing on TPC-H optimization. Will more of the native operators needed for TPC-DS be supported in the future?

Yes, we are aiming for full TPC-DS support. We are just starting with TPC-H because it is easier for contributors to get up and running with that benchmark and is good enough to highlight some current limitations.