apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] TPC-H Q15 outputs unstable query results #3561

Open wanweiqiangintel opened 1 year ago

wanweiqiangintel commented 1 year ago

Backend

VL (Velox)

Bug description

When I use Velox as the backend execution engine to run TPC-H, the Q15 output is unstable: sometimes it prints one result, but sometimes the result is empty.

Spark version

Spark-3.2.x

Spark configurations

cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
  --master yarn --deploy-mode client \
  --conf spark.plugins=io.glutenproject.GlutenPlugin \
  --conf spark.gluten.sql.columnar.backend.lib=velox \
  --conf spark.driver.extraClassPath=${GLUTEN_JAR} \
  --conf spark.executor.extraClassPath=${GLUTEN_JAR} \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=20g \
  --conf spark.gluten.sql.columnar.forceShuffledHashJoin=true \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --num-executors 8 \
  --executor-cores 4 \
  --driver-memory 10g \
  --executor-memory 5g \
  --conf spark.executor.memoryOverhead=5g \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.sql.files.maxPartitionBytes=2g \
  --conf spark.sql.shuffle.partitions=192

System information

Velox System Info v0.0.2
Commit: 0fd70fff3d79c643058f826a7ab83cdf9c141098
CMake Version: 3.27.2
System: Linux-5.15.0-spr.bkc.pc.16.4.24.x86_64
Arch: x86_64
C++ Compiler: /opt/rh/gcc-toolset-11/root/usr/bin/c++
C++ Compiler Version: 11.2.1
C Compiler: /opt/rh/gcc-toolset-11/root/usr/bin/cc
C Compiler Version: 11.2.1
CMake Prefix Path: /usr/local;/usr;/;/home/weiqiang/.local/lib/python3.8/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

zhouyuan commented 1 year ago

Hi @wanweiqiangintel, TPC-H Q15 is not stable because of a double-precision issue. Have you also tried vanilla Spark? In my test it runs into the same issue.

-yuan
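A minimal toy model of the double-precision issue, written in Python rather than Gluten or Spark code. It assumes, hypothetically, that revenue is aggregated as an IEEE 754 double and that Q15's `revenue0` view is evaluated twice (once for the outer query, once for the `max(total_revenue)` subquery), with each evaluation merging partial sums in an order that depends on task scheduling. Because floating-point addition is not associative, the two evaluations can differ in the last bit, and the exact equality filter then drops every row. The helpers `add_left_to_right`, `two_phase_sum`, `revenue0`, and `q15` and the input values are illustrative only.

```python
# Toy model (not Gluten or Spark code) of TPC-H Q15 losing its result row
# when revenue is aggregated as an IEEE 754 double.


def add_left_to_right(values):
    # Plain sequential double addition. Python 3.12+ sum() uses compensated
    # summation for floats, so it is avoided here on purpose.
    total = 0.0
    for v in values:
        total += v
    return total


def two_phase_sum(partitions):
    # Partial sum per partition, then a final merge of the partial sums.
    partials = [add_left_to_right(p) for p in partitions]
    return add_left_to_right(partials)


def revenue0(rows_by_supplier):
    # Hypothetical stand-in for the revenue0 view: supplier -> total_revenue.
    # rows_by_supplier maps supplier -> list of partitions of per-row revenue.
    return {s: two_phase_sum(parts) for s, parts in rows_by_supplier.items()}


def q15(outer_input, subquery_input):
    # The view is evaluated twice; each evaluation may see its input
    # partitioned differently, depending on task scheduling.
    outer = revenue0(outer_input)
    max_revenue = max(revenue0(subquery_input).values())
    # Exact equality, as in: total_revenue = (select max(total_revenue) ...)
    return sorted(s for s, rev in outer.items() if rev == max_revenue)


# The same three revenue rows for Supplier#1, split two different ways.
ORDER_A = [[0.1, 0.2], [0.3]]
ORDER_B = [[0.1], [0.2, 0.3]]

stable_run = {"Supplier#1": ORDER_A, "Supplier#2": [[0.25], [0.25]]}
reordered_run = {"Supplier#1": ORDER_B, "Supplier#2": [[0.25], [0.25]]}

print(two_phase_sum(ORDER_A), two_phase_sum(ORDER_B))  # 0.6000000000000001 0.6
print(q15(stable_run, stable_run))                     # ['Supplier#1']
print(q15(stable_run, reordered_run))                  # []
```

Since the effect comes from double arithmetic itself, it would be expected on vanilla Spark as well as on Velox. If the revenue columns are stored as DECIMAL (the type the TPC-H specification uses), the sums are exact and the merge order should not change the result.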