facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0

Diff on parquet filter agg #11257

Open: zml1206 opened this issue

zml1206 commented:

Bug description

To reproduce, write the Parquet file with Gluten disabled, then read and query it with Gluten enabled.

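// Write the test data with Gluten disabled, so the file is produced by vanilla Spark.
spark.sql("set spark.gluten.enabled=false")
spark.range(100).selectExpr("id%2 as c1", "id%5 as c2", "id as c3").write.mode("overwrite").parquet("tmp/t1")
// Re-enable Gluten and run the filter + aggregation on the written file.
spark.sql("set spark.gluten.enabled=true")
spark.read.parquet("tmp/t1").createOrReplaceTempView("t1")
spark.sql("select c2, sum(c3) from t1 where c1 = 1 group by c2").show()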

Result (with Gluten enabled):

+---+---------------+
| c2|        sum(c3)|
+---+---------------+
|  0|559882429285360|
|  1|559885503421750|
|  3|839826576815406|
|  2|839827141809990|
|  4|559885785918562|
+---+---------------+
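For comparison, here is a minimal sketch that recomputes the expected answer in plain Scala collections (no Spark, Gluten, or Velox involved), assuming the same 100 rows that `spark.range(100).selectExpr(...)` generates. Only odd ids pass `c1 = 1`, so each `c2` group holds ten values and the sums should be 500, 460, 520, 480 and 540 for `c2` = 0..4 (2500 in total). The sums in the table above are therefore wrong. The object name `ExpectedSums` is hypothetical.

```scala
// Recompute the expected result of
//   select c2, sum(c3) from t1 where c1 = 1 group by c2
// over the rows produced by spark.range(100).selectExpr("id%2 as c1", "id%5 as c2", "id as c3"),
// using plain Scala collections only (hypothetical helper, not part of Spark or Velox).
object ExpectedSums {
  // One row per id in [0, 100): (c1, c2, c3) = (id % 2, id % 5, id)
  val rows: Seq[(Long, Long, Long)] = (0L until 100L).map(id => (id % 2, id % 5, id))

  // Keep rows with c1 = 1, group by c2, sum c3; sort by c2 for stable output.
  def compute(): Seq[(Long, Long)] =
    rows
      .filter { case (c1, _, _) => c1 == 1L }
      .groupBy { case (_, c2, _) => c2 }
      .map { case (c2, group) => c2 -> group.map(_._3).sum }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit =
    compute().foreach { case (c2, total) => println(s"c2=$c2 sum(c3)=$total") }
}
```

Running `ExpectedSums.main(Array.empty)` prints one line per `c2` value, which can be compared against the output of the same query with Gluten disabled.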

Through testing, I found that https://github.com/facebookincubator/velox/pull/11010 causes this: the query works correctly after reverting it.

System information

Velox System Info v0.0.2
Commit: 288336153060b4c2ac9bd231a353f98dceb48c8a
CMake Version: 3.28.3
System: Linux-5.15.0-113-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

zml1206 commented:

cc @Yuhta