apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Fix Spark 4.0 sql tests #551

Open kazuyukitanimura opened 2 weeks ago

kazuyukitanimura commented 2 weeks ago

Describe the bug

Regarding https://github.com/apache/datafusion-comet/pull/537, there are 103 Spark 4.0 SQL tests failing.

Fix the Comet shims for the Spark 4.0 profile and remove IgnoreComet for those tests. Some of the tests may share the same root causes.

sql-1

| WIP | PR posted | Done | Failing test |
|---|---|---|---|
| [ ] | [ ] | [ ] | SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery |
| [ ] | [ ] | [ ] | [SPARK-43226] extra constant metadata fields with extractors |
| [x] | [x] | [ ] | parquet widening conversion ShortType -> IntegerType |
| [x] | [x] | [ ] | parquet widening conversion IntegerType -> ShortType |
| [x] | [x] | [ ] | parquet widening conversion IntegerType -> LongType |
| [x] | [x] | [ ] | parquet widening conversion ShortType -> DoubleType |
| [x] | [x] | [ ] | parquet widening conversion IntegerType -> DoubleType |
| [x] | [x] | [ ] | parquet widening conversion DateType -> TimestampNTZType |
| [x] | [x] | [ ] | parquet widening conversion ByteType -> DecimalType(10,0) |
| [x] | [x] | [ ] | parquet widening conversion ByteType -> DecimalType(20,0) |
| [x] | [x] | [ ] | parquet widening conversion ShortType -> DecimalType(10,0) |
| [x] | [x] | [ ] | parquet widening conversion ShortType -> DecimalType(20,0) |
| [x] | [x] | [ ] | parquet widening conversion ShortType -> DecimalType(38,0) |
| [x] | [x] | [ ] | parquet widening conversion IntegerType -> DecimalType(10,0) |
| [x] | [x] | [ ] | parquet widening conversion IntegerType -> DecimalType(20,0) |
| [x] | [x] | [ ] | parquet widening conversion IntegerType -> DecimalType(38,0) |
| [x] | [x] | [ ] | parquet widening conversion LongType -> DecimalType(20,0) |
| [x] | [x] | [ ] | parquet widening conversion LongType -> DecimalType(38,0) |
| [x] | [x] | [ ] | parquet widening conversion ByteType -> DecimalType(11,1) |
| [x] | [x] | [ ] | parquet widening conversion ShortType -> DecimalType(11,1) |
| [x] | [x] | [ ] | parquet widening conversion IntegerType -> DecimalType(11,1) |
| [x] | [x] | [ ] | parquet widening conversion LongType -> DecimalType(21,1) |
| [x] | [x] | [ ] | unsupported parquet conversion ByteType -> DecimalType(1,0) |
| [x] | [x] | [ ] | unsupported parquet conversion ByteType -> DecimalType(3,0) |
| [x] | [x] | [ ] | unsupported parquet conversion ShortType -> DecimalType(3,0) |
| [x] | [x] | [ ] | unsupported parquet conversion ShortType -> DecimalType(5,0) |
| [x] | [x] | [ ] | unsupported parquet conversion IntegerType -> DecimalType(5,0) |
| [x] | [x] | [ ] | unsupported parquet conversion ByteType -> DecimalType(4,1) |
| [x] | [x] | [ ] | unsupported parquet conversion ShortType -> DecimalType(6,1) |
| [x] | [x] | [ ] | unsupported parquet conversion LongType -> DecimalType(10,0) |
| [x] | [x] | [ ] | unsupported parquet conversion ByteType -> DecimalType(2,0) |
| [x] | [x] | [ ] | unsupported parquet conversion ShortType -> DecimalType(4,0) |
| [x] | [x] | [ ] | unsupported parquet conversion IntegerType -> DecimalType(9,0) |
| [x] | [x] | [ ] | unsupported parquet conversion LongType -> DecimalType(19,0) |
| [x] | [x] | [ ] | unsupported parquet conversion ByteType -> DecimalType(3,1) |
| [x] | [x] | [ ] | unsupported parquet conversion ShortType -> DecimalType(5,1) |
| [x] | [x] | [ ] | unsupported parquet conversion IntegerType -> DecimalType(10,1) |
| [x] | [x] | [ ] | unsupported parquet conversion LongType -> DecimalType(20,1) |
| [x] | [x] | [ ] | unsupported parquet timestamp conversion TimestampType (TIMESTAMP_MICROS) -> DateType |
| [x] | [x] | [ ] | unsupported parquet timestamp conversion TimestampType (TIMESTAMP_MILLIS) -> DateType |
| [x] | [x] | [ ] | unsupported parquet timestamp conversion TimestampNTZType (INT96) -> DateType |
| [x] | [x] | [ ] | unsupported parquet timestamp conversion TimestampNTZType (TIMESTAMP_MICROS) -> DateType |
| [x] | [x] | [ ] | unsupported parquet timestamp conversion TimestampNTZType (TIMESTAMP_MILLIS) -> DateType |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(5, 2) -> Decimal(7, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(5, 2) -> Decimal(10, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(5, 2) -> Decimal(20, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(10, 2) -> Decimal(12, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(10, 2) -> Decimal(20, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(20, 2) -> Decimal(22, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(7, 2) -> Decimal(5, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(10, 2) -> Decimal(5, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(20, 2) -> Decimal(5, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(12, 2) -> Decimal(10, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(20, 2) -> Decimal(10, 2) |
| [x] | [x] | [ ] | parquet decimal precision change Decimal(22, 2) -> Decimal(20, 2) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(5, 2) -> Decimal(7, 4) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(5, 2) -> Decimal(10, 7) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(5, 2) -> Decimal(20, 17) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(10, 2) -> Decimal(12, 4) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(10, 2) -> Decimal(20, 12) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(20, 2) -> Decimal(22, 4) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(7, 4) -> Decimal(5, 2) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(10, 7) -> Decimal(5, 2) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(20, 17) -> Decimal(5, 2) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(12, 4) -> Decimal(10, 2) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(20, 17) -> Decimal(10, 2) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(22, 4) -> Decimal(20, 2) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(10, 6) -> Decimal(12, 4) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(20, 7) -> Decimal(22, 5) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(12, 4) -> Decimal(10, 6) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(22, 5) -> Decimal(20, 7) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(5, 2) -> Decimal(6, 4) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(10, 4) -> Decimal(12, 7) |
| [x] | [x] | [ ] | parquet decimal precision and scale change Decimal(20, 5) -> Decimal(22, 8) |
| [ ] | [ ] | [ ] | parquet decimal type change Decimal(5, 2) -> Decimal(3, 2) overflows with parquet-mr |
| [ ] | [ ] | [ ] | partition pruning in broadcast hash joins with aliases |
| [ ] | [ ] | [ ] | partition pruning in broadcast hash joins |
| [ ] | [ ] | [ ] | SPARK-32817: DPP throws error when the broadcast side is empty |
| [ ] | [ ] | [ ] | SPARK-36444: Remove OptimizeSubqueries from batch of PartitionPruning |
| [ ] | [ ] | [ ] | SPARK-38674: Remove useless deduplicate in SubqueryBroadcastExec |
| [ ] | [ ] | [ ] | SPARK-39338: Remove dynamic pruning subquery if pruningKey's references is empty |
| [ ] | [ ] | [ ] | SPARK-39217: Makes DPP support the pruning side has Union |
| [ ] | [ ] | [ ] | partition pruning in broadcast hash joins with aliases |
| [ ] | [ ] | [ ] | partition pruning in broadcast hash joins |
| [ ] | [ ] | [ ] | different broadcast subqueries with identical children |
| [ ] | [ ] | [ ] | SPARK-32817: DPP throws error when the broadcast side is empty |
| [ ] | [ ] | [ ] | SPARK-36444: Remove OptimizeSubqueries from batch of PartitionPruning |
| [ ] | [ ] | [ ] | SPARK-38674: Remove useless deduplicate in SubqueryBroadcastExec |
| [ ] | [ ] | [ ] | SPARK-39338: Remove dynamic pruning subquery if pruningKey's references is empty |
| [ ] | [ ] | [ ] | SPARK-39217: Makes DPP support the pruning side has Union |
| [ ] | [ ] | [ ] | join with ordering requirement |

sql-2

| WIP | PR posted | Done | Failing test |
|---|---|---|---|
| [ ] | [ ] | [ ] | collations.sql |
| [ ] | [ ] | [ ] | SPARK-39166: Query context of binary arithmetic should be serialized to executors when WSCG is off |
| [ ] | [ ] | [ ] | SPARK-39175: Query context of Cast should be serialized to executors when WSCG is off |
| [ ] | [ ] | [ ] | SPARK-39190,SPARK-39208,SPARK-39210: Query context of decimal overflow error should be serialized to executors when WSCG is off |
| [ ] | [ ] | [ ] | SPARK-40389: Don't eliminate a cast which can cause overflow |
| [ ] | [ ] | [ ] | postgreSQL/float8.sql |
| [ ] | [ ] | [ ] | postgreSQL/groupingsets.sql |
| [ ] | [ ] | [ ] | postgreSQL/int4.sql |
| [ ] | [ ] | [ ] | SPARK-47120: subquery literal filter pushdown |
| [ ] | [ ] | [ ] | SPARK-47120: subquery literal filter pushdown |
| [ ] | [ ] | [ ] | view-schema-binding-config.sql |
| [ ] | [ ] | [ ] | view-schema-compensation.sql |

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

kazuyukitanimura commented 2 weeks ago

To find the ignored tests, search for https://github.com/apache/datafusion-comet/issues/551 in dev/diffs/4.0.0-preview1.diff.
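
For anyone who wants to script that search, here is a minimal Python sketch. It assumes only that the ignored tests are marked by added lines in the diff that mention this issue's URL. The function name `find_issue_references` is made up for illustration and is not part of Comet, and the exact shape of the matching lines is not assumed.

```python
from pathlib import Path

ISSUE_URL = "https://github.com/apache/datafusion-comet/issues/551"


def find_issue_references(diff_text: str, issue_url: str = ISSUE_URL) -> list[tuple[int, str]]:
    """Return (line_number, text) for every added diff line that mentions issue_url."""
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # In a unified diff, added lines start with '+'; '+++' is a file header.
        if line.startswith("+") and not line.startswith("+++") and issue_url in line:
            hits.append((number, line[1:].strip()))
    return hits


if __name__ == "__main__":
    # Path taken from the comment above; run from the repository root.
    diff_path = Path("dev/diffs/4.0.0-preview1.diff")
    if diff_path.exists():
        for number, text in find_issue_references(diff_path.read_text(encoding="utf-8")):
            print(f"{number}: {text}")
```

A plain text search for the URL gives the same information; the script only adds line numbers and skips context lines.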

viirya commented 2 weeks ago

Hmm, are these failed tests in addition to the ones in Spark 3.4? I.e., do they pass in Spark 3.4 + Comet but fail in Spark 4.0?

kazuyukitanimura commented 2 weeks ago

> Hmm, are these failed tests in addition to the ones in Spark 3.4? I.e., do they pass in Spark 3.4 + Comet but fail in Spark 4.0?

They could be both additional failures and regressions. Some could be due to ANSI. This ticket is for getting help from the community after https://github.com/apache/datafusion-comet/pull/537 is merged.