apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Casting negative zero to string is inconsistent with Spark #1036

Open andygrove opened 3 weeks ago

andygrove commented 3 weeks ago

Describe the bug

SQL

SELECT c8, length(c8) AS x FROM test0 ORDER BY c8;

Spark Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(2) Sort [c8#8 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 0
         +- Exchange rangepartitioning(c8#8 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=7298]
            +- *(1) Project [c8#8, length(cast(c8#8 as string)) AS x#9969]
               +- *(1) ColumnarToRow
                  +- FileScan parquet [c8#8] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c8:float>
+- == Initial Plan ==
   Sort [c8#8 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c8#8 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=7284]
      +- Project [c8#8, length(cast(c8#8 as string)) AS x#9969]
         +- FileScan parquet [c8#8] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c8:float>

Comet Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) ColumnarToRow
   +- CometSort [c8#8, x#9975], [c8#8 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 0
            +- CometColumnarExchange rangepartitioning(c8#8 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=7361]
               +- CometProject [c8#8, x#9975], [c8#8, length(cast(c8#8 as string)) AS x#9975]
                  +- CometScan parquet [c8#8] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c8:float>
+- == Initial Plan ==
   CometSort [c8#8, x#9975], [c8#8 ASC NULLS FIRST]
   +- CometColumnarExchange rangepartitioning(c8#8 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=7348]
      +- CometProject [c8#8, x#9975], [c8#8, length(cast(c8#8 as string)) AS x#9975]
         +- CometScan parquet [c8#8] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c8:float>

First difference at row 33 (columns c8, x):
Spark: 0.0,3
Comet: -0.0,4
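
The mismatch above is what one would expect if one code path keeps the sign of an IEEE 754 negative zero when converting a float to a string and the other drops it. The sketch below illustrates that effect in plain Python; it is not Spark or Comet code, and `is_negative_zero` and `normalize_zero` are hypothetical helpers introduced only for illustration.

```python
import math


def is_negative_zero(x: float) -> bool:
    """Return True only for IEEE 754 negative zero."""
    # -0.0 == 0.0 is True, so the sign has to be inspected separately.
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def normalize_zero(x: float) -> float:
    """Hypothetical helper: map -0.0 to 0.0 and leave other values unchanged."""
    return 0.0 if is_negative_zero(x) else x


neg_zero = -0.0

# Converting without normalization keeps the sign character.
as_is = str(neg_zero)

# Converting after normalization drops the sign character.
normalized = str(normalize_zero(neg_zero))

print(as_is, len(as_is))            # -0.0 4
print(normalized, len(normalized))  # 0.0 3
```

The two printed lengths (4 and 3) match the `x` values reported for Comet and Spark at row 33. This only shows how the sign of zero changes the string length; it does not establish where in either engine the sign is kept or dropped.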

andygrove commented 3 weeks ago

This seems like a low-priority edge case.