apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

signum(0) returns incorrect result #664

Open. andygrove opened this issue 4 months ago.

andygrove commented 4 months ago

Describe the bug

SQL

SELECT c14, Signum(c14) AS x FROM test1 ORDER BY c14;

Spark Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(2) Sort [c14#214 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 0
         +- Exchange rangepartitioning(c14#214 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=26265]
            +- *(1) Project [c14#214, SIGNUM(cast(c14#214 as double)) AS x#27880]
               +- *(1) ColumnarToRow
                  +- FileScan parquet [c14#214] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c14:int>
+- == Initial Plan ==
   Sort [c14#214 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c14#214 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=26251]
      +- Project [c14#214, SIGNUM(cast(c14#214 as double)) AS x#27880]
         +- FileScan parquet [c14#214] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c14:int>

Comet Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(2) Sort [c14#214 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 0
         +- Exchange rangepartitioning(c14#214 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=26330]
            +- *(1) ColumnarToRow
               +- CometProject [c14#214, x#27886], [c14#214, SIGNUM(cast(c14#214 as double)) AS x#27886]
                  +- CometScan parquet [c14#214] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c14:int>
+- == Initial Plan ==
   Sort [c14#214 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c14#214 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=26310]
      +- CometProject [c14#214, x#27886], [c14#214, SIGNUM(cast(c14#214 as double)) AS x#27886]
         +- CometScan parquet [c14#214] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test1.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c14:int>

First difference at row 102 (c14, x): Spark: 0, 0.0; Comet: 0, 1.0
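One plausible explanation, stated as an assumption because the issue does not name the root cause, is that the native path relies on Rust's standard `f64::signum`. That function is documented to return `1.0` for `+0.0` and `-1.0` for `-0.0`, whereas Spark returns `0.0` for a zero input. Since both plans evaluate `SIGNUM(cast(c14 as double))`, an integer `0` reaches the function as `0.0`. The minimal sketch below only illustrates the standard-library behavior; `rust_std_signum_of_int` is a hypothetical helper, not Comet or DataFusion code.

```rust
/// Hypothetical helper mimicking `SIGNUM(cast(c14 as double))` on top of Rust's
/// standard `f64::signum` (assumed, not confirmed, to be what the native path used).
fn rust_std_signum_of_int(v: i32) -> f64 {
    (v as f64).signum()
}

fn main() {
    // Rust's f64::signum returns 1.0 for +0.0, while Spark returns 0.0.
    assert_eq!(rust_std_signum_of_int(0), 1.0);

    // -0.0 maps to -1.0 (an int cast never yields -0.0, but a double column could).
    assert_eq!((-0.0_f64).signum(), -1.0);

    // Non-zero inputs behave as expected.
    assert_eq!(rust_std_signum_of_int(42), 1.0);
    assert_eq!(rust_std_signum_of_int(-7), -1.0);

    // NaN propagates.
    assert!(f64::NAN.signum().is_nan());

    println!("signum(0) via f64::signum = {}", rust_std_signum_of_int(0));
}
```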

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

andygrove commented 4 months ago

I filed an issue in the DataFusion repo: https://github.com/apache/datafusion/issues/11557

andygrove commented 4 months ago

There is now a PR open against DataFusion to add a Postgres-compatible implementation of signum, which is very close to what we need for Spark. https://github.com/apache/datafusion/pull/11580
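For comparison, a Spark-compatible signum can be sketched by mirroring Java's `Math.signum`, which, to my understanding, Spark's `Signum` expression delegates to. Under Java's semantics, `+0.0`, `-0.0`, and NaN are returned unchanged, and every other value maps to `1.0` or `-1.0`. The Rust sketch below assumes `f64` input, and both function names are hypothetical; it is not the code from the DataFusion PR. It also includes a variant that maps every zero to `+0.0`. If the Postgres-compatible implementation behaves that way (an assumption, not checked against the PR), negative zero would be one remaining gap for Spark.

```rust
/// Hypothetical: signum with Java `Math.signum` semantics (assumed to match Spark).
fn spark_signum(x: f64) -> f64 {
    if x == 0.0 {
        // Both +0.0 and -0.0 compare equal to 0.0; returning `x`
        // preserves the sign of zero, as Java's Math.signum does.
        x
    } else {
        // For non-zero values this yields 1.0 or -1.0; NaN stays NaN.
        x.signum()
    }
}

/// Hypothetical variant that maps any zero, including -0.0, to +0.0.
fn zero_to_positive_signum(x: f64) -> f64 {
    if x == 0.0 { 0.0 } else { x.signum() }
}

fn main() {
    for x in [0.0_f64, -0.0, 3.5, -2.0, f64::NAN] {
        println!(
            "{:>5}: spark_signum={:>5} zero_to_positive_signum={:>5}",
            x,
            spark_signum(x),
            zero_to_positive_signum(x)
        );
    }
}
```

Both variants fix the `signum(0) = 1.0` result from this issue; they differ only in whether `-0.0` keeps its sign, which can be checked with `is_sign_negative()`.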

kazuyukitanimura commented 2 months ago

Native signum is disabled for now