apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Implement Spark-compatible cast between decimals with different precision and scale #375

Open andygrove opened 2 weeks ago

andygrove commented 2 weeks ago

What is the problem the feature request solves?

Comet is not consistent with Spark when casting between decimals. Here is a test to demonstrate this.

  test("cast between decimals with different precision and scale") {
    val rowData = Seq(
      Row(BigDecimal("12345.6789")),
      Row(BigDecimal("9876.5432")),
      Row(BigDecimal("123.4567"))
    )
    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(rowData),
      StructType(Seq(StructField("a", DataTypes.createDecimalType(10,4))))
    )
    castTest(df, DataTypes.createDecimalType(6,2))
  }

Spark Result

+----------+-------+
|         a|      a|
+----------+-------+
|  123.4567| 123.46|
| 9876.5432|9876.54|
|12345.6789|   null|
+----------+-------+

Comet Result

java.lang.ArithmeticException: Cannot convert 12345.68 (bytes: [B@4f834a43, integer: 1234568) to decimal with precision: 6 and scale: 2
    at org.apache.comet.vector.CometVector.getDecimal(CometVector.java:86)
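For context, DECIMAL(6, 2) holds at most 9999.99, so 12345.6789 overflows after rounding to two decimal places. With spark.sql.ansi.enabled left at its default of false, Spark's CAST returns null on decimal overflow instead of raising an error, which is what the Spark result above shows. A quick sketch to confirm this in plain Spark, without Comet:

    // ANSI mode off (the default): in-range values are rounded half-up to
    // the target scale, and values that overflow the precision become null.
    spark.sql(
      """SELECT CAST(9876.5432 AS DECIMAL(6,2)) AS in_range,
        |       CAST(12345.6789 AS DECIMAL(6,2)) AS overflow""".stripMargin).show()
    // +--------+--------+
    // |in_range|overflow|
    // +--------+--------+
    // | 9876.54|    null|
    // +--------+--------+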

Describe the potential solution

No response

Additional context

No response

viirya commented 2 weeks ago

This looks simple to fix. We currently throw an exception if we cannot convert the bytes to a decimal, but it looks like Spark returns null instead.
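A rough sketch of that direction in Scala, using a hypothetical helper in place of the byte-to-decimal conversion inside CometVector.getDecimal (the actual Comet code may differ):

    import java.math.{BigDecimal => JavaBigDecimal, BigInteger}
    import org.apache.spark.sql.types.Decimal

    // Hypothetical helper, not the actual CometVector code: build the decimal
    // from the vector's unscaled bytes, then return null instead of throwing
    // when the value does not fit the requested precision, matching Spark.
    def bytesToDecimal(unscaled: Array[Byte], precision: Int, scale: Int): Decimal = {
      val value = new JavaBigDecimal(new BigInteger(unscaled), scale)
      val result = Decimal(value)
      // changePrecision returns false on overflow; Spark's cast maps that to null
      if (result.changePrecision(precision, scale)) result else null
    }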

caicancai commented 1 week ago

@andygrove Do you mind letting me try it? In my spare time I have been fixing compatibility issues in Calcite's Spark SQL type conversions.

viirya commented 1 week ago

@caicancai Thanks. Please go ahead and create a PR for any open tickets that no one has claimed.

caicancai commented 5 hours ago

I am working on it.