apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] result mismatch found in round #6827

Open jiangjiangtian opened 3 months ago

jiangjiangtian commented 3 months ago

Backend

VL (Velox)

Bug description

SQL:

SELECT round(cast(0.5549999999999999 as double), 2);

Gluten returns 0.56, but vanilla Spark returns 0.55. The reason for this mismatch is that std::nextafter(0.5549999999999999, kInf) yields 0.55500000000000005, which, after scaling by 100, makes std::round return 56. Besides, the following SQL also produces a wrong result in Gluten:

SELECT round(cast(0.19324999999999998 as double), 2);

Gluten returns 0.1933, while Spark returns 0.1922.

I have tried the following modification in round:

return static_cast<TNum>(std::round(std::nextafter(static_cast<long double>(number), kInf) * factor) / factor);

With this change, the first example is fixed, but the second example still mismatches. Moreover, the modification introduces a new mismatch:

SELECT round(cast(0.575 as double), 2);

Before the modification, Gluten returns the correct result, 0.58. After the modification, Gluten returns 0.57.

Spark version

3.0

Spark configurations

No response

System information

No response

Relevant logs

No response

jiangjiangtian commented 3 months ago

CC @rui-mo @ArnavBalyan @kecookier

ArnavBalyan commented 3 months ago

Yes, the round function receives its input in double format; double is used throughout upstream. We might have to refactor the codebase and upstream functions to move to something that can handle the higher precision. Thanks.

jiangjiangtian commented 3 months ago

> Yes, the round function receives its input in double format; double is used throughout upstream. We might have to refactor the codebase and upstream functions to move to something that can handle the higher precision. Thanks.

Thanks! I think we can use DecimalRoundFunction as a workaround, but it also needs a lot of effort.

rui-mo commented 3 months ago

Hi @jiangjiangtian, is this issue noticed in a runtime workload? How do you evaluate the importance of this issue to your workload? Thanks.

jiangjiangtian commented 3 months ago

> Hi @jiangjiangtian, is this issue noticed in a runtime workload? How do you evaluate the importance of this issue to your workload? Thanks.

I found the issue in a test workload. We have some remaining problems, one of which is this issue. I am trying to fix it using DecimalRoundFunction. Do you think that is a feasible solution?

rui-mo commented 3 months ago

Hi @jiangjiangtian, here are my thoughts. If we cast the double to decimal before rounding, how do we make sure we obtain the expected decimal? I believe the problem with the number 0.5549999999999999 is that there is no exact binary representation for it, so we cannot tell whether it is simply a representation of 0.555 or should actually be 0.5549999999999999 as you indicated. The same problem, I believe, also arises when casting a double to a decimal, which could produce an unexpected result.

To solve this issue for good, I agree with @ArnavBalyan: we need to map the Spark double type to a higher-precision C++ type like boost::multiprecision::cpp_dec_float_50, but this will take a lot of work.

jiangjiangtian commented 3 months ago

> Hi @jiangjiangtian, here are my thoughts. If we cast the double to decimal before rounding, how do we make sure we obtain the expected decimal? I believe the problem with the number 0.5549999999999999 is that there is no exact binary representation for it, so we cannot tell whether it is simply a representation of 0.555 or should actually be 0.5549999999999999 as you indicated. The same problem, I believe, also arises when casting a double to a decimal, which could produce an unexpected result.
>
> To solve this issue for good, I agree with @ArnavBalyan: we need to map the Spark double type to a higher-precision C++ type like boost::multiprecision::cpp_dec_float_50, but this will take a lot of work.

Thanks for your reply! I think you are right.

zhztheplayer commented 3 months ago

@jiangjiangtian Would you like to open a PR adding a simple UT for this issue? We could mark it as ignored until we actually fix it.

rui-mo commented 3 months ago

Given that Spark uses BigDecimal(d).setScale(_scale, mode).toDouble when rounding a double, I wonder whether we could apply a similar strategy in the round function to achieve the same result as Spark.

Although the constructor BigDecimal(double val, MathContext mc) warns that the result can be unpredictable due to the imprecision of double, if we follow its implementation, the results for Spark and Gluten could become the same.

https://github.com/openjdk/jdk/blob/jdk-24%2B11/src/java.base/share/classes/java/math/BigDecimal.java#L991-L1087

https://github.com/openjdk/jdk/blob/jdk-24%2B11/src/java.base/share/classes/java/math/BigDecimal.java#L2900-L2985

@ArnavBalyan @jiangjiangtian Please help share your thoughts. Thanks.

jiangjiangtian commented 3 months ago

> Given that Spark uses BigDecimal(d).setScale(_scale, mode).toDouble when rounding a double, I wonder whether we could apply a similar strategy in the round function to achieve the same result as Spark.
>
> Although the constructor BigDecimal(double val, MathContext mc) warns that the result can be unpredictable due to the imprecision of double, if we follow its implementation, the results for Spark and Gluten could become the same.
>
> https://github.com/openjdk/jdk/blob/jdk-24%2B11/src/java.base/share/classes/java/math/BigDecimal.java#L991-L1087
>
> https://github.com/openjdk/jdk/blob/jdk-24%2B11/src/java.base/share/classes/java/math/BigDecimal.java#L2900-L2985
>
> @ArnavBalyan @jiangjiangtian Please help share your thoughts. Thanks.

@rui-mo Thanks for your investigation. I think it is a feasible solution. If we do this, we can follow Spark's implementation: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1532-L1595

rui-mo commented 3 months ago

I did some tests on the Java BigDecimal setScale API but found that it can also give unexpected results, e.g., round(0.575, 2) yields 0.57. Link to the test code: https://onlinegdb.com/DJIsuTIbc

wForget commented 1 month ago

Adding another mismatch case:

select (199 / 426), round((199 / 426), 8), round(0.4671361502347418, 8)

Gluten 1.2.0 with Velox:

+---------------------+------------------------+-------------------------------+--+
|     (199 / 426)     | round((199 / 426), 8)  | round(0.4671361502347418, 8)  |
+---------------------+------------------------+-------------------------------+--+
| 0.4671361502347418  | 0.46713614999999997    | 0.46713615                    |
+---------------------+------------------------+-------------------------------+--+

Vanilla Spark:

+---------------------+------------------------+-------------------------------+--+
|     (199 / 426)     | round((199 / 426), 8)  | round(0.4671361502347418, 8)  |
+---------------------+------------------------+-------------------------------+--+
| 0.4671361502347418  | 0.46713615             | 0.46713615                    |
+---------------------+------------------------+-------------------------------+--+