facebookincubator / velox

A C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.4k stars 1.11k forks source link

round() semantics is not HALF_UP in sparksql functions #3219

Open zhixingheyi-tian opened 1 year ago

zhixingheyi-tian commented 1 year ago

round() semantics is not HALF_UP in sparksql functions:

#Vanilla Spark behavior
round(1.025, 2) => 1.03

#Velox behavior
round(1.025, 2)  => 1.02

cc @mbasmanova @aditi-pandit

mbasmanova commented 1 year ago

@zhixingheyi-tian Are you using round function from PrestoSQL package? If so, consider adding a different 'round' to SparkSQL package.

zhixingheyi-tian commented 1 year ago

Currently, sparksql use prestosql::round to register.

I noticed the round function implementation here, is using std::round(). https://github.com/facebookincubator/velox/blob/6121d898dc6e61f8b063d9f7b06e64c815686476/velox/functions/prestosql/ArithmeticImpl.h#L45-L49

It should get the result of 1.03. But the output is 1.02. It is very confusing!

mbasmanova commented 1 year ago

I see the same:

velox/expression/tests/velox_expression_runner_test --sql "round(1.025, 2::integer)"
Result: ROW<_col0:DOUBLE>
1.02

In the debugger, I see 1.025 represented as 1.0249999999999999, which when multiplies by 100 and divided by 100 gives 1.02.

mbasmanova commented 1 year ago

CC: @Yuhta @majetideepak

zhixingheyi-tian commented 1 year ago

I see the same:

velox/expression/tests/velox_expression_runner_test --sql "round(1.025, 2::integer)"
Result: ROW<_col0:DOUBLE>
1.02

In the debugger, I see 1.025 represented as 1.0249999999999999, which when multiplies by 100 and divided by 100 gives 1.02.

Thanks @mbasmanova, Why 1.025 become the 1.0249999999999999?