Open comphead opened 1 week ago
Surprising that Postgres fails. According to the docs, Infinity, -Infinity, and NaN
are legitimate floating point values. This is also the expected behavior per the ANSI SQL standard, afaik.
NULL
is not the right thing to return here.
Postgres fails:
andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ psql -h localhost -U postgres
psql (14.12 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
Some psql features might not work.
Type "help" for help.
postgres=# select 1 % -0.0;
ERROR: division by zero
I don't have a strong preference one way or the other, personally
I don't have a strong preference one way or the other, personally
I would go with Spark/DuckDB way :) if there is no other strong opinions, I'll make a fix soon
ERROR: division by zero
I wasn't doubting it. :) But I am surprised.
@comphead I'll take back my 'NULL is not the right thing to return here' comment. It's better than failing for most users.
Just one thing to keep in mind - Spark sql has a isnan
function which should be consistent with this.
Thanks @parthchandra for the suggestion, I just checked we should be fine.
scala> spark.sql("select 1 % -0.0").show(false)
+---------+
|(1 % 0.0)|
+---------+
|NULL |
+---------+
scala> spark.sql("select isnan(1 % -0.0)").show(false)
+----------------+
|isnan((1 % 0.0))|
+----------------+
|false |
+----------------+
arrow-rs follows the rules of IEEE 754. If we intend to be compatible with other engines, many cases will also need to be modified, such as division by zero.
DataFusion CLI v39.0.0
> select 1.0/0.0;
+-------------------------+
| Float64(1) / Float64(0) |
+-------------------------+
| inf |
+-------------------------+
> select 0.0/0.0;
+-------------------------+
| Float64(0) / Float64(0) |
+-------------------------+
| NaN |
+-------------------------+
In PostgreSQL:
postgres=# select 1.0/0.0;
ERROR: division by zero
postgres=# select 0.0/0.0;
ERROR: division by zero
This is a good point @jonahgao so this gives me even more confidence to introduce a new DF param which controls should DF go with IEEE 754
Describe the bug
DF returns NaN in modulo query below and is not compliant with major engines:
PG, Trino fails on such query, DuckDB, Spark returns NULL
@alamb @viirya @andygrove what would be expected behavior from DataFusion? We can also introduce a config to be PG compliant or PG/Trino compliant?
To Reproduce
No response
Expected behavior
No response
Additional context
No response