Closed tslumley closed 6 years ago
I think this comes down to a decimal type underflow in the division. Here is the generated query that exhibits the problem:
dbGetQuery(my_db$con, 'SELECT MIN("x") AS "min(x)", MAX("x") AS "max(x)" FROM (SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", "set", ("set" - 0.3) / 987654321.0 AS "x" FROM (SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", CASE WHEN ("Species" = \'setosa\') THEN (1.0) WHEN NOT("Species" = \'setosa\') THEN (0.0) END AS "set" FROM "iris") "zxxudrfuay") "ymqjodxpbh"')
The aggregates are indeed 0. When I cast set
to DOUBLE
, the results are no longer 0
dbGetQuery(my_db$con, 'SELECT MIN("x") AS "min(x)", MAX("x") AS "max(x)" FROM (SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", "set", ("set" - 0.3) / 987654321.0 AS "x" FROM (SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", CAST(CASE WHEN ("Species" = \'setosa\') THEN (1.0) WHEN NOT("Species" = \'setosa\') THEN (0.0) END AS DOUBLE) AS "set" FROM "iris") "zxxudrfuay") "ymqjodxpbh"')
1 -3.0375e-10 7.0875e-10
The first variant should of course also produce the correct result. I will file a MonetDB bug for this. Unfortunately, I don't know how to trigger the cast to DOUBLE
from dplyr...
I boiled it down to
select 0.7/987654321.0
this returns 0, but should not. Again,
select cast(0.7 as double)/987654321.0
returns the correct result.
I think we should default to DOUBLE
instead of DECIMAL
for constants. I have pushed a patch. With it, the example you provided produces the following result:
# A tibble: 1 x 2
`min(x)` `max(x)`
<dbl> <dbl>
1 -0.000000000304 0.000000000709
Better?
Yes, I think that will work better. Thanks
Considering this fixed then.
The following code
gives zero for the maximum and minimum (the entire column
x
is equal to zero). Since bothset
andx
show up as double whenmy_iris
is printed,x
should be non-zero floating point numbers.I suspect this is related to something I don't have a minimal example of, where I'm getting math overflow errors in summing a (quite long) column of numbers that should also be double.