MonetDB / MonetDBLite-R

MonetDB reconfigured as an R package - See below for an introduction. Edit
64 stars 13 forks source link

Weird arithmetic (types?) #15

Closed tslumley closed 6 years ago

tslumley commented 6 years ago

The following code

library(dplyr)
dbdir <- file.path(tempdir(), "dplyrdir")
my_db <- MonetDBLite::src_monetdblite(dbdir)
my_iris  <- copy_to(my_db, iris)
my_iris <- mutate(my_iris, set=ifelse(Species=="setosa",1,0))
my_iris %>% mutate(x= (set-.3)/987654321) %>% summarise(min(x),max(x)) %>% collect()

gives zero for the maximum and minimum (the entire column x is equal to zero). Since both set and x show up as double when my_iris is printed, x should be non-zero floating point numbers.

I suspect this is related to something I don't have a minimal example of, where I'm getting math overflow errors in summing a (quite long) column of numbers that should also be double.

hannes commented 6 years ago

I think this comes down to a decimal type underflow in the division. Here is the generated query that exhibits the problem:

dbGetQuery(my_db$con, 'SELECT MIN("x") AS "min(x)", MAX("x") AS "max(x)" FROM (SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", "set", ("set" - 0.3) / 987654321.0 AS "x" FROM (SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", CASE WHEN ("Species" = \'setosa\') THEN (1.0) WHEN NOT("Species" = \'setosa\') THEN (0.0) END AS "set" FROM "iris") "zxxudrfuay") "ymqjodxpbh"')

The aggregates are indeed 0. When I cast set to DOUBLE, the results are no longer 0

dbGetQuery(my_db$con, 'SELECT MIN("x") AS "min(x)", MAX("x") AS "max(x)" FROM (SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", "set", ("set" - 0.3) / 987654321.0 AS "x" FROM (SELECT "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", CAST(CASE WHEN ("Species" = \'setosa\') THEN (1.0) WHEN NOT("Species" = \'setosa\') THEN (0.0) END AS DOUBLE) AS "set" FROM "iris") "zxxudrfuay") "ymqjodxpbh"')

1 -3.0375e-10 7.0875e-10

The first variant should of course also produce the correct result. I will file a MonetDB bug for this. Unfortunately, I don't know how to trigger the cast to DOUBLE from dplyr...

hannes commented 6 years ago

I boiled it down to

select 0.7/987654321.0

this returns 0, but should not. Again,

select cast(0.7 as double)/987654321.0

returns the correct result.

hannes commented 6 years ago

Filed as https://www.monetdb.org/bugzilla/show_bug.cgi?id=6531

hannes commented 6 years ago

I think we should default to DOUBLE instead of DECIMAL for constants. I have pushed a patch. With it, the example you provided produces the following result:

# A tibble: 1 x 2
         `min(x)`       `max(x)`
            <dbl>          <dbl>
1 -0.000000000304 0.000000000709

Better?

tslumley commented 6 years ago

Yes, I think that will work better. Thanks

hannes commented 6 years ago

Considering this fixed then.