duckdb / duckdb-r

The duckdb R package
https://r.duckdb.org/
Other
131 stars 23 forks source link

Sub-day precision Date class must be converted to integer by floor, not trunc #517

Open eitsupi opened 1 month ago

eitsupi commented 1 month ago

Same as apache/arrow#44502 and apache/arrow-nanoarrow#665

Conversion of dates prior to the Posix origin is not correct.

(d <- as.Date(c(-1.1, -0.1, 0, 0.1, 1.1)))
#> [1] "1969-12-30" "1969-12-31" "1970-01-01" "1970-01-01" "1970-01-02"
df <- data.frame(d = d)
con <- DBI::dbConnect(duckdb::duckdb())
duckdb::duckdb_register(con, "df", df)
duckdb:::sql("from df", con)$d
#> [1] "1969-12-31" "1970-01-01" "1970-01-01" "1970-01-01" "1970-01-02"

Created on 2024-10-23 with reprex v2.1.1

With duckplyr:

(df <- tibble::tibble(d = as.Date(c(-1.1, -0.1, 0, 0.1, 1.1))))
#> # A tibble: 5 × 1
#>   d
#>   <date>
#> 1 1969-12-30
#> 2 1969-12-31
#> 3 1970-01-01
#> 4 1970-01-01
#> 5 1970-01-02

df |>
  duckplyr::as_duckplyr_tibble() |>
  dplyr::select(d)
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, no data will be collected or uploaded.
#> → Run `duckplyr::fallback_sitrep()` to review the current settings.
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [d as d]
#>   r_dataframe_scan(0x56267c113fd8)
#>
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - d (DATE)
#>
#> # A tibble: 5 × 1
#>   d
#>   <date>
#> 1 1969-12-31
#> 2 1970-01-01
#> 3 1970-01-01
#> 4 1970-01-01
#> 5 1970-01-02

Created on 2024-10-23 with reprex v2.1.1

krlmlr commented 4 weeks ago

Thanks, confirmed. What behavior would you expect with such dates, given that they can't be properly represented in duckdb (or so it seems)?

df1 <- tibble::tibble(
  delta = c(-1.1, -0.1, 0, 0.1, 1.1),
  d = as.Date(c(-1.1, -0.1, 0, 0.1, 1.1), origin = "1970-01-01"),
  dd = d + 0.5,
)
df1
#> # A tibble: 5 × 3
#>   delta d          dd        
#>   <dbl> <date>     <date>    
#> 1  -1.1 1969-12-30 1969-12-31
#> 2  -0.1 1969-12-31 1970-01-01
#> 3   0   1970-01-01 1970-01-01
#> 4   0.1 1970-01-01 1970-01-01
#> 5   1.1 1970-01-02 1970-01-02

duckdb <- asNamespace("duckdb")
drv <- duckdb::duckdb()
con <- DBI::dbConnect(drv)
experimental <- FALSE

"select"
#> [1] "select"
rel1 <- duckdb$rel_from_df(con, df1, experimental = experimental)
"select"
#> [1] "select"
rel2 <- duckdb$rel_project(
  rel1,
  list(
    {
      tmp_expr <- duckdb$expr_reference("delta")
      duckdb$expr_set_alias(tmp_expr, "delta")
      tmp_expr
    },
    {
      tmp_expr <- duckdb$expr_reference("d")
      duckdb$expr_set_alias(tmp_expr, "d")
      tmp_expr
    },
    {
      tmp_expr <- duckdb$expr_reference("dd")
      duckdb$expr_set_alias(tmp_expr, "dd")
      tmp_expr
    }
  )
)
rel2
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [delta as delta, d as d, dd as dd]
#>   r_dataframe_scan(0x1145c1888)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - delta (DOUBLE)
#> - d (DATE)
#> - dd (DATE)
duckdb$rel_to_altrep(rel2)
#>   delta          d         dd
#> 1  -1.1 1969-12-31 1970-01-01
#> 2  -0.1 1970-01-01 1970-01-01
#> 3   0.0 1970-01-01 1970-01-01
#> 4   0.1 1970-01-01 1970-01-01
#> 5   1.1 1970-01-02 1970-01-02

Created on 2024-10-28 with reprex v2.1.1

eitsupi commented 4 weeks ago

What behavior would you expect with such dates, given that they can't be properly represented in duckdb (or so it seems)?

I'm sorry, but I don't understand your question. I was just pointing out the fact that it is unreasonable for Dates based on a double type to be trunced. For a simple double, it makes sense to switch before and after 0 because 0 has meaning, but in the case of Date, 1970-01-01 has no meaning, so I don't think the operation should switch before or after it.

krlmlr commented 4 weeks ago

I see that flooring is the more correct operation here. Would you like to contribute? I'm releasing 1.1.2 very soon, but 1.1.3 is imminent.

eitsupi commented 4 weeks ago

Sorry, but I have no experience with C++ and don't know where to fix it.