darwin-eu / CDMConnector

A pipe friendly way to interact with an OMOP Common Data Model
https://darwin-eu.github.io/CDMConnector/
Apache License 2.0
12 stars 10 forks source link

dateadd() and the base R pipe #5

Closed rbcavanaugh closed 1 year ago

rbcavanaugh commented 1 year ago

Hi! I came across your package a week or two ago looking for some useful date functions for network studies. Thanks for putting this together - I'm thrilled I don't have to reinvent the wheel.

Anyways - I was stumped by the dateadd() function not working with this error:

Error in get(".", envir = parent.frame()) : object '.' not found

Eventually, realized this error is from using the base-R pipe rather than the magrittr pipe. Might be something to note in the documentation, and I wanted to make a note in case someone else runs into the same error.

Works:

df <- date_tbl %>%
    dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year")) %>%
    dplyr::collect()

Doesn't work:

df <- date_tbl |>
    dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year")) |>
    dplyr::collect()

Feel free to close with changes - Rob

ablack3 commented 1 year ago

Thanks for reporting this and I'm glad you have found the package helpful! I'll try to fix this so it works with both pipes.

rbcavanaugh commented 1 year ago

Is this line just to get information about the connection?

dot <- get(".", envir = parent.frame())
 db <- CDMConnector::dbms(dot$src$con)

I'm not sure if this approach is useful - but something we've done is for any function that requires information about the connection, setting one argument to con = getOption("con.default.value") and encouraging users to set the default.

con <- DBI::dbConnect(...)
con.default.value = con
dateadd <- function(date, number, interval = "day", con = getOption("con.default.value")) {...}

And perhaps for this package, cdm_from_con() could set the default under the hood.

ablack3 commented 1 year ago

I looked into this and have no idea how to support the native pipe using my current approach.

Yes you are correct that those lines do get the connection information which I need for translation.

The problem I see with using a global option is that there isn't just one connection. Maybe in practice there is often one connection but certainly not always.

Any ideas?

I can't even look at the function definition of |> which is odd.

library(magrittr)
1:2 |> sum()
#> [1] 3

is.function(`%>%`)
#> [1] TRUE
is.function(`|>`)
#> Error in eval(expr, envir, enclos): object '|>' not found

body(`%>%`)
#> {
#>     lhs <- substitute(lhs)
#>     rhs <- substitute(rhs)
#>     kind <- 1L
#>     env <- parent.frame()
#>     lazy <- TRUE
#>     .External2(magrittr_pipe)
#> }
body(`|>`)
#> Error in body(`|>`): object '|>' not found

Created on 2023-07-11 with reprex v2.0.2

So literally everything is a function in R except for the native pipe?

rbcavanaugh commented 1 year ago

Yeah - that's odd I wouldn't have expected that output but I haven't spent a ton of time digging into the base-R pipe. I'm happy to spend some time trying to troubleshoot. How does get(".", environment = parent.frame() work? I can't quite follow the breadcrumbs of how it gets the src object.

ablack3 commented 1 year ago

I started a discussion here https://community.rstudio.com/t/why-is-the-native-pipe-not-a-function/169764/15

get(".", environment = parent.frame()) will extract the object "." from the calling environment essentially giving me the left hand side of the magrittr pipe.

But it only works when used with the magrittr pipe so it's not a good solution. I think there needs to be a better way to extend dbplyr translations.

rbcavanaugh commented 1 year ago

Fascinating. Perhaps the simple solution is just to add a tryCatch on that line and not worry about supporting the base-R pipe for now. magrittr/dplyr is already a dependency of the package - seems like base vs. R pipe is just a stylistic choice if CDMConnector is loaded anyway right?

 dot <- tryCatch(
            {
                get(".", envir = parent.frame())
            },
            error=function(e) {
                message("Unable to find connection. You might be using the base-R pipe. dateadd must follow a magrittr pipe ")
            }
        )
ablack3 commented 1 year ago

Yea I think your suggestion is an improvement but one problem is that if you don't use a pipe at all it fails. So not great. Really need to figure out a better implementation for this.

library(CDMConnector)
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(con, "tbl", data.frame(date = as.Date("2020-01-01")))

db <- dplyr::tbl(con, "tbl")

db %>% 
  dplyr::mutate(next_day = !!dateadd("date", 1, "day"))
#> # Source:   SQL [1 x 2]
#> # Database: DuckDB 0.8.2-dev77 [root@Darwin 21.6.0:R 4.2.2/:memory:]
#>   date       next_day  
#>   <date>     <date>    
#> 1 2020-01-01 2020-01-02

dplyr::mutate(db, next_day = !!dateadd("date", 1, "day"))
#> Error in get(".", envir = parent.frame()): object '.' not found

DBI::dbDisconnect(con, shutdown = T)

Created on 2023-07-14 with reprex v2.0.2

ablack3 commented 1 year ago

Hi @rbcavanaugh, Unfortunately I don't think I can fix this. For now it just works with the magrittr pipe. In the future though we will try to add the correct translations to dbplyr and avoid this workaround altogether. So the code would look like:

library(CDMConnector)
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(con, "tbl", data.frame(date_column = as.Date("2020-01-01")))

db <- dplyr::tbl(con, "tbl")

db %>% 
  dplyr::mutate(next_day = clock::add_days(date_column, 10))

https://clock.r-lib.org/reference/clock-arithmetic.html

rbcavanaugh commented 1 year ago

no worries - thanks for trying!