OHDSI / Andromeda

AsynchroNous Disk-based Representation of MassivE DAta: An R package aimed at replacing ff for storing large data objects.
https://ohdsi.github.io/Andromeda/
11 stars 10 forks source link

Add native support for dates and datetimes #11

Closed schuemie closed 3 years ago

schuemie commented 3 years ago

RSQlite has added a new extend_types argument to the dbConnect function that adds support for DATE and DATETIME fields. Unfortunately, at this time it does not seem to work when uploading a data frame with dates. The DATE fields need to be declared explicitly in a CREATE statement.

library(RSQLite)
conn <- dbConnect(RSQLite::SQLite(), extended_types = TRUE)

myTable <- data.frame(someDate = as.Date(c("2000-01-01", "2000-02-03")),
                      someNumber = c(1,2))                           

dbWriteTable(conn, "my_table", myTable)

#someDate field is still numeric:
dbReadTable(conn, "my_table")
# someDate someNumber
# 1    10957          1
# 2    10990          2

dbExecute(conn, "CREATE TABLE my_table_2 (some_date DATE);")
dbExecute(conn, "INSERT INTO my_table_2 values ('2000-01-01'), ('2000-02-03');")

dbGetQuery(conn, 'SELECT * from my_table_2')
# some_date
# 1 2000-01-01
# 2 2000-02-03

dbDisconnect(conn)

Maybe in the future RSQLite will also recognize the date fields when using dbWriteTable()?

ablack3 commented 3 years ago

This is good news. I'll repost this issue on the RSQLite repo.

ablack3 commented 3 years ago

@schuemie - I think dates work on the develop branch now. Datetimes work as long as the timezone is UTC. If the timezone is something else then it will be converted to UTC. I'm not sure if that is a problem or not.

Will you help me create a minor release based on this update?

library(Andromeda)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
andromeda <- andromeda()

data <- data.frame(person_id = c(1, 2, 3),
                   startDate = as.Date(c("2000-01-01", "2001-01-31", "2004-12-31")),
                   startTime = as.POSIXct(c("2000-01-01 10:00:00", "2001-01-31 11:00:00", "2004-12-31 21:00:00"), tz = "UTC"),
                   startTime2 = as.POSIXct(c("2000-01-01 10:00:00", "2001-01-31 11:00:00", "2004-12-31 21:00:00"), tz = "EST"),
                   someText = c("asdf", "asdf", "asdf"))

andromeda$data <- data

data2 <- andromeda$data %>% collect()
tibble::tibble(data2)
#> # A tibble: 3 x 5
#>   person_id startDate  startTime           startTime2          someText
#>       <dbl> <date>     <dttm>              <dttm>              <chr>   
#> 1         1 2000-01-01 2000-01-01 10:00:00 2000-01-01 15:00:00 asdf    
#> 2         2 2001-01-31 2001-01-31 11:00:00 2001-01-31 16:00:00 asdf    
#> 3         3 2004-12-31 2004-12-31 21:00:00 2005-01-01 02:00:00 asdf

close(andromeda)

Created on 2021-04-12 by the reprex package (v0.3.0)

ablack3 commented 3 years ago

The fix was in RSQLite so there are no workarounds going on in Andromeda.

library(RSQLite)
conn <- dbConnect(RSQLite::SQLite(), extended_types = TRUE)

myTable <- data.frame(someDate = as.Date(c("2000-01-01", "2000-02-03")),
                      someNumber = c(1,2))                           

dbWriteTable(conn, "my_table", myTable)

tibble::tibble(dbReadTable(conn, "my_table"))
#> # A tibble: 2 x 2
#>   someDate   someNumber
#>   <date>          <dbl>
#> 1 2000-01-01          1
#> 2 2000-02-03          2
dbDisconnect(conn)

Created on 2021-04-12 by the reprex package (v0.3.0)

schuemie commented 3 years ago

Awesome! This seems like a pretty big change, so would be a major version increase? I think several HADES packages are expecting dates to be automatically converted to integers, so we should test them first.

We should specify the required version of RSQLite in the DESCRIPTION (has that version been released, or did you pull it from GitHub?)

ablack3 commented 3 years ago

I'm using the 2.2.6 version of RSQlite which is on CRAN as a source package. I'll add RSQLite (>= 2.2.6) to the DESCRIPTION.

If this is a breaking change then I guess it should be a major version. There are some other changes I'm interested in exploring that could be bundled with a 1.0 release and I've created issues for them.

These changes would support a couple new user stories for Andromeda:

If I'm going to break the package I'd like to only do it once in the near future and lay the groundwork for smaller non-breaking improvements. What do you think?

schuemie commented 3 years ago

Yes, that sounds like an excellent plan. I am a bit skeptical about whether different backends will improve performance without breaking compatibility too much, but I can always be persuaded by real data. Agreed that we want to bundle as many breaking changes into a single major release.

egillax commented 3 years ago

Hi @ablack3 I was testing this branch. When I save and load an Andromeda object the dates get converted back to integers. I assume extended_types=TRUE needs to be added to some saving/loading functions as well for date support?

ablack3 commented 3 years ago

@egillax, Thanks for testing the branch. I pushed a commit that should fix this issue and add some tests to test that dates are preserved after saving and loading. https://github.com/OHDSI/Andromeda/pull/18/commits/3a82f6804fd49c40c4348054bff2f64513b8f336

Will you try reinstalling the develop branch?

ablack3 commented 3 years ago

Native support for dates and datetimes added in v0.5.0