OHDSI / Andromeda

AsynchroNous Disk-based Representation of MassivE DAta: An R package aimed at replacing ff for storing large data objects.
https://ohdsi.github.io/Andromeda/
11 stars 9 forks source link

Converting date to string in andromeda object #27

Open gowthamrao opened 2 years ago

gowthamrao commented 2 years ago

I think as.character() on date in Andromeda object - still converts date to integer and then string

oserved behavior

anAndromedaObject <- Andromeda::andromeda()
anAndromedaObject$dataFrameWithDate <- dplyr::tibble(todayDate = Sys.Date())
anAndromedaObject$dataFrameWithDate <- anAndromedaObject$dataFrameWithDate %>% 
  dplyr::mutate(todayDate = as.character(.data$todayDate))

todayDate

1 18849

expected behavior

dataFrameWithDate <- dplyr::tibble(todayDate = Sys.Date())
dataFrameWithDate <- dataFrameWithDate %>% 
  dplyr::mutate(todayDate = as.character(.data$todayDate))

todayDate

1 2021-08-10

maybe related to https://github.com/OHDSI/Andromeda/issues/11

gowthamrao commented 2 years ago

Ofcourse it can be overcome by bringing full data into R using dplyr::collect() but that would defeat the purpose of Andromeda


anAndromedaObject <- Andromeda::andromeda()
anAndromedaObject$dataFrameWithDate <- dplyr::tibble(todayDate = Sys.Date())
anAndromedaObject$dataFrameWithDate <- anAndromedaObject$dataFrameWithDate %>% 
  dplyr::collect() %>% 
  dplyr::mutate(todayDate = as.character(.data$todayDate))

todayDate

1 2021-08-10
ablack3 commented 2 years ago

Yea. All these issues with Andromeda dates are due to the lack of a date type in SQLite. RSQLite tries to make it look like SQLite has dates. I have a pretty unsatisfactory solution but I think it does convert the date to a character string in the database. (I think 2440588 is the Julian date of Jan 1, 1970)

library(Andromeda)

anAndromedaObject <- Andromeda::andromeda()
anAndromedaObject$dataFrameWithDate <- dplyr::tibble(todayDate = Sys.Date())
anAndromedaObject$dataFrameWithDate 
#> # Source:   table<dataFrameWithDate> [?? x 1]
#> # Database: sqlite 3.36.0
#>   todayDate 
#>   <date>    
#> 1 2021-10-13

anAndromedaObject$dataFrameWithDate <-
  anAndromedaObject$dataFrameWithDate %>% 
  mutate(todayDate = date(todayDate, "+2440588 days"))

anAndromedaObject$dataFrameWithDate
#> # Source:   table<dataFrameWithDate> [?? x 1]
#> # Database: sqlite 3.36.0
#>   todayDate 
#>   <chr>     
#> 1 2021-10-13

close(anAndromedaObject)

Created on 2021-10-13 by the reprex package (v2.0.1)

ablack3 commented 1 year ago

This will be addressed in the next release (arrow_S4 branch)

library(Andromeda)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
anAndromedaObject <- Andromeda::andromeda()
anAndromedaObject$dataFrameWithDate <- dplyr::tibble(todayDate = Sys.Date())
anAndromedaObject$dataFrameWithDate <- anAndromedaObject$dataFrameWithDate %>% 
  dplyr::mutate(todayDate = as.character(.data$todayDate)) 

collect(anAndromedaObject$dataFrameWithDate)
#> # A tibble: 1 × 1
#>   todayDate 
#>   <chr>     
#> 1 2022-11-20

Created on 2022-11-20 with reprex v2.0.2