DyfanJones / RAthena

Connect R to Athena using Boto3 SDK (DBI Interface)
https://dyfanjones.github.io/RAthena/
Other
35 stars 6 forks source link

Maybe wday() should raise a warning #151

Closed brunocarlin closed 2 years ago

brunocarlin commented 3 years ago

I think the behaviour of adding + 1 to the function in Athena could raise some warnings even though it matches the R output it is quite strange to debug locally

Thanks for the amazing pkg

DyfanJones commented 3 years ago

hi @brunocarlin can you provide some code so I can replicate the issue :)

brunocarlin commented 2 years ago

Sure

library(RAthena)
library(tidyverse)
library(reticulate)
use_condaenv("RAthena",required = TRUE)

con <- dbConnect(RAthena::athena(),
                 aws_access_key_id= Sys.getenv("AWS_ACCESS_KEY_ID"),
                 aws_secret_access_key=Sys.getenv("AWS_SECRET_ACCESS_KEY"),
                 s3_staging_dir= Sys.getenv("AWS_STAGING_DIR"),
                 region_name = Sys.getenv("AWS_DEFAULT_REGION"),
                 work_group  = Sys.getenv("AWS_WORK_GROUP")
)

table_exaple <- tbl(con,sql(r"(select date '2021-10-06' as date_example)"))

#  today is wednesday for Athena this is equal to 3 but if we run it in r it is 4

query_date <- table_exaple %>% 
  mutate(wday_result_wday = wday(date_example),
         wday_result_dow = DOW(date_example)
         )

query_date
#> Info: (Data scanned: 0 Bytes)
#> # Source:   lazy query [?? x 3]
#> # Database: Athena 1.18.19 [us-east-1/default]
#>   date_example wday_result_wday wday_result_dow
#>   <date>                <int64>         <int64>
#> 1 2021-10-06                  4               3

# we can see that r adds 1 to the dow function which is hard to notice at first glance
query_date %>% 
  show_query()
#> <SQL>
#> SELECT "date_example", (dow("date_example") + 0) %7 + 1 AS "wday_result_wday", DOW("date_example") AS "wday_result_dow"
#> FROM (select date '2021-10-06' as date_example) "q01"

# The whole point is that the functions wday and dow are extremely similar, but not equal so maybe just like when you use mean there should be a warning

lubridate::ymd('2021-10-06') %>%  lubridate::wday()
#> [1] 4

Created on 2021-10-06 by the reprex package (v2.0.1)

DyfanJones commented 2 years ago

@brunocarlin Ah see what you mean. lubridate use start week 7 (Sunday) and AWS Athena uses a start week 1 (Monday). To replicate lubridate's start week I use data.table's approach, however the parameter start_week can be used so that AWS Athena's DOW would be equivalent.

library(DBI)
library(tidyverse)

con <- dbConnect(RAthena::athena())

table_example <- tbl(con,sql(r"(select date '2021-10-06' as date_example)"))
#> Info: (Data scanned: 0 Bytes)

#> today is wednesday for Athena this is equal to 3 but if we run it in r it is 4
table_example %>% 
  mutate(wday_7 = wday(date_example),
         wday_1 = wday(date_example, week_start = 1),
         wday_result_dow = DOW(date_example))
#> Info: (Data scanned: 0 Bytes)
#> # Source:   lazy query [?? x 4]
#> # Database: Athena 1.18.21 [eu-west-1/default]
#>   date_example  wday_7  wday_1 wday_result_dow
#>   <date>       <int64> <int64>         <int64>
#> 1 2021-10-06         4       3               3

Created on 2021-10-07 by the reprex package (v2.0.1)

I am a little reliculant to raise any warning message as dbplyr has a similar approach for their RPostgres extension: https://github.com/tidyverse/dbplyr/blob/1b7c77ef921925d7f9875200174c7f9bb28fc503/R/backend-postgres.R#L132-L144 The only differences is that AWS Athena (Presto) offers a more elegant date function 😄

brunocarlin commented 2 years ago

Fair enough, may I was just lazy as I got bitten by making some manual checks on the return value of the DOW function inside Athena and never bothered to double check the query that was being created, I will close this thread, thanks for the quick reply