Closed brunocarlin closed 2 years ago
hi @brunocarlin can you provide some code so I can replicate the issue :)
Sure
library(RAthena)
library(tidyverse)
library(reticulate)
use_condaenv("RAthena",required = TRUE)
con <- dbConnect(RAthena::athena(),
aws_access_key_id= Sys.getenv("AWS_ACCESS_KEY_ID"),
aws_secret_access_key=Sys.getenv("AWS_SECRET_ACCESS_KEY"),
s3_staging_dir= Sys.getenv("AWS_STAGING_DIR"),
region_name = Sys.getenv("AWS_DEFAULT_REGION"),
work_group = Sys.getenv("AWS_WORK_GROUP")
)
table_exaple <- tbl(con,sql(r"(select date '2021-10-06' as date_example)"))
# today is wednesday for Athena this is equal to 3 but if we run it in r it is 4
query_date <- table_exaple %>%
mutate(wday_result_wday = wday(date_example),
wday_result_dow = DOW(date_example)
)
query_date
#> Info: (Data scanned: 0 Bytes)
#> # Source: lazy query [?? x 3]
#> # Database: Athena 1.18.19 [us-east-1/default]
#> date_example wday_result_wday wday_result_dow
#> <date> <int64> <int64>
#> 1 2021-10-06 4 3
# we can see that r adds 1 to the dow function which is hard to notice at first glance
query_date %>%
show_query()
#> <SQL>
#> SELECT "date_example", (dow("date_example") + 0) %7 + 1 AS "wday_result_wday", DOW("date_example") AS "wday_result_dow"
#> FROM (select date '2021-10-06' as date_example) "q01"
# The whole point is that the functions wday and dow are extremely similar, but not equal so maybe just like when you use mean there should be a warning
lubridate::ymd('2021-10-06') %>% lubridate::wday()
#> [1] 4
Created on 2021-10-06 by the reprex package (v2.0.1)
@brunocarlin Ah see what you mean. lubridate
use start week 7 (Sunday) and AWS Athena uses a start week 1 (Monday). To replicate lubridate
's start week I use data.table
's approach, however the parameter start_week
can be used so that AWS Athena's DOW would be equivalent.
library(DBI)
library(tidyverse)
con <- dbConnect(RAthena::athena())
table_example <- tbl(con,sql(r"(select date '2021-10-06' as date_example)"))
#> Info: (Data scanned: 0 Bytes)
#> today is wednesday for Athena this is equal to 3 but if we run it in r it is 4
table_example %>%
mutate(wday_7 = wday(date_example),
wday_1 = wday(date_example, week_start = 1),
wday_result_dow = DOW(date_example))
#> Info: (Data scanned: 0 Bytes)
#> # Source: lazy query [?? x 4]
#> # Database: Athena 1.18.21 [eu-west-1/default]
#> date_example wday_7 wday_1 wday_result_dow
#> <date> <int64> <int64> <int64>
#> 1 2021-10-06 4 3 3
Created on 2021-10-07 by the reprex package (v2.0.1)
I am a little reliculant to raise any warning message as dbplyr
has a similar approach for their RPostgres
extension:
https://github.com/tidyverse/dbplyr/blob/1b7c77ef921925d7f9875200174c7f9bb28fc503/R/backend-postgres.R#L132-L144
The only differences is that AWS Athena (Presto) offers a more elegant date function 😄
Fair enough, may I was just lazy as I got bitten by making some manual checks on the return value of the DOW function inside Athena and never bothered to double check the query that was being created, I will close this thread, thanks for the quick reply
I think the behaviour of adding + 1 to the function in Athena could raise some warnings even though it matches the R output it is quite strange to debug locally
Thanks for the amazing pkg