Closed ellmanj closed 2 years ago
Hi @ellmanj thanks for highlighting this issue. I will check this out :)
So percentile_cont
is coming from dbplyr::base_win, dbplyr::base_agg and dbplyr::base_no_win. Will replace these going forward.
Branch sql_tran_median
seems to have fixed the issue:
remotes::install_github("DyfanJones/noctua", ref = "sql_tran_median")
library(DBI)
library(dplyr)
con <- dbConnect(noctua::athena())
# Create iris data set in AWS Athena
dbWriteTable(con, "iris", iris)
tbl(con, "iris") %>%
summarise(lower_percentile=quantile(petal_length,0.25),
median = median(petal_length),
upper_percentile = quantile(petal_length,0.75))
# Source: lazy query [?? x 3]
# Database: Athena 0.1.11 [default@eu-west-1/default]
lower_percentile median upper_percentile
<dbl> <dbl> <dbl>
1 1.6 4.4 5.1
tbl(con, "iris") %>%
group_by(species) %>%
summarise(lower_percentile=quantile(petal_length,0.25),
median = median(petal_length),
upper_percentile = quantile(petal_length,0.75))
# Source: lazy query [?? x 4]
# Database: Athena 0.1.11 [default@eu-west-1/default]
species lower_percentile median upper_percentile
<chr> <dbl> <dbl> <dbl>
1 virginica 5.1 5.6 5.9
2 setosa 1.4 1.5 1.6
3 versicolor 4 4.4 4.6
@ellmanj Please let me know if this fixes the issue.
thanks @DyfanJones! I tried it out. It works if I don't supply the na.rm
parameter to the median
function. But in my current project in some places we do supply that parameter.
Example:
> query <- dplyr::tbl(db_connection, "person") %>% summarise(median_value=median(age, na.rm = TRUE))
> query %>% collect()
Error in median(age, na.rm = TRUE) : unused argument (na.rm = TRUE)
Any thoughts?
@ellmanj Ah thanks, I have that behaviour missing. I will add it so that it mimics dbplyr
@ellmanj add to parameter na.rm
to median
sql_translate_env`. Please have a go.
library(DBI)
library(dplyr)
con <- dbConnect(noctua::athena())
# Create iris data set in AWS Athena
dbWriteTable(con, "iris", iris)
tbl(con, "iris") %>%
summarise(lower_percentile=quantile(petal_length,0.25),
median = median(petal_length),
upper_percentile = quantile(petal_length,0.75))
# Source: lazy query [?? x 3]
# Database: Athena 0.1.11 [default@eu-west-1/default]
lower_percentile median upper_percentile
<dbl> <dbl> <dbl>
1 1.6 4.4 5.1
# Warning message:
# Missing values are always removed in SQL.
# Use `median(x, na.rm = TRUE)` to silence this warning
# This warning is displayed only once per session.
tbl(con, "iris") %>%
group_by(species) %>%
summarise(lower_percentile=quantile(petal_length,0.25),
median = median(petal_length, na.rm = TRUE),
upper_percentile = quantile(petal_length,0.75))
# Source: lazy query [?? x 4]
# Database: Athena 0.1.11 [default@eu-west-1/default]
species lower_percentile median upper_percentile
<chr> <dbl> <dbl> <dbl>
1 virginica 5.1 5.6 5.9
2 setosa 1.4 1.5 1.6
3 versicolor 4 4.4 4.6
@DyfanJones that worked! Thanks for the quick turnaround on this.
@ellmanj PR #154 has been merged to master. Will release to the cran roughly on the 27/08 due to cran policy of monthly releases.
Will push these changes to cran tomorrow. Sorry around the delay I had a short holiday so these changes got delayed abit
No worries at all. Thank you!
On Thu, Sep 9, 2021, 9:23 AM Larefly @.***> wrote:
Will push these changes to cran tomorrow. Sorry around the delay I had a short holiday so these changes got delayed abit
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DyfanJones/noctua/issues/153#issuecomment-916145748, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4FPLT4PD4KTGAGOCGXHNLUBC7MTANCNFSM5BNA7TMA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Sorry the release was delayed by #156 and how to best implement it. I will update RAthena and continue with the release.
Ok thanks for the update.
noctua v-2.2.0 has now been released to the cran, thanks for your wait :D
Hello again! I've found what I think is another issue around translating to SQL.
When I try to use the
summarise
function to get the median value for a column, I get the following error.Here is the SQL being executed:
So I believe the issue is that
PERCENTILE_CONT
is not a valid Presto function. Perhaps we could use one of theapprox_percentile
functions here?