DyfanJones / noctua

Connect R to Athena using paws SDK (DBI Interface)
https://dyfanjones.github.io/noctua/
Other
45 stars 5 forks source link

Caching issue #171

Closed DyfanJones closed 2 years ago

DyfanJones commented 2 years ago

Caching fails when statement is in SQL class type:

library(DBI)

noctua::noctua_options(cache_size = 10)

con = dbConnect(noctua::athena())

df = dbGetQuery(con, "select * from iris")
#> Info: (Data scanned: 3.63 KB)

df = dbGetQuery(con, SQL("select * from iris"))
#> Error in .set_ops_arg_check(x, y, all, .seqn = TRUE): Item 2 of x is 'SQL' but the corresponding item of y is 'character'.

Created on 2021-11-15 by the reprex package (v2.0.1)

This behave also persists when using dplyr lazy evalution method tbl.

Thanks to @ramnathv for identifying this issue in https://github.com/DyfanJones/noctua/issues/96#issuecomment-968102412 .

DyfanJones commented 2 years ago

PR #172 seems to fix issue. @ramnathv can you double check branch cache_issue fixes your initial issue.

remotes::install_github("dyfanjones/noctua", ref="cache_issue")

Working example:

library(DBI)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

noctua::noctua_options(cache_size = 10)

con = dbConnect(noctua::athena())

df = dbGetQuery(con, "select * from iris")
#> Info: (Data scanned: 3.63 KB)

df = dbGetQuery(con, SQL("select * from iris"))
#> Info: (Data scanned: 3.63 KB)

# Check cache
noctua:::athena_option_env$cache_dt
#>                                 QueryId              Query     State
#> 1: 181b8974-2288-4ea0-b7fb-d146daea61b8 select * from iris SUCCEEDED
#>    StatementType WorkGroup UnloadDir
#> 1:           DML   primary

tbl_iris = tbl(con, "iris")

# Check cache
noctua:::athena_option_env$cache_dt
#>                                 QueryId              Query     State
#> 1: 181b8974-2288-4ea0-b7fb-d146daea61b8 select * from iris SUCCEEDED
#>    StatementType WorkGroup UnloadDir
#> 1:           DML   primary

tbl_iris
#> Info: (Data scanned: 3.63 KB)
#> # Source:   table<iris> [?? x 5]
#> # Database: Athena 0.1.12 [default@eu-west-1/default]
#>    sepal_length sepal_width petal_length petal_width species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with more rows

# Check cache
noctua:::athena_option_env$cache_dt
#>                                 QueryId                           Query
#> 1: 43970523-da56-4e59-8d7b-625e8bc7a8af SELECT *\nFROM "iris"\nLIMIT 11
#> 2: 181b8974-2288-4ea0-b7fb-d146daea61b8              select * from iris
#>        State StatementType WorkGroup UnloadDir
#> 1: SUCCEEDED           DML   primary          
#> 2: SUCCEEDED           DML   primary

Created on 2021-11-15 by the reprex package (v2.0.1)

Looks like repeat queries are being used multiple times, which is great news :D

DyfanJones commented 2 years ago

This has now been pushed to the cran in release 2.4.0