DyfanJones / noctua

Connect R to Athena using paws SDK (DBI Interface)
https://dyfanjones.github.io/noctua/
Other
45 stars 5 forks source link

InvalidRequestException with dbGetQuery #211

Closed tamuanand closed 8 months ago

tamuanand commented 9 months ago

Hi @DyfanJones

Thanks a lot for this package.

I had a question about the correct way to use this to query Athena and use with dbplyr

This below works

library(DBI)
library(noctua)
library(dplyr)
con = dbConnect(noctua::athena(), 
                s3_staging_dir = "s3://<path_to>/staging_output/"
                )

res <- dbExecute(con, 
                 "SELECT * FROM \"db_name\".\"table_name\" LIMIT 100")
dbFetch(res)

However when I try either of this, I get errors

dbGetQuery(con, sql("SELECT *  \"db_name\".\"table_name\" LIMIT 100"))

tbl(con, sql("SELECT *  \"db_name\".\"table_name\" LIMIT 100"))  

Error was like this

Error: InvalidRequestException (HTTP 400). line 1:10: mismatched input '"table_name"'.
Expecting: ',', 'EXCEPT', 'FETCH', 'FROM', 'GROUP', 'HAVING', 'INTERSECT', 'LIMIT', 'OFFSET', 'ORDER', 'UNION', 'WHERE', 'WINDOW', 

Am I doing something wrong in how I am executing these?

Thanks in advance.

fabhans commented 9 months ago

Hi @tamuanand, your SQL syntax is wrong. You're missing the FROM clause. These should work:

dbGetQuery(con, sql("SELECT * FROM \"db_name\".\"table_name\" LIMIT 100"))

tbl(con, sql("SELECT * FROM  \"db_name\".\"table_name\" LIMIT 100")) 
tamuanand commented 9 months ago

Hi @fabhans

Thanks. Yes, I was missing that FROM clause. Sorry about the same.

I have a follow-up question/issue

> dbExistsTable(con, "iris.iris")
[1] TRUE
> tbl(con, 
+     sql("SELECT * FROM  \"iris\".\"iris\" LIMIT 2")) 
INFO: (Data scanned: 0 Bytes)
INFO: (Data scanned: 7.69 KB)
# Source:   SQL [2 x 5]
# Database: Athena 0.2.0 [default@<region>/default]
  sepal_length sepal_width petal_length petal_width species
         <dbl>       <dbl>        <dbl>       <dbl> <chr>  
1          5.1         3.5          1.4         0.2 setosa 
2          4.9         3            1.4         0.2 setosa

However, this snippet below fails. I have tried

> athena_iris <- tbl(con, "\"iris\".\"iris\"")
> athena_iris
Error in `collect()`:
! Failed to collect lazy table.
Caused by error:
! TABLE_NOT_FOUND: line 2:6: Table 'datacatalog.default."iris"."iris"' does not exist
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `collect()`:
! Failed to collect lazy table.
Caused by error:
! TABLE_NOT_FOUND: line 2:6: Table 'datacatalog.default."iris"."iris"' does not exist
---
Backtrace:
     ▆
  1. ├─base (local) `<fn>`(x)
  2. └─dbplyr:::print.tbl_sql(x)
  3.   ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
  4.   │ ├─base::cat(paste0(..., "\n"), sep = "")
  5.   │ └─base::paste0(..., "\n")
  6.   ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
  7.   └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
  8.     └─pillar:::format_tbl(...)
  9.       └─pillar::tbl_format_setup(...)
 10.         ├─pillar:::tbl_format_setup_dispatch(...)
 11.         └─pillar:::tbl_format_setup.tbl(...)
 12.           └─pillar:::df_head(x, n + 1)
 13.             ├─base::as.data.frame(head(x, n))
 14.             └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
 15.               ├─base::as.data.frame(collect(x, n = n))
 16.               ├─dplyr::collect(x, n = n)
 17.               └─dbplyr:::collect.tbl_sql(x, n = n)

This is my sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] styler_1.10.1 dplyr_1.1.2   noctua_2.6.1  DBI_1.1.3    

loaded via a namespace (and not attached):
 [1] jsonlite_1.8.5       compiler_4.3.0       crayon_1.5.2         tidyselect_1.2.0    
 [5] Rcpp_1.0.10          uuid_1.1-0           mime_0.12            readr_2.1.4         
 [9] R6_2.5.1             paws.analytics_0.2.0 generics_0.1.3       curl_5.0.1          
[13] paws.storage_0.2.0   tibble_3.2.1         R.cache_0.16.0       paws.common_0.5.6   
[17] pillar_1.9.0         tzdb_0.4.0           R.utils_2.12.2       paws_0.2.0          
[21] rlang_1.1.1          utf8_1.2.3           cli_3.6.1            magrittr_2.0.3      
[25] digest_0.6.31        rstudioapi_0.14      dbplyr_2.3.2         hms_1.1.3           
[29] lifecycle_1.0.3      R.methodsS3_1.8.2    R.oo_1.25.0          vctrs_0.6.3         
[33] glue_1.6.2           data.table_1.14.8    fansi_1.0.4          purrr_1.0.1         
[37] httr_1.4.6           tools_4.3.0          pkgconfig_2.0.3

Is my syntax incorrect?

Thanks in advance.

DyfanJones commented 8 months ago

Sorry about the delay in my reply:

By default noctua will use the schema and catalog from your connection. That is why it is returning this error:

TABLE_NOT_FOUND: line 2:6: Table 'datacatalog.default."iris"."iris"' does not exist

It is looking for table name "iris"."iris" in catalog: datacatalog and schema: default.

library(DBI)
library(dplyr)

con <- dbConnect(noctua::athena())

# will look for iris in "default" schema
athena_iris <- tbl(con, "iris")

You can override this by using the dbplyr helper function dbplyr::in_schema

library(DBI)
library(dplyr)

con <- dbConnect(noctua::athena())

athena_iris <- tbl(con, dbplyr::in_schema("iris", "iris"))

Alternatively you could change the schema you connect to.

library(DBI)
library(dplyr)

con <- dbConnect(noctua::athena(), schema_name = "iris")

athena_iris <- tbl(con, "iris")
tamuanand commented 8 months ago

Thanks @DyfanJones