Closed nicholsn closed 1 year ago
Hi @nicholsn, interesting. Have you tried dplyr::show_query()
to find out the query it is trying to send to athena?
Hey @DyfanJones, I did try to pipe it into show_query
but the error is thrown even with just tbl(con, query)
- no collect()
needed.
If it is ok, i will have a deeper look tomorrow :) i need to see if i can replicate this issue :) i am guessing it is erroring when it is trying to get the column names for the tbl class.
OOO just a thought, I think i know what is going. For tbl
, dplyr does an initial query to what ever database it is connected to. It does this by sub-querying the table so it will look something like:
-- example of sub-query
select var1, var2, var3 from blah.table
dplyr's wrapper
select * from (
select var1, var2, var3 from blah.table
) as sub_query
where 1 = 0
However in your query you have ;
at the end of your query, so dplyr will do the following:
select * from (
select var1, var2, var3 from blah.table;
) as sub_query
where 1 = 0
As there a ;
inside sub-query it will fail. So can you try without ;
. I hope this helps to explain what is going on in the backend. Please let me know if this fixes your issue :)
I can confirm this is the case. You need to remove ;
from your sub-query. Here is an example demonstrating this.
library(DBI)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
con <- dbConnect(noctua::athena())
tbl(con, sql("select * from sampledb.elb_logs;"))
#> Error: InvalidRequestException (HTTP 400). Only one sql statement is allowed. Got: SELECT *
#> FROM (select * from sampledb.elb_logs;) "q01"
#> WHERE (0 = 1)
tbl(con, sql("select * from sampledb.elb_logs"))
#> INFO: (Data scanned: 0 Bytes)
#> INFO: (Data scanned: 2.52 MB)
#> # Source: SQL [?? x 19]
#> # Database: Athena 0.1.12 [default@eu-west-1/default]
#> request_tim…¹ elb_n…² reque…³ reque…⁴ backe…⁵ backe…⁶ reque…⁷ backe…⁸ clien…⁹
#> <chr> <chr> <chr> <int> <chr> <int> <dbl> <dbl> <dbl>
#> 1 2015-01-01T0… elb_de… 248.46… 31279 172.40… 80 3.26e-4 7.85e-4 6.91e-4
#> 2 2015-01-01T0… elb_de… 248.11… 22272 172.46… 8888 6.53e-4 9.53e-4 7.61e-4
#> 3 2015-01-01T0… elb_de… 253.19… 30529 172.30… 443 1.26e-4 8.1 e-4 3.52e-4
#> 4 2015-01-01T0… elb_de… 247.24… 13001 172.41… 8888 1.43e-3 1.39e-3 5.56e-4
#> 5 2015-01-01T0… elb_de… 248.21… 6024 172.52… 8888 9.31e-4 2.62e-4 6.53e-4
#> 6 2015-01-01T0… elb_de… 245.20… 16173 172.38… 443 1.1 e-3 1.99e-3 1.38e-3
#> 7 2015-01-01T0… elb_de… 253.13… 29168 172.33… 80 5.37e-4 1.67e-3 6.13e-4
#> 8 2015-01-01T0… elb_de… 250.16… 26741 172.49… 8888 1.52e-3 1.08e-3 9.06e-4
#> 9 2015-01-01T0… elb_de… 251.12… 16934 172.32… 80 2.12e-4 9.59e-4 9.69e-4
#> 10 2015-01-01T0… elb_de… 241.13… 18980 172.55… 8888 8.86e-4 9.97e-4 1.32e-3
#> # … with more rows, 10 more variables: elb_response_code <chr>,
#> # backend_response_code <chr>, received_bytes <int64>, sent_bytes <int64>,
#> # request_verb <chr>, url <chr>, protocol <chr>, user_agent <chr>,
#> # ssl_cipher <chr>, ssl_protocol <chr>, and abbreviated variable names
#> # ¹request_timestamp, ²elb_name, ³request_ip, ⁴request_port, ⁵backend_ip,
#> # ⁶backend_port, ⁷request_processing_time, ⁸backend_processing_time,
#> # ⁹client_response_time
Created on 2022-12-15 with reprex v2.0.2
I hope this helps
That was it - nice catch! The error message from dplyr
is accurate if you can spot that they're wrapping your query and making it a sub-query. Thank you for your help, @DyfanJones!
Issue Description
When following the example for how to submit a query using
dplyr
I get an error indicating that "Only one sql statement is allowed".From the error statement it looks like
dplyr
ornoctua
is injecting aSELECT * FROM (...)
to wrap around my query that includes a sub-query in the FROM clause.Reproducible Example
I don't have a way of reproducing this since I think it is specific to Athena only being able to process one query at a time, but I can share the query.
The above throws the error however this works
Any thoughts on why Athena is interpreting this as two
SELECT
statements?Session Info
```r devtools::session_info() R version 4.2.2 (2022-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: macOS Big Sur 11.6.1 Matrix products: default LAPACK: /Users/nnichols/micromamba/envs/mr/lib/libopenblasp-r0.3.21.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] tibble_3.1.8 tidyr_1.2.1 dplyr_1.0.10 noctua_2.6.0 [5] gwasvcf_0.1.0 VariantAnnotation_1.44.0 Rsamtools_2.14.0 Biostrings_2.66.0 [9] XVector_0.38.0 SummarizedExperiment_1.28.0 Biobase_2.58.0 GenomicRanges_1.50.0 [13] GenomeInfoDb_1.34.1 IRanges_2.32.0 S4Vectors_0.36.0 MatrixGenerics_1.10.0 [17] matrixStats_0.62.0 BiocGenerics_0.44.0 DBI_1.1.3 BiocManager_1.30.19 loaded via a namespace (and not attached): [1] bitops_1.0-7 bit64_4.0.5 filelock_1.0.2 progress_1.2.2 httr_1.4.4 [6] tools_4.2.2 utf8_1.2.2 R6_2.5.1 withr_2.5.0 tidyselect_1.2.0 [11] prettyunits_1.1.1 bit_4.0.4 curl_4.3.3 compiler_4.2.2 cli_3.4.1 [16] xml2_1.3.3 paws_0.1.12 DelayedArray_0.24.0 rtracklayer_1.58.0 readr_2.1.3 [21] paws.analytics_0.1.12 rappdirs_0.3.3 stringr_1.4.1 digest_0.6.30 pkgconfig_2.0.3 [26] dbplyr_2.2.1 fastmap_1.1.0 BSgenome_1.66.1 rlang_1.0.6 rstudioapi_0.14 [31] RSQLite_2.2.18 BiocIO_1.8.0 generics_0.1.3 jsonlite_1.8.3 BiocParallel_1.32.0 [36] RCurl_1.98-1.9 magrittr_2.0.3 GenomeInfoDbData_1.2.9 Matrix_1.5-3 Rcpp_1.0.9 [41] fansi_1.0.3 lifecycle_1.0.3 stringi_1.7.8 yaml_2.3.6 zlibbioc_1.44.0 [46] BiocFileCache_2.6.0 grid_4.2.2 blob_1.2.3 parallel_4.2.2 crayon_1.5.2 [51] lattice_0.20-45 GenomicFeatures_1.50.2 hms_1.1.2 KEGGREST_1.38.0 knitr_1.40 [56] pillar_1.8.1 uuid_1.1-0 rjson_0.2.21 codetools_0.2-18 paws.common_0.5.1 [61] biomaRt_2.54.0 XML_3.99-0.12 glue_1.6.2 data.table_1.14.4 png_0.1-7 [66] vctrs_0.5.0 tzdb_0.3.0 purrr_0.3.5 assertthat_0.2.1 cachem_1.0.6 [71] xfun_0.34 paws.storage_0.1.12 mime_0.12 restfulr_0.0.15 GenomicAlignments_1.34.0 [76] AnnotationDbi_1.60.0 memoise_2.0.1 ellipsis_0.3.2 ```