DyfanJones / noctua

Connect R to Athena using paws SDK (DBI Interface)
https://dyfanjones.github.io/noctua/
Other
45 stars 5 forks source link

Almost random: InvalidRequestException (HTTP 400). Idempotent parameters do not match / Could not find results #104

Closed magnussv closed 3 years ago

magnussv commented 3 years ago

Hello Dyfan, Lately I had problems with RAthena so I turned to noctua instead and I noticed some peculiar, almost random, issues.

Issue Description

When running the code below then I usually get Error: InvalidRequestException (HTTP 400). Idempotent parameters do not match or sometimes Error: InvalidRequestException (HTTP 400). Could not find results HOWEVER about 1 of 5 times I retrieve the desired result. It almost seems random. Issue occurs in both R Console and in Rstudio. As a side note: If I for example use library(aws.s3); bucketlist() then I always retrieves the desired result, so I know that I'm assuming the role correctly.

Reproducible Example

Example:

library(DBI)
library(noctua)

noctua::assume_role(
role_arn = "arn:aws:iam::666666666:role/ARole",
 duration_seconds = 3600L,
 region_name = "eu-west-1",
 set_env = TRUE)

con <- dbConnect(noctua::athena(), s3_staging_dir = "s3://aws-athena-query-results-eu-west-1-328396111111")

dbGetQuery(con, "SELECT row_names, mpg, cyl, disp FROM adhoc.mtcars limit 10")

Screenshot:

Screenshot 2020-09-21 at 20 34 33

Traceback:


dbGetQuery(con, "SELECT row_names, mpg, cyl, disp FROM adhoc.mtcars limit 10")
Error: InvalidRequestException (HTTP 400). Idempotent parameters do not match

> traceback()
8: stop(resp)
7: retry_api_call(response <- conn@ptr$Athena$start_query_execution(QueryString = statement, 
       QueryExecutionContext = list(Database = conn@info$dbms.name), 
       ResultConfiguration = ResultConfiguration(conn), WorkGroup = conn@info$work_group))
6: AthenaResult(conn = conn, statement = statement, s3_staging_dir = s3_staging_dir)
5: dbSendQuery(conn, statement = statement)
4: dbSendQuery(conn, statement = statement)
3: .local(conn, statement, ...)
2: dbGetQuery(con, "SELECT * FROM adhoc.mtcars limit 10")
1: dbGetQuery(con, "SELECT * FROM adhoc.mtcars limit 10")

dbGetQuery(con, "SELECT row_names, mpg, cyl, disp FROM adhoc.mtcars limit 10")
Error: InvalidRequestException (HTTP 400). Could not find results

> traceback()
7: stop(resp)
6: retry_api_call(result_class <- res@connection@ptr$Athena$get_query_results(QueryExecutionId = res@info$QueryExecutionId, 
       MaxResults = as.integer(1))$ResultSet$ResultSetMetadata$ColumnInfo)
5: dbFetch(res = rs, n = -1, ...)
4: dbFetch(res = rs, n = -1, ...)
3: .local(conn, statement, ...)
2: dbGetQuery(con, "SELECT * FROM adhoc.mtcars limit 10")
1: dbGetQuery(con, "SELECT * FROM adhoc.mtcars limit 10")
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] noctua_1.8.0 DBI_1.0.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2                   paws.analytics_0.1.9         digest_0.6.20                zeallot_0.1.0                crayon_1.3.4                
 [6] paws.common_0.3.3            R6_2.4.0                     jsonlite_1.6                 backports_1.1.4              httr_1.4.0                  
[11] pillar_1.4.2                 rlang_0.4.0                  curl_4.0                     data.table_1.13.0            ini_0.3.1                   
[16] paws_0.1.9                   xml2_1.2.1                   vctrs_0.2.0                  paws.security.identity_0.1.9 readr_1.3.1                 
[21] paws.storage_0.1.9           hms_0.5.0                    compiler_3.6.1               pkgconfig_2.0.2              tibble_2.1.3      

DyfanJones commented 3 years ago

Hi @magnussv, really sorry to here that you are having issues with RAthena and noctua . This is the first time i have come across this type of error. But from a quick google it looks similar to this: https://github.com/ankane/blazer/issues/234 . I will have a look around to see if there is simple solution :)

magnussv commented 3 years ago

Thanks @DyfanJones ! I found ankane/blazer#234 when I googled as well, so I tried to run many (new/different) queries but with the same type of result on all of them. I didn't fully understand that issue but the title said "does not allow you to run same query multiple times". :-)

DyfanJones commented 3 years ago

Hi @magnussv, i just want to check some details with you. Does mtcars exist in 2 schema's? For example default.mtcars and adhoc.mtcars?

DyfanJones commented 3 years ago

Plus can you let me know how you connect to athena? I am wondering if there is a problem with the defualt schema. I.e

You connect to the default, but you are querying the adhoc schema for example.

con <- dbConnect(noctua:: athena())`

dbGetQuery(con, "select * from adhoc.mtcars")
DyfanJones commented 3 years ago

This looks like a possible solution: https://paws-r.github.io/docs/athena/start_query_execution.html.

The parameter ClientRequestToken :

A unique case-sensitive string used to ensure the request to create the query is idempotent (executes only once). If another StartQueryExecution request is received, the same response is returned and another query is not created. If a parameter has changed, for example, the QueryString, an error is returned.This token is listed as not required because AWS SDKs (for example the AWS SDK for Java) auto-generate the token for users. If you are not using the AWS SDK or the AWS CLI, you must provide this token or the action will fail.

Could be the issue. I will need to read some boto3, documentation in how to handle this(any suggestions welcome).

In the meantime can you try caching: https://dyfanjones.github.io/noctua/articles/aws_athena_query_caching.html and let me know if this persists.

(Sorry for the mountain of questions please take your time to answer them :) )

DyfanJones commented 3 years ago

From investigating further it looks like there are 2 possible issues going on here.

DyfanJones commented 3 years ago

Hi @magnussv can please try out the development branch "start_query_execution"

remotes::install_github("dyfanjones/noctua", ref = "start_query_execution")

Please let me know if the error continues to persist.

magnussv commented 3 years ago

@DyfanJones thanks for your quick response. You find answers to your initial questions below.

Hi @magnussv, i just want to check some details with you. Does mtcars exist in 2 schema's? For example default.mtcars and adhoc.mtcars?

mtcars exist only in adhoc. The default is empty. It seems like it doesn't matter which table I'm querying, I get the same issues independent of the table (even with tables with unique names across all databases).

Plus can you let me know how you connect to athena? I am wondering if there is a problem with the defualt schema. I.e

You connect to the default, but you are querying the adhoc schema for example.

con <- dbConnect(noctua:: athena())`

dbGetQuery(con, "select * from adhoc.mtcars")

I'm using credentials from a local ~/.aws/credentials with [default] profile; I have tried with others as well. I have also tried to connect to a specific schema as well, with the same result.

In the meantime can you try caching: https://dyfanjones.github.io/noctua/articles/aws_athena_query_caching.html and let me know if this persists.

I have tried with noctua_options(cache_size = 0), noctua_options(cache_size = 10), noctua_options(cache_size = 99) (note: = 100 gives an error), noctua_options() and noctua_options(clear_cache = T) but the problem persists.

Hi @magnussv can please try out the development branch "start_query_execution"

remotes::install_github("dyfanjones/noctua", ref = "start_query_execution")

Please let me know if the error continues to persist.

I will try it and come back to you!

magnussv commented 3 years ago

Hi @magnussv can please try out the development branch "start_query_execution"

remotes::install_github("dyfanjones/noctua", ref = "start_query_execution")

Please let me know if the error continues to persist.

@DyfanJones I can confirm that this resolved the issues! I ran around 100 queries and all returned the expected result. Great! Thanks for the instant correction. :-) 👍

DyfanJones commented 3 years ago

@magnussv oh brilliant. I didn't know what to do next if that failed 😆 I will merge the branch after I have tweaked the caching option to allow 100 distinct queries (thanks for point that out to me :D )

DyfanJones commented 3 years ago

noctua_1.8.1 has been submitted to the cran.