Closed DavidArenburg closed 4 years ago
Hi @DavidArenburg, sorry about that. I will have a look to see why it is failing.
@DavidArenburg does this not give you your credentials?
library(DBI)
library(RAthena)
con <- dbConnect(RAthena::athena(),
region_name = 'us-east-1',
profile_name = 'default',
s3_staging_dir = paste(s3_bucket, s3_output, sep = "/"))
# method to get to boto3 to check if credentials have been passed correctly
con@ptr$get_credentials()$get_frozen_credentials()
Or is it returning nothing?
It fails before on the dbConnect
with the
Error: The config profile (default) could not be found"
If I comment profile_name = 'default',
I'm getting
Error in eval(expr, envir, enclos): attempt to apply non-function Traceback:
Without even the traceback
I have the suspicion that is due the fact that cat ~/.aws/credentials
doesn't work in windows
@DavidArenburg This is a strange error as I pass the profile_name directly to boto3.Session()
. What is even more unusual is that your python example is working :S
Does the python work without profile_name being set so for example:
import boto3
session = boto3.Session()
credentials = session.get_credentials().get_frozen_credentials()
credentials
Yes
@DavidArenburg are any aws environmental variables set?
@DavidArenburg
Do you mind checking if the reticulate
can get boto3
credentials:
library(reticulate)
boto3 <- import("boto3")
session <- boto3$Session()
credentials <- session$get_credentials()$get_frozen_credentials()
credentials
No, it gives the
Error: attempt to apply non-function
I remember having similar issues with reticulate before that it couldn't find Python at all (which doesn't make any sense to me at all). I've solved it by specifying the RETICULATE_PYTHON environment variable.
I guess it's a bug related to reticulate then, rather to you. Though it's strange it's can't find the credentials although I've set this variable.
Hmm that is annoying if it is a reliculate
issue. Can you try the environmental variable method to see if it will connect to Athena for you?
The reason I wanted to use your package is because I need to QueryExecutionId
and I think it's only present in boto3. The reason for this, is when you run dbSendQuery
it creates a file in S3 and I don't know the name of that file- hence, I can't delete it afterwards. That creates a problem, because the S3 buckets gets enormous amount of data that no one needs and can't be easily removed.
I Googled this issue extensively and couldn't find anyone even addressing this issue for some reason (maybe I've sued wrong keywords).
Eventually I saw that the file name of each file is identical to the QueryExecutionId
, so you can use it in order to find and delete it from there.
If you have some easily alternative method for me, I would appreciate it.
In the meanwhile, I guess I'll need to sort out the Python issue with the reticulate package devs once and for all
Thanks
@DavidArenburg Both RAthena
and noctua
return QueryExecutionId
for example
noctua
library(DBI)
library(noctua)
con <- dbConnect(athena())
res <- dbExecute(con, "select * from iris")
res@info$QueryExecutionId
*RAthena**
library(DBI)
library(RAthena)
con <- dbConnect(athena())
res <- dbExecute(con, "select * from iris")
res@info$QueryExecutionId
The both offer a method to clear down the AWS S3 bucket for adhoc queries
con <- dbConnect(athena())
res <- dbExecute(con, "select * from iris")
dbFetch(res)
dbClearResult(res)
dbClearResult
actually clears down the AWS S3 resource by default. If you want to keep it then your can enable query caching the function *_options
(where * is the package name).
For RAthena
you can set the environmental variables instead dbConnect
Can you set them by using file.edit("~/.Renviron")
AWS_ACCESS_KEY_ID=<your key>
AWS_SECRET_ACCESS_KEY<your secert>
Make sure you reset your R kernel so R can pick them up.
Thanks, You haven't specified the method to clear down the AWS S3 bucket they offer :)
Also, there are other concerns, such as speed
dbClearResult
uses:
# remove metadata
s3$Object(s3_info$bucket, paste0(result_info$key, ".metadata"))$delete()
# remove athena output file
s3$Object(s3_info$bucket, result_info$key)$delete()
Interesting, I've used dbClearResult
previously and it had no affect on the S3 bucket
@DavidArenburg was that from RAthena or AWR.Athena?
AWR.Athena, as I wasn't able to get RAthena to work
@DavidArenburg, For speed I have moved alot of the methods from calling AWS Athena directly to using AWS Glue. For example, listing tables is returned from AWS Glue instead of a query to AWS Athena, this cuts down from 3 second Athena call to .3 second Glue call.
The packages utilise data.table
to parse data into R, alternatively you can set the backend to be vroom
which does significant speed increases
@DavidArenburg AWR.Athena uses the jdbc driver for Athena. AWS Athena alone can't delete S3 files. These packages can due to the nature of how they connect. The SDK's allow extra flexibility and give the full range of AWS Services for example S3
That's good to know. I guess I need to make your package to work then :)
@DavidArenburg another note for speed is I have recently merged the latest PR #99. This latest addition uses ALTER TABLE
when appending partitions to existing tables. Before it used MSCK REPAIR
.
As ALTER TABLE only accounts the partition you are adding it will run alot faster specially when you are adding a partition to a highly partitioned table.
@DavidArenburg try the environmental variable method highlighted above. If you come into errors, please reply to this thread and I will help you get up and running.
Plus if you notice it is running slower than the JDBC method please raise another ticket and i will see what I can do to help with performance
I've just checked the versions and they seem to match
@DavidArenburg so reticulate
is causing an issue on the same python. I might raise a ticket with reticulate to see if they know anything that could help
OK, please send me a link, I might add some info. I have a feeling that the package developers assuming that everyone installing Python using Anaconda.
@DavidArenburg raised an issue on reticulate, please fill out any more information if have missed off https://github.com/rstudio/reticulate/issues/744
In the mean time let me know how the setting of credentials in the environmental variables goes.
I was able to use noctua
successfully and it ran fast. Also, dbClearResult
actually works, so this basically solves my problem. I'm closing this issue as it is clearly not related to your package. Let's hope reticulate devs will add support for people who don't use Anaconda.
Thank you for all the help and guidance.
Hi, When running the following in Python, I get the credentials
But when trying to run the following query in R
I get
If I'm not specifying the
profile
, I'm gettingJust a note, I'm able to run this with
AWR.Athena::Athena()
successfully (it finds the credentials), e.g.returns
<JDBCResult>
Thanks
aws configure list
outputSession Info:
Session info --------------------------------------------------------------- setting value
version R version 3.5.1 (2018-07-02) os Windows 10 x64
system x86_64, mingw32
ui RTerm
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz Asia/Jerusalem
date 2020-03-26
Packages ------------------------------------------------------------------- ! package version date lib source
arrow 0.16.0.2 2020-02-14 [1] CRAN (R 3.5.3)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.3)
AWR.Athena 2.0.7-0 2019-07-08 [1] CRAN (R 3.5.3)
aws.s3 0.3.20 2020-03-11 [1] CRAN (R 3.5.3)
aws.signature 0.5.2 2019-08-08 [1] CRAN (R 3.5.3)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.5.3)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.5.2)
bit 1.1-15.2 2020-02-10 [1] CRAN (R 3.5.3)
bit64 0.9-7 2017-05-08 [1] CRAN (R 3.5.3)
callr 3.4.1 2020-01-24 [1] CRAN (R 3.5.3)
cli 2.0.0 2019-12-09 [1] CRAN (R 3.5.3)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.3)
curl 4.3 2019-12-02 [1] CRAN (R 3.5.3)
data.table 1.12.8 2019-12-09 [1] CRAN (R 3.5.3)
DBI 1.1.0.9000 2020-03-04 [1] Github (r-dbi/DBI@500edd3) desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.3)
devtools 2.2.2 2020-02-17 [1] CRAN (R 3.5.3)
digest 0.6.23 2019-11-23 [1] CRAN (R 3.5.3)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.5.3)
evaluate 0.14 2019-05-28 [1] CRAN (R 3.5.3)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.3)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.5.3)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3)
htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.5.3)
httr 1.4.1 2019-08-05 [1] CRAN (R 3.5.3)
IRdisplay 0.7.0 2018-11-29 [1] CRAN (R 3.5.3)
IRkernel 1.1 2019-12-06 [1] CRAN (R 3.5.3)
jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.5.3)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.3)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.3)
pbdZMQ 0.3-3 2018-05-05 [1] CRAN (R 3.5.2)
pillar 1.4.3 2019-12-20 [1] CRAN (R 3.5.3)
pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.5.3)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.3)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.5.3)
processx 3.4.2 2020-02-09 [1] CRAN (R 3.5.3)
ps 1.3.1 2020-02-12 [1] CRAN (R 3.5.1)
purrr 0.3.3 2019-10-18 [1] CRAN (R 3.5.3)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.5.3)
Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.5.3)
remotes 2.1.1 2020-02-15 [1] CRAN (R 3.5.3)
repr 1.0.2 2019-12-16 [1] CRAN (R 3.5.3)
D rJava 0.9-11 2019-03-29 [1] CRAN (R 3.5.3)
RJDBC 0.2-8 2020-03-10 [1] CRAN (R 3.5.3)
rlang 0.4.5 2020-03-01 [1] CRAN (R 3.5.3)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.3)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.3)
stringi 1.4.6 2020-02-17 [1] CRAN (R 3.5.3)
testthat 2.3.1 2019-12-01 [1] CRAN (R 3.5.3)
tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.5.3)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.5.3)
uuid 0.1-2 2015-07-28 [1] CRAN (R 3.5.2)
vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.5.3)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.3)
xml2 1.2.2 2019-08-09 [1] CRAN (R 3.5.3)