DyfanJones / RAthena

Connect R to Athena using Boto3 SDK (DBI Interface)
https://dyfanjones.github.io/RAthena/

Issue with install_boto() #85

Closed VolodymyrClarify closed 4 years ago

VolodymyrClarify commented 4 years ago

Hello!

I accidentally typed n at the prompt while running RAthena::install_boto():

RAthena::install_boto()
No non-system installation of Python could be found.
Would you like to download and install Miniconda?
Miniconda is an open source environment management system for Python.
See https://docs.conda.io/en/latest/miniconda.html for more details.
Would you like to install Miniconda? [Y/n]: n

Now, I don't see that prompt and I cannot install boto:

RAthena::install_boto()
Using virtual environment 'RAthena' ...
/home/volodymyr/.virtualenvs/RAthena/bin/python: No module named pip
Error in strsplit(output, "\s+")[[1]] : subscript out of bounds
In addition: Warning message:
In system2(python, c("-m", "pip", "--version"), stdout = TRUE) :
  running command ''/home/volodymyr/.virtualenvs/RAthena/bin/python' -m pip --version' had status 1

What would be my next steps in order to install boto and use RAthena?

DyfanJones commented 4 years ago

@VolodymyrClarify, sorry for my late response. From the error messages, it looks like you don't have Python on your machine:

No non-system installation of Python could be found.

And as a result you don't have pip:

No module named pip

To resolve this, I recommend installing Python 3 from the Anaconda distribution (https://www.anaconda.com/distribution/). That gives you both pip and conda on your system.

Once Python is installed on your machine, you can either re-run install_boto() or run pip install boto3 from your terminal.

Let me know how you get on.

VolodymyrClarify commented 4 years ago

Hi @DyfanJones

Sorry, I was traveling, so I couldn't test it right away.

Looks like Python is installed:

$ python3 --version
Python 3.6.9

And I also installed boto3:

$ pip show boto3
Name: boto3
Version: 1.12.14
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: UNKNOWN
License: Apache License 2.0
Location: /home/volodymyr/.local/lib/python2.7/site-packages
Requires: jmespath, botocore, s3transfer

But when I try to connect to Athena I'm getting the following error:

library(DBI)

con <- dbConnect(RAthena::athena(),
                 aws_access_key_id='xxxxxxx',
                 aws_secret_access_key='xxxxxxx',
                 s3_staging_dir='s3://some/location/',
                 region_name='us-xxxx-x')

Error: Boto3 is not detected please install boto3 using either: `pip install boto3` in terminal or `install_boto()`. Alternatively `reticulate::use_python` or `reticulate::use_condaenv` will have to be used if boto3 is in another environment.

DyfanJones commented 4 years ago

It looks like boto3 is installed for your Python 2.7:

pip show boto3

# Location: /home/volodymyr/.local/lib/python2.7/site-packages

I believe this is because pip is linked to your Python 2.7.

Can you check whether boto3 is installed for your Python 3+:

pip3 show boto3

Note: pip3 usually refers to the pip that belongs to Python 3+.
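
As a rough sketch of how to check this directly: the output of `pip --version` normally ends with the interpreter it is bound to, in the form `(python A.B)`. The helper below extracts that version. The function name `pip_target` is hypothetical, and the parsing assumes pip's usual `--version` output format.

```shell
# pip_target: print the Python version a given pip front-end installs into.
# Hypothetical helper; assumes `--version` prints "pip X.Y from ... (python A.B)".
pip_target() {
  "$1" --version 2>/dev/null | sed -n 's/.*(python \([0-9.]*\)).*/\1/p'
}
```

Comparing `pip_target pip` with `pip_target pip3` shows whether the two commands point at different interpreters. Running pip through the interpreter itself, for example `python3 -m pip show boto3`, avoids the ambiguity altogether.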

Please let me know what you get.

VolodymyrClarify commented 4 years ago

That is true! pip3 show boto3 doesn't return anything.

Do you have any tips on how to re-install boto3 or how to link it to Python 3+?

UPDATE: running $ pip3 install boto3 under my user helped!

Now I see:

pip3 show boto3

Name: boto3
Version: 1.12.16
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: UNKNOWN
License: Apache License 2.0
Location: /home/volodymyr/.local/lib/python3.6/site-packages
Requires: jmespath, s3transfer, botocore

And consequently I got a connection to Athena:

con <- dbConnect(RAthena::athena(),
                 aws_access_key_id='xxxxxxx',
                 aws_secret_access_key='xxxxxxx',
                 s3_staging_dir='s3://some/location/',
                 region_name='us-xxxx-x') 

Thank you @DyfanJones for helping me troubleshoot the problem!

DyfanJones commented 4 years ago

@VolodymyrClarify brilliant :) I will close this ticket. If you have any further issues / feature requests please raise another ticket :)

DyfanJones commented 4 years ago

Side note: you can use environment variables to keep your credentials out of your code, or you can use the AWS CLI to set them up in the .aws directory.

Please check out the dbConnect documentation for the list of supported environment variables.

VolodymyrClarify commented 4 years ago

Thanks!

library(DBI)

con = dbConnect( RAthena::athena(),
           aws_access_key_id = Sys.getenv( "AWS_ACCESS_KEY_ID" ),
           aws_secret_access_key = Sys.getenv( "AWS_SECRET_ACCESS_KEY" ),
           s3_staging_dir = 's3://some/location/',
           region_name = Sys.getenv( "AWS_DEFAULT_REGION" ) )

DyfanJones commented 4 years ago

@VolodymyrClarify this can be simplified to:

library(DBI)

con = dbConnect( RAthena::athena(),
           s3_staging_dir = 's3://some/location/',
           region_name = Sys.getenv( "AWS_DEFAULT_REGION" ) )

If you use AWS_REGION instead of AWS_DEFAULT_REGION, you can simplify it even further. EDIT: I can't remember whether support for this environment variable is only in the latest dev version or already in the CRAN version.

library(DBI)

con = dbConnect( RAthena::athena(),
           s3_staging_dir = 's3://some/location/')

Finally, if you set the environment variable AWS_ATHENA_S3_STAGING_DIR to your S3 staging directory, you can simplify your connection to:

library(DBI)

con = dbConnect( RAthena::athena())

You can then double check by getting the connection information:

dbGetInfo(con)

VolodymyrClarify commented 4 years ago

I can confirm that the simple version worked:

library(DBI)

con = dbConnect( RAthena::athena() )

Note (for others): don't forget to restart R for changes in .Renviron to take effect.
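
For anyone following along, here is a minimal sketch of what such .Renviron entries might look like, using the variable names mentioned in this thread and the same placeholder values as the examples above. The `RENVIRON_FILE` variable and the example file path are assumptions made for illustration, so the sketch writes to a scratch file rather than touching a real ~/.Renviron.

```shell
# Write example entries to a scratch file in .Renviron format (NAME=value).
# RENVIRON_FILE is a hypothetical variable; values below are placeholders.
RENVIRON_FILE="${RENVIRON_FILE:-${TMPDIR:-/tmp}/Renviron.example}"
cat > "$RENVIRON_FILE" <<'EOF'
AWS_ACCESS_KEY_ID=xxxxxxx
AWS_SECRET_ACCESS_KEY=xxxxxxx
AWS_DEFAULT_REGION=us-xxxx-x
AWS_ATHENA_S3_STAGING_DIR=s3://some/location/
EOF
```

Once the values are filled in, the lines can be copied into ~/.Renviron. R reads that file only at startup, which is why the restart is needed.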

DyfanJones commented 4 years ago

I will add your note into the documentation.