DyfanJones / noctua

Connect R to Athena using paws SDK (DBI Interface)
https://dyfanjones.github.io/noctua/
Other
45 stars 5 forks source link

Can I set various parameters in `.aws/config` file and have `DBI::dbConnect()` read those directly from that file? #193

Open emmansh opened 1 year ago

emmansh commented 1 year ago

In this issue I want to understand whether I'm overlooking some functionality of noctua.

Can noctua read profile parameters from the .aws/config file? Specifically, I'd like to store in the config file the parameters:

In such case, my config file would look like that:

# .aws/config
[profile my-profile-foo]
region = us-east-1
output = json
s3_staging_dir = s3://foo/bar/baz
schema_name = blah
duration_seconds = 3600
role_arn = arn:aws:iam::1234567890:role/Name-Of-Role
work_group = my-foo-bar-workgroup

Which will allow me to call DBI::dbConnect() with just:

con <- DBI::dbConnect(noctua::athena(), 
                      profile_name = "my-profile-foo"
                      )

This will be very useful because I can use version control safely when storing my code on remote (cloud) repositories, knowing that I don't expose any sensitive information about my connection.

However, currently I haven't found a way to make this work with noctua. Editing the config file as shown above results with

Error: AccessDeniedException (HTTP 400)

And the only way to establish a connection successfully requires to enter the parameters within the call to DBI::dbConnect().

emmansh commented 1 year ago

OK I've made some progress! This comment is a great reference. But still unclear why the s3_staging_dir can't be read from the config file.

# .aws/config

[profile my-profile-master]
region = us-east-1
output = json

[profile my-profile-foo]
source_profile = my-profile-master
include_profile = my-profile-master
s3_staging_dir = s3://foo/bar/baz
schema_name = blah
duration_seconds = 3600
role_arn = arn:aws:iam::1234567890:role/Name-Of-Role
work_group = my-foo-bar-workgroup
# ❌ doesn't work
con <- DBI::dbConnect(noctua::athena(), 
                      region = "us-east-1",
                      profile_name = "my-profile-foo"
                      )

Error: Please set s3_staging_dir either in parameter s3_staging_dir, environmental varaible AWS_ATHENA_S3_STAGING_DIRor when work_group is defined in create_work_group()

However, this works:

# ✅ works
con <- DBI::dbConnect(noctua::athena(), 
                      region = "us-east-1",
                      profile_name = "my-profile-foo",
                      s3_staging_dir = "s3://foo/bar/baz"
                      )

Is there a way to make DBI::dbConnect() read the s3_staging_dir value from the config file?

DyfanJones commented 1 year ago

Hi @emmansh,

Currently notua only gets AWS values from .aws/config and .aws/credentials (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html). This is done through paws sdk. However noctua does allow environment variables to be set as well. So for example all aws config setting can be store in .aws/config or .aws/credentials and noctua setting can be stored as environmental variables.

To do this you can use the R package usethis. This can set environment variables in the .Renviron file.

usethis::edit_r_environ()
AWS_ATHENA_S3_STAGING_DIR=s3://my-athena-bucket/query/location/

After reseting your session, AWS_ATHENA_S3_STAGING_DIR will be available. You can then set up a connection:

con <- dbConnect(noctua::athena(), profile = "my-profile-foo")

I hope this helps :)

Note: if you are using an Athena work group. noctua will attempt to retrieve the S3 staging dir from it. So you won't need to hard code it :)

DyfanJones commented 1 year ago

Here are a list of supported environmental variables that aren't in .aws/config or .aws/credentials.

DyfanJones commented 1 year ago

I am happy for any PRs if you would like to add this feature to noctua :)

emmansh commented 1 year ago

Sounds good, thank you!