brokamp-group / s3

More efficient downloading and usage of files hosted on AWS S3
http://brokamp-group.github.io/s3/
Other
3 stars 2 forks source link

s3_get has to have credentials #2

Closed cole-brokamp closed 3 years ago

cole-brokamp commented 3 years ago

s3_get, which uses boto3, cannot download public files without AWS credentials...

Is this a dealbreaker if we don't want to rely on users being able to set their credentials? Or will we have to use credentials no matter what?

cole-brokamp commented 3 years ago

could we provide a backup method for those without python/boto3 or without aws credentials?

erikarasnick commented 3 years ago

I installed latest version of s3 and removed arguments for aws credentials from schwartz container and rebuilt. Testing the container failed with

Error in py_call_impl(callable, dots$args, dots$keywords) :
  NoCredentialsError: Unable to locate credentials

Detailed traceback:
  File "/root/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/root/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/botocore/client.py", line 663, in _make_api_call
    operation_model, request_dict, request_context)
  File "/root/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/botocore/client.py", line 682, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/root/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/root/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = se
Calls: <Anonymous> ... Reduce -> lapply -> FUN -> <Anonymous> -> py_call_impl
Execution halted

I shelled into the container and tried readRDS(s3_get("s3://geomarker/testing_downloads/mtcars.rds")) and got the same error.

cole-brokamp commented 3 years ago

It looks like the container is snapshotting the right version of this package which has the functionality enabled. Can you test s3_get just in this package to download a public file when your aws credentials are unset?

I can't reproduce the failure myself and it works in the tests on gh actions too.

Try this:

withr::with_envvar(new = c(
    "AWS_ACCESS_KEY_ID" = NA,
    "AWS_SECRET_ACCESS_KEY" = NA
  ), {s3::s3_get("s3://geomarker/testing_downloads/mtcars.rds")})
cole-brokamp commented 3 years ago

I can also run the example in the README from https://github.com/degauss-org/schwartzGeohashPM without any AWS credentials. If the only failure is in the container, then maybe its something in there?

erikarasnick commented 3 years ago

That works, so it must have to do with the container. I tried rebuilding, which didn't help. I will close the issue here since it's not a problem with the s3 package and dig deeper into the container. thanks!

cole-brokamp commented 3 years ago

I just cloned the schwartz repo, built the container and got the same error that you did with make test.... strange.

cole-brokamp commented 3 years ago

I dug further and it might be that something is happening with Linux. The GH actions only tests on macOS. I just updated the checks and we'll see if it fails on linux

cole-brokamp commented 3 years ago

it passed on linux and windows ¯_(ツ)_/¯

cole-brokamp commented 3 years ago

Some of the tests were being erroneously skipped, which explains the passing test. I also think that the boto3 module was finding AWS credentials outside of system variables in R somehow, which explains how the public downloads work.

In general, there is no way to allow public downloads through boto3 without AWS credentials. These credentials don't necessarily have to have access to the files if they are public, but they do need to be valid credentials.

I've fixed this in 42d330b049f4269c54f5c5e83246188bd5f1a197 by changing the backend for public files to not use boto, but rather rely on the httr R package using more standard GET requests.

I've also set it up so that python/boto3 are only necessary if you want to download private files. Things like the schwartz container that rely on public files shouldn't need to have this functionality, so we can omit the steps to install miniconda and boto3 in that container.

cole-brokamp commented 3 years ago

I've also made this release "0.2", so you can update that dependency in addschwartzGeohashPM and degauss/schwartz