Closed leifulstrup closed 4 years ago
@leifulstrup are you able to provide more details about the S3 bucket you are trying to access?
@necaris yes but I don't want to paste details here. I set each file for public access. I am able to read each file individually and access via pandas and dask but then when I introduce the * in place of the char that changes between files it gives me that error. I suspect that it is "operator error" by me and a setting. Does the S3 bucket with the files need to have a special type of permission so that the list of the directory items that matches can be queried? It may be a security precaution by AWS to make it harder to query an S3 directory. Is there a bucket-level setting that I need to change?
My guess is that your S3 bucket doesn't give access to list objects, hence the ListObjectsV2 error. Making the bucket listable should resolve this issue. I encourage operating on a per-bucket level rather than a per-object/per-file level.
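For anyone landing here with the same error, a minimal sketch of what "making the bucket listable" can look like as a bucket policy. `ListObjectsV2` is authorized by the `s3:ListBucket` action on the bucket ARN itself, while reading files is `s3:GetObject` on the objects under it. The helper name `listable_bucket_policy` and the bucket name are illustrative assumptions, not something from this thread, and a public (`"*"`) principal only takes effect if the bucket's Block Public Access settings allow public policies.

```python
import json


def listable_bucket_policy(bucket_name, principal="*"):
    """Build a policy document that allows listing a bucket and reading its objects.

    `principal` defaults to "*" (anyone), which matches anonymous access;
    pass e.g. {"AWS": "<IAM user or role ARN>"} to restrict it instead.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself (no trailing /*).
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


# "bucket-name" is a placeholder, as in the original report.
policy = listable_bucket_policy("bucket-name")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the bucket's policy editor in the S3 console. Setting permissions once at the bucket level like this avoids having to mark every file public individually.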
If you are concerned about making things too public, please note that Coiled will use your local credentials to generate a temporary security token and pass that token to your Dask workers. You should be able to make things readable and listable by a small set of people (just you if you want) and still process your data.
@mrocklin thank you. I will try that.
@mrocklin solved. I needed to explicitly add List access in S3. Thanks.
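If list access cannot be granted on a bucket, one possible workaround is to skip the wildcard and pass the file names explicitly, since expanding `*` is the step that needs to list the bucket. This is only a sketch: the helper `explicit_paths` and the suffixes are hypothetical, and it assumes the varying part of each file name is known in advance.

```python
def explicit_paths(bucket, prefix, suffixes, ext=".csv"):
    """Build full s3:// URLs for files whose names differ only by a known suffix."""
    return [f"s3://{bucket}/{prefix}{suffix}{ext}" for suffix in suffixes]


# Hypothetical suffixes standing in for the character that changes between files.
paths = explicit_paths("bucket-name", "myfilename", ["1", "2", "3"])
print(paths)

# dd.read_csv accepts a list of paths as well as a glob string, e.g.:
#   import dask.dataframe as dd
#   df = dd.read_csv(paths, storage_options={"anon": True})
```

Each file is then opened by its exact key, the same way the individual reads that already worked were done.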
Thanks for following up @leifulstrup! I'll close this issue for now as it seems to be resolved. Feel free to re-open if needed.
When I try to read from S3 with a wildcard path, using:
s3_location_wildcard = 's3://bucket-name/myfilename*.csv'
df_spending = dd.read_csv(s3_location_wildcard, dtype = dtype, storage_options={"anon": True}, blocksize="16 MiB").persist()
I get this error inside s3fs/core.py:
ClientError                               Traceback (most recent call last)
~/opt/anaconda3/envs/coiled_env/lib/python3.8/site-packages/s3fs/core.py in _lsdir(self, path, refresh, max_items, delimiter)
    420                 dircache = []
--> 421                 async for i in it:
    422                     dircache.extend(i.get('CommonPrefixes', []))

~/opt/anaconda3/envs/coiled_env/lib/python3.8/site-packages/aiobotocore/paginate.py in __anext__(self)
     30         while True:
---> 31             response = await self._make_request(current_kwargs)
     32             parsed = self._extract_parsed_response(response)

~/opt/anaconda3/envs/coiled_env/lib/python3.8/site-packages/aiobotocore/client.py in _make_api_call(self, operation_name, api_params)
    150             error_class = self.exceptions.from_code(error_code)
--> 151             raise error_class(parsed_response, operation_name)
    152         else:

ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
The above exception was the direct cause of the following exception:
PermissionError Traceback (most recent call last)