aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

`NoCredentialsError: Unable to locate credentials` with `s3.describe_objects` and a valid `boto3_session` argument #1491

Closed ClementSicard closed 2 years ago

ClementSicard commented 2 years ago

Describe the bug

When passed a valid boto3.Session, s3.describe_objects can describe a single object but fails on a list of objects, even though the library documentation says a list of paths is supported.

How to Reproduce

In all cases, a valid session is provided to the function

>>> wr.s3.list_objects(path="s3://clement-test-1", boto3_session=session)

['s3://clement-test-1/folder2/10.pdf',
 's3://clement-test-1/folder2/11.pdf',
 's3://clement-test-1/folder2/12.pdf',
 's3://clement-test-1/folder2/13.pdf',
 's3://clement-test-1/folder2/subfolder1/10.pdf',
 's3://clement-test-1/folder2/subfolder1/11.pdf',
 's3://clement-test-1/folder2/subfolder1/12.pdf',
 's3://clement-test-1/folder2/subfolder1/13.pdf']

This query works, so the session is well defined (the bucket is private).

When I try to describe one of these objects:

>>> wr.s3.describe_objects(path='s3://clement-test-1/folder2/subfolder1/10.pdf', boto3_session=session)

{'s3://clement-test-1/folder2/subfolder1/10.pdf': {'ResponseMetadata': {'RequestId': 'xxxxxxx',
   'HostId': 'xxxxxxx',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'xxxxxxx',
    'x-amz-request-id': 'xxxxxxxxxxxxxx',
    'date': 'Tue, 02 Aug 2022 10:56:33 GMT',
    'last-modified': 'Tue, 21 Jun 2022 12:03:31 GMT',
    'etag': '"xxxxxxxxxxxxxx"',
    'accept-ranges': 'bytes',
    'content-type': 'application/pdf',
    'server': 'AmazonS3',
    'content-length': '14749033'},
   'RetryAttempts': 0},
  'AcceptRanges': 'bytes',
  'LastModified': datetime.datetime(2022, 6, 21, 12, 3, 31, tzinfo=tzutc()),
  'ContentLength': 14749033,
  'ETag': '"xxxxxxxxxxxxxx"',
  'ContentType': 'application/pdf',
  'Metadata': {}}}

But when I pass a list of paths for path (which is supported according to the documentation), a NoCredentialsError is raised, even though the session is valid (it worked for the calls above) and the files exist in the bucket:

>>> wr.s3.describe_objects(path=['s3://clement-test-1/folder2/subfolder1/10.pdf', 's3://clement-test-1/folder2/subfolder1/11.pdf'], boto3_session=session)

---------------------------------------------------------------------------
NoCredentialsError                        Traceback (most recent call last)
/var/folders/y8/fqhzmbr93t1g76sjf_vschr80000gn/T/ipykernel_5709/2496337718.py in <cell line: 1>()
----> 1 wr.s3.describe_objects(path=['s3://clement-test-1/folder2/subfolder1/10.pdf', 's3://clement-test-1/folder2/subfolder1/11.pdf'], boto3_session=session)

with this stack trace:

Stack trace

```
---------------------------------------------------------------------------
NoCredentialsError                        Traceback (most recent call last)
/var/folders/y8/fqhzmbr93t1g76sjf_vschr80000gn/T/ipykernel_5709/2496337718.py in <cell line: 1>()
----> 1 wr.s3.describe_objects(path=['s3://clement-test-1/folder2/subfolder1/10.pdf', 's3://clement-test-1/folder2/subfolder1/11.pdf'], boto3_session=session)

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in describe_objects(path, version_id, use_threads, last_modified_begin, last_modified_end, s3_additional_kwargs, boto3_session)
    154     versions = [version_id.get(p) if isinstance(version_id, dict) else version_id for p in paths]
    155     with concurrent.futures.ThreadPoolExecutor(max_workers=cpus) as executor:
--> 156         resp_list = list(
    157             executor.map(
    158                 _describe_object_concurrent,

/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in result_iterator()
    607                     # Careful not to keep a reference to the popped future
    608                     if timeout is None:
--> 609                         yield fs.pop().result()
    610                     else:
    611                         yield fs.pop().result(end_time - time.monotonic())

/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in result(self, timeout)
    444                     raise CancelledError()
    445                 elif self._state == FINISHED:
--> 446                     return self.__get_result()
    447                 else:
    448                     raise TimeoutError()

/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py in __get_result(self)
    389         if self._exception:
    390             try:
--> 391                 raise self._exception
    392             finally:
    393                 # Break a reference cycle with the exception in self._exception

/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py in run(self)
     56
     57         try:
---> 58             result = self.fn(*self.args, **self.kwargs)
     59         except BaseException as exc:
     60             self.future.set_exception(exc)

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in _describe_object_concurrent(path, boto3_primitives, s3_additional_kwargs, version_id)
     48 ) -> Tuple[str, Dict[str, Any]]:
     49     boto3_session = _utils.boto3_from_primitives(primitives=boto3_primitives)
---> 50     return _describe_object(
     51         path=path, boto3_session=boto3_session, s3_additional_kwargs=s3_additional_kwargs, version_id=version_id
     52     )

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/s3/_describe.py in _describe_object(path, boto3_session, s3_additional_kwargs, version_id)
     35     if version_id:
     36         extra_kwargs["VersionId"] = version_id
---> 37     desc = _utils.try_it(
     38         f=client_s3.head_object, ex=client_s3.exceptions.NoSuchKey, Bucket=bucket, Key=key, **extra_kwargs
     39     )

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/awswrangler/_utils.py in try_it(f, ex, ex_code, base, max_num_tries, **kwargs)
    341     for i in range(max_num_tries):
    342         try:
--> 343             return f(**kwargs)
    344         except ex as exception:
    345             if ex_code is not None and hasattr(exception, "response"):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    506                 )
    507             # The "self" in this scope is referring to the BaseClient.
--> 508             return self._make_api_call(operation_name, kwargs)
    509
    510         _api_call.__name__ = str(py_operation_name)

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    896         else:
    897             apply_request_checksum(request_dict)
--> 898             http, parsed_response = self._make_request(
    899                 operation_model, request_dict, request_context
    900             )

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/client.py in _make_request(self, operation_model, request_dict, request_context)
    919     def _make_request(self, operation_model, request_dict, request_context):
    920         try:
--> 921             return self._endpoint.make_request(operation_model, request_dict)
    922         except Exception as e:
    923             self.meta.events.emit(

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in make_request(self, operation_model, request_dict)
    117             request_dict,
    118         )
--> 119         return self._send_request(request_dict, operation_model)
    120
    121     def create_request(self, params, operation_model=None):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in _send_request(self, request_dict, operation_model)
    196         context = request_dict['context']
    197         self._update_retries_context(context, attempts)
--> 198         request = self.create_request(request_dict, operation_model)
    199         success_response, exception = self._get_response(
    200             request, operation_model, context

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/endpoint.py in create_request(self, params, operation_model)
    132                 service_id=service_id, op_name=operation_model.name
    133             )
--> 134             self._event_emitter.emit(
    135                 event_name,
    136                 request=request,

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in emit(self, event_name, **kwargs)
    410     def emit(self, event_name, **kwargs):
    411         aliased_event_name = self._alias_event_name(event_name)
--> 412         return self._emitter.emit(aliased_event_name, **kwargs)
    413
    414     def emit_until_response(self, event_name, **kwargs):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in emit(self, event_name, **kwargs)
    254                  handlers.
    255         """
--> 256         return self._emit(event_name, kwargs)
    257
    258     def emit_until_response(self, event_name, **kwargs):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/hooks.py in _emit(self, event_name, kwargs, stop_on_response)
    237         for handler in handlers_to_call:
    238             logger.debug('Event %s: calling handler %s', event_name, handler)
--> 239             response = handler(**kwargs)
    240             responses.append((handler, response))
    241             if stop_on_response and response is not None:

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/signers.py in handler(self, operation_name, request, **kwargs)
    101         # this method is invoked to sign the request.
    102         # Don't call this method directly.
--> 103         return self.sign(operation_name, request)
    104
    105     def sign(

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/signers.py in sign(self, operation_name, request, region_name, signing_type, expires_in, signing_name)
    185                 raise e
    186
--> 187             auth.add_auth(request)
    188
    189     def _choose_signer(self, operation_name, signing_type, context):

~/Library/Caches/pypoetry/virtualenvs/wizard-qTP-fgZ2-py3.9/lib/python3.9/site-packages/botocore/auth.py in add_auth(self, request)
    405     def add_auth(self, request):
    406         if self.credentials is None:
--> 407             raise NoCredentialsError()
    408         datetime_now = datetime.datetime.utcnow()
    409         request.context['timestamp'] = datetime_now.strftime(SIGV4_TIMESTAMP)

NoCredentialsError: Unable to locate credentials
```

Expected behavior

I would expect the function to return the metadata of every object in the list (and, most importantly, the credentials in the boto3.Session to be located correctly, as in the single-file case).

Your project

No response

Screenshots

No response

OS

macOS

Python version

3.9.13

AWS DataWrangler version

2.16.1

Additional context

No response

malachi-constant commented 2 years ago

Thanks for opening @ClementSicard , I will attempt to replicate and get back to you soon.

malachi-constant commented 2 years ago

Hmm I am unable to replicate @ClementSicard

>>> import boto3
>>> import awswrangler as wr
>>> wr.__version__
'2.16.1'
>>> my_session = boto3.session.Session()
>>> result = wr.s3.list_objects(path, boto3_session=my_session)
>>> wr.s3.describe_objects(path=result[1:3], boto3_session=my_session)
{'s3://hansonlu-test-data-bucket/csv/file1.csv': {'ResponseMetadata': {'RequestId': 'C99Y0HBTE8VKW090', 'HostId': 'LUCEVRUCek4xLT7IXiCbOlYierDdcbQGwTBc4IlQmX+7OZuLPUPMpKrJcfJtSzELBlxMDyvqQj0=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'LUCEVRUCek4xLT7IXiCbOlYierDdcbQGwTBc4IlQmX+7OZuLPUPMpKrJcfJtSzELBlxMDyvqQj0=', 'x-amz-request-id': 'C99Y0HBTE8VKW090', 'date': 'Mon, 08 Aug 2022 17:57:09 GMT', 'last-modified': 'Thu, 21 Apr 2022 23:07:46 GMT', 'etag': '"3fc4883f513a6ce7a3487e521e58de92"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'binary/octet-stream', 'server': 'AmazonS3', 'content-length': '20'}, 'RetryAttempts': 1}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2022, 4, 21, 23, 7, 46, tzinfo=tzutc()), 'ContentLength': 20, 'ETag': '"3fc4883f513a6ce7a3487e521e58de92"', 'ContentType': 'binary/octet-stream', 'ServerSideEncryption': 'AES256', 'Metadata': {}}, 's3://hansonlu-test-data-bucket/csv/file2.csv': {'ResponseMetadata': {'RequestId': 'C99MMKYCREFXS20S', 'HostId': 'zel3k5GK/lumbfwkOBj1D3JaBM5xycn66jmICeqKS3U0gurmOIjLID5C6wbuXZ2lMY/MZYcp6e0=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'zel3k5GK/lumbfwkOBj1D3JaBM5xycn66jmICeqKS3U0gurmOIjLID5C6wbuXZ2lMY/MZYcp6e0=', 'x-amz-request-id': 'C99MMKYCREFXS20S', 'date': 'Mon, 08 Aug 2022 17:57:09 GMT', 'last-modified': 'Thu, 21 Apr 2022 23:07:48 GMT', 'etag': '"13e27af06c955d43b12da432b839b204"', 'x-amz-server-side-encryption': 'AES256', 'accept-ranges': 'bytes', 'content-type': 'binary/octet-stream', 'server': 'AmazonS3', 'content-length': '14'}, 'RetryAttempts': 1}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2022, 4, 21, 23, 7, 48, tzinfo=tzutc()), 'ContentLength': 14, 'ETag': '"13e27af06c955d43b12da432b839b204"', 'ContentType': 'binary/octet-stream', 'ServerSideEncryption': 'AES256', 'Metadata': {}}}

Is there any specific configuration in your session object I can test?
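
For anyone answering that question, a minimal sketch of how the session configuration could be summarized without leaking secrets. The helper name `summarize_session` is hypothetical; it only reads `profile_name`, `region_name` and `get_credentials()`, which exist on `boto3.Session`, and it assumes the returned credentials object exposes a `method` attribute (falling back to `None` if it does not).

```python
from typing import Any, Dict


def summarize_session(session: Any) -> Dict[str, Any]:
    """Return non-secret facts about a boto3-style session (hypothetical helper).

    Only the profile, the region, and how the credentials were resolved are
    reported; access keys and tokens are never read or returned.
    """
    creds = session.get_credentials()  # None when no credentials are found
    return {
        "profile_name": session.profile_name,
        "region_name": session.region_name,
        "has_credentials": creds is not None,
        # Assumption: the credentials object carries a `method` attribute
        # naming the provider that produced it; use None otherwise.
        "credential_method": getattr(creds, "method", None),
    }


# Usage with a real session (requires boto3 to be installed):
#   import boto3
#   print(summarize_session(boto3.session.Session()))
```

Comparing this output for a session that works with a list of paths and one that does not may show whether the difference lies in how the credentials were supplied.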

malachi-constant commented 2 years ago

Closing for now as bug cannot be replicated. Please reopen if this issue is persistent and more context can be provided.

ataghavey commented 1 year ago

I am experiencing a similar issue as reported above with the awswrangler.s3.describe_objects() method and a valid boto3 session.

The method works just fine when a single string with the path to a single S3 object is passed in. However, when a prefix that contains multiple S3 objects, or a list of paths, is passed in for the path arg, this error is raised:

NoCredentialsError: Unable to locate credentials
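
Since the single-path call succeeds in both reports, one possible workaround is to describe the objects one at a time and merge the results. The sketch below is not the library's fix: `describe_one_by_one` and its `describe_one` parameter are hypothetical names, and the real call is injected as a callable so the helper does not depend on awswrangler. The stack trace in the original report goes through a thread pool and `_utils.boto3_from_primitives`, so passing `use_threads=False` (a parameter visible in the traceback's signature) might also avoid the error, but that is untested here.

```python
from typing import Any, Callable, Dict, Iterable


def describe_one_by_one(
    paths: Iterable[str],
    describe_one: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Call `describe_one` once per path and merge the returned mappings.

    `describe_one` is expected to return a dict keyed by S3 path, which is
    the shape shown for the single-object call in this thread.
    """
    merged: Dict[str, Any] = {}
    for path in paths:
        merged.update(describe_one(path))
    return merged


# Usage with awswrangler (assumes `wr` and a valid `session` already exist):
#   paths = wr.s3.list_objects(path="s3://clement-test-1", boto3_session=session)
#   metadata = describe_one_by_one(
#       paths,
#       lambda p: wr.s3.describe_objects(path=p, boto3_session=session),
#   )
```

This issues one request per object sequentially, so it will be slower than the threaded path on large listings.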