airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source S3: Unable to connect via IAM Role #34582

Open john-motif opened 7 months ago

john-motif commented 7 months ago

Connector Name

source-s3

Connector Version

4.4.0

What step the error happened?

Configuring a new connector

Relevant information

I'm attempting to connect to an S3 bucket using the new IAM role support (see #33944 for the instructions I followed).

The S3 connector picks up the credentials as expected, but fails to assume the role, raising `botocore.exceptions.NoCredentialsError: Unable to locate credentials`.

I've already verified that the Access Key ID / Secret Access Key approach works for this bucket and our Airbyte setup, but our customer strongly prefers the IAM role approach. I've also confirmed that the expected value (our workspace ID) for `AWS_ASSUME_ROLE_EXTERNAL_ID` is set in the environment of the container testing the S3 connection.
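For what it's worth, botocore only raises `NoCredentialsError` when its credential chain finds nothing at the moment it signs the `sts:AssumeRole` request, so the external ID being set doesn't rule out missing base credentials. A minimal sketch of a sanity check one could run inside the connector container (`has_static_aws_credentials` is a hypothetical helper, not part of the connector; it only checks the env-var provider, one of several sources botocore consults):

```python
import os

def has_static_aws_credentials(env: dict) -> bool:
    """Return True if the static keys botocore's env-var provider
    looks for are present (illustrative helper, not connector code)."""
    return bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))

if __name__ == "__main__":
    # A False here (with no other credential source available) would
    # explain botocore raising NoCredentialsError while signing AssumeRole.
    print("static credentials present:", has_static_aws_credentials(os.environ))
    print("external id:", os.environ.get("AWS_ASSUME_ROLE_EXTERNAL_ID"))
```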

Our IAM policy is as follows:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{bucket-name}/*",
                "arn:aws:s3:::{bucket-name}"
            ]
        }
    ]
}
```

And our trust relationship is formatted like so:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::${account-id}:user/${user-id}"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "${airbyte-workspace-id}"
                }
            }
        }
    ]
}
```
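One way to isolate whether the role and trust policy themselves are correct is to reproduce the connector's `assume_role` call outside Airbyte, using the trusted principal's credentials. A hedged sketch (the role ARN, external ID, and session name below are placeholders; `build_assume_role_kwargs` is an illustrative helper, not the connector's code):

```python
def build_assume_role_kwargs(role_arn: str, external_id: str,
                             session_name: str = "airbyte-debug") -> dict:
    """Assemble AssumeRole parameters analogous to what the connector passes."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ExternalId": external_id,
    }

if __name__ == "__main__":
    import boto3  # needs credentials for the user named in the trust policy

    # Placeholders: substitute your real role ARN and Airbyte workspace ID.
    kwargs = build_assume_role_kwargs(
        "arn:aws:iam::123456789012:role/my-airbyte-role",
        "my-workspace-id",
    )
    sts = boto3.client("sts")  # uses the same credential chain as botocore in the connector
    creds = sts.assume_role(**kwargs)["Credentials"]
    print("assumed role, temporary key id:", creds["AccessKeyId"])
```

If this succeeds from a shell but the connector still fails, the problem is likely in what the connector container can see (credentials or environment), not in the IAM configuration.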

Any recommendations on how to troubleshoot or investigate further would be hugely appreciated!

Relevant log output

```
File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 81, in _check_list_files
 file = next(iter(stream.get_files()))
 File "/airbyte/integration_code/source_s3/v4/stream_reader.py", line 124, in get_matching_files
 s3 = self.s3_client
 File "/airbyte/integration_code/source_s3/v4/stream_reader.py", line 62, in s3_client
 self._s3_client = self._get_iam_s3_client(client_kv_args)
 File "/airbyte/integration_code/source_s3/v4/stream_reader.py", line 109, in _get_iam_s3_client
 metadata=refresh(),
 File "/airbyte/integration_code/source_s3/v4/stream_reader.py", line 89, in refresh
 role = client.assume_role(
 File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 553, in _api_call
 return self._make_api_call(operation_name, kwargs)
 File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 989, in _make_api_call
 http, parsed_response = self._make_request(
 File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 1015, in _make_request
 return self._endpoint.make_request(operation_model, request_dict)
 File "/usr/local/lib/python3.9/site-packages/botocore/endpoint.py", line 119, in make_request
 return self._send_request(request_dict, operation_model)
 File "/usr/local/lib/python3.9/site-packages/botocore/endpoint.py", line 198, in _send_request
 request = self.create_request(request_dict, operation_model)
 File "/usr/local/lib/python3.9/site-packages/botocore/endpoint.py", line 134, in create_request
 self._event_emitter.emit(
 File "/usr/local/lib/python3.9/site-packages/botocore/hooks.py", line 412, in emit
 return self._emitter.emit(aliased_event_name, **kwargs)
 File "/usr/local/lib/python3.9/site-packages/botocore/hooks.py", line 256, in emit
 return self._emit(event_name, kwargs)
 File "/usr/local/lib/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
 response = handler(**kwargs)
 File "/usr/local/lib/python3.9/site-packages/botocore/signers.py", line 105, in handler
 return self.sign(operation_name, request)
 File "/usr/local/lib/python3.9/site-packages/botocore/signers.py", line 195, in sign
 auth.add_auth(request)
 File "/usr/local/lib/python3.9/site-packages/botocore/auth.py", line 418, in add_auth
 raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 62, in check_availability_and_parsability
 file = self._check_list_files(stream)
 File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 87, in _check_list_files
 raise CheckAvailabilityError(FileBasedSourceError.ERROR_LISTING_FILES, stream=stream.name) from exc
airbyte_cdk.sources.file_based.exceptions.CheckAvailabilityError: Error listing files. Please check the credentials provided in the config and verify that they provide permission to list files. Contact Support if you need assistance.
stream=motif-test
```


john-motif commented 7 months ago

@tolik0 - Not sure if you've got any suggestions for where to look here? (also, thank you for contributing this feature - our customers are super excited about it)

john-motif commented 6 months ago

Just quickly checking - any updates on this?

john-motif commented 3 months ago

Following up on this ticket one more time - I see the docs now label this feature as only available for users in a Sales Assist workflow.

Is this merely because troubleshooting the authentication setup for individual customers requires engineering involvement (e.g., is it still possible with an open-source Airbyte deployment?), or is different code actually running for those customers / are they on a managed instance?

gsc commented 3 months ago

I'm interested in this too. Is it possible to connect to S3 using IAM Roles in an open-source deployment? Or is this available only for customers on a managed instance? I've been trying to configure this on a local deployment, and I was running into the same issue as @john-motif.

bmulh commented 1 month ago

I am seeing this issue as well. From what I can tell, `AWS_ASSUME_ROLE_EXTERNAL_ID` is not passed to the source-s3 container that gets started. I was able to confirm that the proper permissions are set on the role and the S3 bucket. I also confirmed that the external ID evaluates to None, so the code falls into the else block.
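If the branch behaves as described above, the decision can be sketched roughly like this (a hypothetical mirror of the connector's logic for illustration, not the actual `stream_reader.py` code): when the external ID never reaches the container, the connector silently falls back to ambient credentials, which then fail with `NoCredentialsError` if none are mounted.

```python
def choose_auth_mode(env: dict) -> str:
    """Hypothetical mirror of the connector's branch: with an external ID
    it assumes the IAM role; without one it falls into the else block and
    relies on whatever ambient credentials the container happens to have."""
    if env.get("AWS_ASSUME_ROLE_EXTERNAL_ID"):
        return "assume_role_with_external_id"
    return "ambient_credentials"
```

Under that reading, the fix would be ensuring the platform propagates `AWS_ASSUME_ROLE_EXTERNAL_ID` into the connector container's environment, not changing the IAM setup.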