airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.31k stars 3.95k forks source link

Source file: AWS Assumed role credentials doesn't work with source-file connector #8227

Open JD-V opened 2 years ago

JD-V commented 2 years ago
## Enviroment - **Airbyte version**: v0.32.25-alpha - **OS Version / Instance**: macOS - **Deployment**: Kubernetes - **Source Connector and version**: source-file:0.2.7-beta - **Severity**: Medium - **Step where error happened**: Setup new connection ## Current Behavior When I try to use standalone image of source-file to create source connection, it accepts access key and secret key of a user to and creates AWS session. If I have a IAM user in aws this flow works fine but If I have generated access/secret by Assume role service with [STS](https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html) I will have to pass `aws_session_token` along with `aws_access_key_id` and `aws_secret_access_key` while creating [boto3 sesion](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html). But since [_setup_boto_session( )](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-s3/source_s3/s3file.py#L35) does not add `aws_session_token`, at session creation, it fails. ## Expected Behavior [*_setup_boto_session()](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-s3/source_s3/s3file.py#L35) should add aws_session_token optionally while creating session in order for assumed role credentials to work * ## Steps to Reproduce 1. run source-file image standalone with assumed role credentials ## Are you willing to submit a PR?

Yes

alafanechere commented 2 years ago

Hi @JD-V our source file connector nor S3 connector supports STS sessions. I'm not sure offering the user to fill a aws_session_token field would be the right approach as STS session are temporary and it will lead to a broken source if the user does not update the token manually. I'd suggest to ask for an enhancement on the S3 connector to make it support assume role mechanism as it is made in our source_amazon_seller_partner connector.

alafanechere commented 2 years ago

This is closely related to the following issue: https://github.com/airbytehq/airbyte/issues/5942

JD-V commented 2 years ago

While I agree with your point @alafanechere, there are users like me who just wants to use airbyte for one time replication. My organisation refrains me from using IAM credentials, hence the only option left to me is to assume a role and use that creds to run Airbyte.

If you are worried about temporary sessions breaking the source, how about just adding this feature in backend/image so that atleast apis can support it?

alafanechere commented 2 years ago

@JD-V do you feel comfortable enough contributing to this feature? I'm pinging @sherifnada and @misteryeo to give you definitive feedback on this feature addition, but whatever their answer you'll still be able to use your contribution as a custom connector.

JD-V commented 2 years ago

Yes I’d like to contribute to this. Let me know how your conversation goes. Meanwhile I’ll start looking at the code base.

On Thu, 30 Dec 2021 at 11:43 PM, Augustin @.***> wrote:

@JD-V https://github.com/JD-V do you feel comfortable enough contributing to this feature? I'm pinging @sherifnada https://github.com/sherifnada and @misteryeo https://github.com/misteryeo to give you definitive feedback on this feature addition, but whatever their answer you'll still be able to use your contribution as a custom connector.

— Reply to this email directly, view it on GitHub https://github.com/airbytehq/airbyte/issues/8227#issuecomment-1003133086, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABRIOQSIGLRCS3OSN5ZYKX3UTSOLVANCNFSM5IV5N45Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

misteryeo commented 2 years ago

@JD-V Sorry for the delay here. As yet, we haven't heard from others who have the same use case as you for a one time replication. This is just a thought experiment but I'm curious if you'd be okay with making this change just for your version of the connector and not being merged which the official connector at this time until we see more requests for this particular use case come up. Would that be feasible? We'll keep this issue open to collect feedback from other users in the meantime.

gtrak commented 2 years ago

I'm trying to test this locally on docker and the existing AWS setup requires the STS option.

It is OK for the credentials to be temporary when evaluating airbyte, and the boto default fallback using standard AWS env-vars and instance-profile would also be fine and is standard across industry tools.

Is there a method to pass credentials down through envvars when running through docker? I tried adding them to the docker-compose.yaml file in the worker section, and it wasn't enough.