joekhoobyar opened this issue 3 years ago
@joekhoobyar is this for a particular connector? all of them?
@sherifnada - it should be for the whole system, IMO: the connectors as well as the base airbyte/* images.
FYI - currently this is blocking our deployment of Airbyte for one of our customers, due to their security posture.
@joekhoobyar I think I understand the ask in the case of connectors. Can you help me understand it in the "whole system" case -- are you saying when hitting Airbyte via the API, you don't want to use any Airbyte-generated API keys but rather rely on IAM permissions to control authn/authz within Airbyte?
No, @sherifnada actually, I think something has gotten lost in the translation here.
What I'm asking for is much simpler than that.
Currently, the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables must be provided in order for the Airbyte containers to run. Using the AWS SDK, avoiding this is quite simple: just do not set those environment variables, and AWS will take care of getting the credentials for you from the instance profile. This is much more secure, since there are no longer any keys to be rotated, leaked, etc.
For example, here is the documentation for the Java SDK; the other SDKs handle it the same way.
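To make this concrete, here is a minimal sketch using the AWS SDK for Java v2 (my own illustrative example, not taken from the SDK docs):

```java
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class DefaultChainExample {
  public static void main(String[] args) {
    // No credentialsProvider is configured, so the SDK falls back to its
    // default chain: environment variables, Java system properties, the
    // shared ~/.aws/credentials file, and finally the EC2 instance profile.
    S3Client s3 = S3Client.builder()
        .region(Region.US_EAST_1)
        .build();

    s3.listBuckets().buckets().forEach(b -> System.out.println(b.name()));
  }
}
```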
@sherifnada we are trying to get our heads around the logging when deployed to AWS.
In the .env file:
# Cloud log backups. Don't use this unless you know what you're doing. Mainly for Airbyte devs.
# If you just want to capture Docker logs, you probably want to use something like this instead:
# https://docs.docker.com/config/containers/logging/configure/
S3_LOG_BUCKET=
S3_LOG_BUCKET_REGION=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
S3_MINIO_ENDPOINT=
S3_PATH_STYLE_ACCESS=
This sounds like the logging should be handled through Docker.
In the docker-compose.yaml file, we see these environment variables for the server and the scheduler:
- S3_LOG_BUCKET=${S3_LOG_BUCKET}
- S3_LOG_BUCKET_REGION=${S3_LOG_BUCKET_REGION}
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
These are used in airbyte-config/.../S3Logs.java to create the client to write logs to an S3 bucket.
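To illustrate the coupling, here is a purely hypothetical sketch of that kind of client construction and upload (this is not Airbyte's actual code; the object key and file path are invented):

```java
import java.nio.file.Path;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class S3LogUploadSketch {
  public static void main(String[] args) {
    // Static credentials wired straight from the environment -- the pattern
    // that forces AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY to be set.
    S3Client s3 = S3Client.builder()
        .region(Region.of(System.getenv("S3_LOG_BUCKET_REGION")))
        .credentialsProvider(StaticCredentialsProvider.create(
            AwsBasicCredentials.create(
                System.getenv("AWS_ACCESS_KEY_ID"),
                System.getenv("AWS_SECRET_ACCESS_KEY"))))
        .build();

    // Ship a finished log file to the configured bucket (illustrative key/path).
    s3.putObject(
        PutObjectRequest.builder()
            .bucket(System.getenv("S3_LOG_BUCKET"))
            .key("job-logging/attempt-0.log")
            .build(),
        RequestBody.fromFile(Path.of("/tmp/attempt-0.log")));
  }
}
```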
I think there are two questions.
All logging errors go away if I add the following blank environment variables:
S3_LOG_BUCKET = ""
S3_LOG_BUCKET_REGION = ""
S3_MINIO_ENDPOINT = ""
S3_PATH_STYLE_ACCESS = ""
GCP_STORAGE_BUCKET = ""
1. Can someone confirm whether S3 logging works from an EC2 instance using an IAM instance profile?
2. Can we avoid having to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY?
@davinchia for awareness
I am wondering if this can be enhanced to use the DefaultCredentialsProvider. That would make the logging more generic and work everywhere, irrespective of how the AWS credentials are configured. Q: can I submit changes as a PR to this core functionality, or does this need to be handled by the Airbyte team?
We definitely welcome contributions! This is handled by another open source project: https://github.com/bluedenim/log4j-s3-search. If you contribute to that project, I'm happy to pull in the latest version!
Tell us about the problem you're trying to solve
We are trying to deploy Airbyte in AWS without provisioning any additional AWS API keys.
Describe the solution you’d like
I would like Airbyte to support using IAM instance profiles, like most software does.
The AWS SDK already knows how to use the IAM instance profile automatically: if you do not pass any credentials to it, it will search for credentials in several standard locations.
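Concretely, here is a hedged sketch of what that change could look like (a hypothetical factory assuming the AWS SDK for Java v2, not Airbyte's actual code):

```java
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class S3ClientFactory {
  // Use explicit keys when they are provided; otherwise fall back to the
  // SDK's default chain (env vars, ~/.aws/credentials, instance profile).
  public static S3Client create(String accessKey, String secretKey, String region) {
    var builder = S3Client.builder().region(Region.of(region));
    if (accessKey != null && !accessKey.isBlank()) {
      builder.credentialsProvider(StaticCredentialsProvider.create(
          AwsBasicCredentials.create(accessKey, secretKey)));
    } else {
      builder.credentialsProvider(DefaultCredentialsProvider.create());
    }
    return builder.build();
  }
}
```

With a pattern like this, existing deployments that set the keys keep working, while deployments on EC2 can simply omit them and inherit the instance profile.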
Describe the alternative you’ve considered or used
We don't have an alternative, as provisioning API keys for service accounts is against our existing security posture.
Are you willing to submit a PR?
I am willing to submit a PR, if somebody can point me to the relevant places in the code that create AWS API clients.