airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Destination S3: Credentials not being accepted, but are valid and working on CLI #18277

Closed johnsmclay closed 1 month ago

johnsmclay commented 1 year ago

Environment

Current Behavior

When creating a new Databricks destination, it requires S3 as a staging area. I created a bucket, an account, IAM policies for that user, and API credentials. When I try to save the destination, it runs its checks and says:

Could not connect to the staging persistence with the provided configuration. Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: S8MN1DBZWEJJ0RNK; S3 Extended Request ID: wfk1HANlX43oDk187wR0yj8Xh7mH4AQ1TGefQBK/jaMGAhPFqm0yEypu9RD/u5NgdunAV+1uQx8=; Proxy: null)

So, S3 access denied. Not sure what it's trying to do, so I checked where it's happening: io.airbyte.integrations.destination.s3.S3StorageOperations.createBucketObjectIfNotExists(S3StorageOperations.java:103) ~[io.airbyte.airbyte-integrations.connectors-destination-s3-0.39.41-alpha.jar:?] That line is where it uses the AWS S3 Java library to put the file in question (I guess a test file?) into the target location.

Using those credentials with the AWS CLI, I can push a file to that location successfully: aws s3 cp ~/Downloads/Result_20.csv s3://my-cool-bucket-name/non-managed-tables/airbyte/ --profile test

I also successfully set up plain S3 sources and destinations with those same credentials and the same location. The S3 source finds no files to import, and the S3 destination gives the exact same 403 AccessDenied error from S3.

So I know:

  1. the credentials work for writing a file to that location (on the AWS CLI)
  2. the credentials work for reading a file from that location (on the AWS CLI)
  3. it's not an issue with the Databricks integration specifically

My best guess is that Airbyte is either:

  1. Trying to do something not covered by my IAM policy for some reason (it doesn't print the command it is trying in the log, only that it was denied), or
  2. Not actually using my supplied credentials.

The IAM policy (notes: I added the first statement to see if maybe it was trying to list all buckets first, which didn't help; I also added "s3:*Object" just in case I was missing something, and even tried "s3:*" just for giggles):

{
    "Id": "my-cool-bucket-name-TST",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:GetBucketLocation",
                "s3:ListAllMyBuckets"
            ],
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*Object",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::my-cool-bucket-name/non-managed-tables/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::my-cool-bucket-name"
        }
    ],
    "Version": "2012-10-17"
}

Info about the bucket: versioning is off; SSE is enabled by default but not required, so you won't get rejected if you aren't using SSE. Public access is blocked, the object owner is the writer, and the bucket policy is the automatic one where root can do things and anyone else needs IAM authorization. No lifecycle rules, replication, or access points.
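
For reference, the failing call boils down to a single PutObject of a test object (see the S3StorageOperations.createBucketObjectIfNotExists line in the logs below), and the object key in that log sits directly under the configured bucket path rather than under non-managed-tables/. A rough way to reproduce that same call outside Airbyte, assuming the same test profile (the key below is a made-up stand-in for whatever path the check log reports):

# Hypothetical key; substitute the exact "Storage Object ..." path from the check log.
# Omitting --body creates a zero-byte object, which is enough to exercise s3:PutObject.
aws s3api put-object \
  --bucket my-cool-bucket-name \
  --key "my-cool-bucket-name/_airbyte_connection_test" \
  --profile test

If that fails with the same 403 while the aws s3 cp into non-managed-tables/ succeeds, the Resource restriction in the policy above is the likely culprit.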

Expected Behavior

It should complete the check and continue on, or at least tell me what rights it needs that it doesn't have. Even the documentation panel on the right in the Databricks connector just talks about generic IAM stuff and says you should be using roles, NOT an access key and ID (lol). On the S3 connector, it points to this doc, which is where I got the idea to add "s3:*Object", but that didn't help.

Logs

Docker Logs:

airbyte-worker      | 2022-10-20 21:45:13 INFO i.a.c.i.LineGobbler(voidCall):114 - ----- START CHECK -----
airbyte-worker      | 2022-10-20 21:45:13 INFO i.a.c.i.LineGobbler(voidCall):114 -
airbyte-worker      | 2022-10-20 21:45:13 INFO i.a.c.i.LineGobbler(voidCall):114 - Checking if airbyte/destination-databricks:0.2.6 exists...
airbyte-worker      | 2022-10-20 21:45:13 INFO i.a.c.i.LineGobbler(voidCall):114 - airbyte/destination-databricks:0.2.6 was found locally.
airbyte-worker      | 2022-10-20 21:45:13 INFO i.a.w.p.DockerProcessFactory(create):119 - Creating docker container = destination-databricks-check-97945a1c-dab1-4b7c-b353-cf1e8ee07726-0-tswjd with resources io.airbyte.config.ResourceRequirements@1d54efa4[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=]
airbyte-worker      | 2022-10-20 21:45:13 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/97945a1c-dab1-4b7c-b353-cf1e8ee07726/0 --log-driver none --name destination-databricks-check-97945a1c-dab1-4b7c-b353-cf1e8ee07726-0-tswjd --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e USE_STREAM_CAPABLE_STATE=true -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/destination-databricks:0.2.6 -e AIRBYTE_VERSION=0.40.15 -e WORKER_JOB_ID=97945a1c-dab1-4b7c-b353-cf1e8ee07726 airbyte/destination-databricks:0.2.6 check --config source_config.json
airbyte-worker      | 2022-10-20 21:45:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:15 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - integration args: {check=null, config=source_config.json}
airbyte-worker      | 2022-10-20 21:45:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:15 INFO i.a.i.b.IntegrationRunner(runInternal):104 - Running integration: io.airbyte.integrations.destination.databricks.DatabricksDestination
airbyte-worker      | 2022-10-20 21:45:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:15 INFO i.a.i.b.IntegrationRunner(runInternal):105 - Command: CHECK
airbyte-worker      | 2022-10-20 21:45:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:15 INFO i.a.i.b.IntegrationRunner(runInternal):106 - Integration config: IntegrationConfig{command=CHECK, configPath='source_config.json', catalogPath='null', statePath='null'}
airbyte-worker      | 2022-10-20 21:45:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:15 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
airbyte-worker      | 2022-10-20 21:45:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:15 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
airbyte-worker      | 2022-10-20 21:45:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:15 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
airbyte-worker      | 2022-10-20 21:45:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:15 INFO i.a.i.d.s.S3DestinationConfig(createS3Client):190 - Creating S3 client...
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:17 INFO i.a.i.d.s.S3StorageOperations(createBucketObjectIfNotExists):102 - Storage Object mshp-uw2-dbrix-production-root/mshp-uw2-dbrix-production-root does not exist in bucket; creating...
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:17 ERROR i.a.i.d.j.c.CopyDestination(check):55 - Exception attempting to access the staging persistence:
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: S8MN1DBZWEJJ0RNK; S3 Extended Request ID: wfk1HANlX43oDk187wR0yj8Xh7mH4AQ1TGefQBK/jaMGAhPFqm0yEypu9RD/u5NgdunAV+1uQx8=; Proxy: null)
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) ~[aws-java-sdk-core-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437) ~[aws-java-sdk-s3-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384) ~[aws-java-sdk-s3-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:421) ~[aws-java-sdk-s3-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:6508) ~[aws-java-sdk-s3-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1856) ~[aws-java-sdk-s3-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1816) ~[aws-java-sdk-s3-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:3982) ~[aws-java-sdk-s3-1.12.6.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.destination.s3.S3StorageOperations.createBucketObjectIfNotExists(S3StorageOperations.java:103) ~[io.airbyte.airbyte-integrations.connectors-destination-s3-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.destination.s3.S3Destination.attemptWriteAndDeleteS3Object(S3Destination.java:148) ~[io.airbyte.airbyte-integrations.connectors-destination-s3-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.destination.s3.S3Destination.attemptS3WriteAndDelete(S3Destination.java:139) ~[io.airbyte.airbyte-integrations.connectors-destination-s3-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.destination.s3.S3Destination.attemptS3WriteAndDelete(S3Destination.java:129) ~[io.airbyte.airbyte-integrations.connectors-destination-s3-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.destination.databricks.DatabricksDestination.checkPersistence(DatabricksDestination.java:58) ~[io.airbyte.airbyte-integrations.connectors-destination-databricks-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.destination.jdbc.copy.CopyDestination.check(CopyDestination.java:53) [io.airbyte.airbyte-integrations.connectors-destination-jdbc-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.java:121) [io.airbyte.airbyte-integrations.bases-base-java-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:97) [io.airbyte.airbyte-integrations.bases-base-java-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 -  at io.airbyte.integrations.destination.databricks.DatabricksDestination.main(DatabricksDestination.java:33) [io.airbyte.airbyte-integrations.connectors-destination-databricks-0.39.41-alpha.jar:?]
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(parseJson):68 - 2022-10-20 21:45:17 INFO i.a.i.b.IntegrationRunner(runInternal):152 - Completed integration: io.airbyte.integrations.destination.databricks.DatabricksDestination
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.w.t.TemporalAttemptExecution(get):134 - Stopping cancellation check scheduling...
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.c.i.LineGobbler(voidCall):114 -
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.c.i.LineGobbler(voidCall):114 - ----- END CHECK -----
airbyte-worker      | 2022-10-20 21:45:17 INFO i.a.c.i.LineGobbler(voidCall):114 -

Results from "Download server Logs" (doesn't seem too useful, but who knows):

    ___    _      __          __
   /   |  (_)____/ /_  __  __/ /____
  / /| | / / ___/ __ \/ / / / __/ _ \
 / ___ |/ / /  / /_/ / /_/ / /_/  __/
/_/  |_/_/_/  /_.___/\__, /\__/\___/
                    /____/
--------------------------------------
 Now ready at http://localhost:8000/
--------------------------------------
Version: 0.40.15

2022-10-20 18:49:43 INFO i.a.c.EnvConfigs(getEnvOrDefault):1091 - Using default value for environment variable GITHUB_STORE_BRANCH: 'master'
2022-10-20 18:49:43 INFO i.a.c.EnvConfigs(getEnvOrDefault):1091 - Using default value for environment variable GITHUB_STORE_BRANCH: 'master'
2022-10-20 18:49:44 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/workspaces/list
2022-10-20 18:49:44 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/workspaces/get - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 18:50:01 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/workspaces/update - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7","initialSetupComplete":true,"displaySetupWizard":true,"email":"clay@mothership.com","anonymousDataCollection":true,"news":false,"securityUpdates":false}
2022-10-20 18:50:01 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/web_backend/workspace/state - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 18:50:02 WARN c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword existingJavaType - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-10-20 18:50:03 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list_latest
2022-10-20 18:50:03 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list
2022-10-20 18:50:04 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list_latest
2022-10-20 18:50:04 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list
2022-10-20 18:50:06 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destinations/list - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 18:50:06 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list_latest
2022-10-20 18:50:07 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list
2022-10-20 21:48:37 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list
2022-10-20 21:48:37 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list_latest
2022-10-20 21:48:39 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/sources/list - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:48:40 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/web_backend/connections/list - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:48:40 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list_latest
2022-10-20 21:48:41 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list
2022-10-20 21:48:44 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destinations/list - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:48:44 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list_latest
2022-10-20 21:48:45 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list
2022-10-20 21:48:50 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definition_specifications/get - {"sourceDefinitionId":"69589781-7828-43c5-9f63-8925b1c1ccc2","workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:51:45 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/scheduler/sources/check_connection - {"connectionConfiguration":"REDACTED","workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7","sourceDefinitionId":"69589781-7828-43c5-9f63-8925b1c1ccc2"}
2022-10-20 21:51:46 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/sources/create - {"name":"S3 test","sourceDefinitionId":"69589781-7828-43c5-9f63-8925b1c1ccc2","workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7","connectionConfiguration":"REDACTED"}
2022-10-20 21:51:48 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/sources/get - {"sourceId":"73b6723b-4d6f-4e61-b3b7-a26d6c27ba73"}
2022-10-20 21:51:48 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/get - {"sourceDefinitionId":"69589781-7828-43c5-9f63-8925b1c1ccc2"}
2022-10-20 21:51:49 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/web_backend/connections/list - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:51:49 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destinations/list - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:51:49 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list_latest
2022-10-20 21:51:49 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list
2022-10-20 21:55:08 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/web_backend/workspace/state - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:55:08 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destinations/list - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:55:08 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list_latest
2022-10-20 21:55:08 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list
2022-10-20 21:55:13 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/workspaces/update - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7","initialSetupComplete":true,"anonymousDataCollection":true,"news":false,"securityUpdates":false,"displaySetupWizard":false}
2022-10-20 21:55:13 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/web_backend/connections/list - {"workspaceId":"24d7c146-b688-400a-a7d3-819491f591d7"}
2022-10-20 21:55:19 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list_latest
2022-10-20 21:55:19 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/destination_definitions/list
2022-10-20 21:55:19 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list_latest
2022-10-20 21:55:20 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/source_definitions/list
2022-10-20 21:55:38 WARN o.h.v.i.p.j.JavaBeanExecutable(getParameters):216 - HV000254: Missing parameter metadata for LogType(String, int, String), which declares implicit or synthetic parameters. Automatic resolution of generic type information for method parameters may yield incorrect results if multiple parameters have the same erasure. To solve this, compile your code with the '-parameters' flag.
2022-10-20 21:55:38 INFO i.a.s.RequestLogger(filter):112 - REQ 172.20.0.5 POST 200 /api/v1/logs/get - {"logType":"server"}

printenv on the server:

JOB_ERROR_REPORTING_SENTRY_DSN=
HOSTNAME=3ddcb1ed1e19
WORKER_ENVIRONMENT=
CONFIG_ROOT=/data
WORKSPACE_ROOT=/tmp/workspace
TERM=xterm
NEW_SCHEDULER=
TRACKING_STRATEGY=segment
JOB_MAIN_CONTAINER_CPU_REQUEST=
DATABASE_USER=docker
GITHUB_STORE_BRANCH=
CONFIG_DATABASE_USER=
JOB_MAIN_CONTAINER_MEMORY_REQUEST=
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION=0.29.15.001
JOB_MAIN_CONTAINER_CPU_LIMIT=
PWD=/app
JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto
VERSION=0.40.15
LANG=C.UTF-8
DATABASE_PASSWORD=docker
SECRET_PERSISTENCE=
SHLVL=1
HOME=/root
LOG_LEVEL=INFO
WEBAPP_URL=http://localhost:8000/
AIRBYTE_ROLE=
CONFIG_DATABASE_PASSWORD=
CONFIG_DATABASE_URL=
CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION=0.35.15.001
TEMPORAL_HOST=airbyte-temporal:7233
JOB_MAIN_CONTAINER_MEMORY_LIMIT=
APPLICATION=airbyte-server
DATABASE_URL=jdbc:postgresql://db:5432/airbyte
AIRBYTE_VERSION=0.40.15
JOB_ERROR_REPORTING_STRATEGY=logging
_=/usr/bin/printenv

Steps to Reproduce

  1. Set up a new Docker-based deployment
  2. Configure a new account on AWS with the above policy
  3. Use that account's credentials to configure an S3 destination

Are you willing to submit a PR?

Yeah, if someone can point me to what is going wrong, I can give it a go.

natalyjazzviolin commented 1 year ago

Just double checking: this is affecting the S3 source connector?

johnsmclay commented 1 year ago

S3 destination and Databricks destination. Maybe the S3 source. My guess is it's some sort of permissions issue, because I set it up on a new AWS account last night, gave it the master creds, and it worked. So maybe we just need to add clear guidance as to what permissions are needed? I mean, anyone who has enough data that they'd want to use this tool should be smart enough to know not to just give it free rein on their AWS account 😅
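
For anyone else hitting this, here's a rough sketch of the broader grant that seems to be needed (my assumption, based on the check behavior above: the test object is written under the configured bucket path, so the object-level permissions have to cover that whole path, not just your data prefix; the user and policy names below are made up):

# Hypothetical user/policy names; adjust the bucket to your setup.
aws iam put-user-policy \
  --user-name airbyte-staging-user \
  --policy-name airbyte-s3-staging \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
        "Resource": "arn:aws:s3:::my-cool-bucket-name"
      },
      {
        "Effect": "Allow",
        "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
        "Resource": "arn:aws:s3:::my-cool-bucket-name/*"
      }
    ]
  }'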


ext-gvillafane commented 1 year ago

I'm seeing the same issue on Airbyte v0.40.10 on Kubernetes. I can see that the checker pod does not have my service account, nor do I see it passing the AWS credentials (or a reference to the secret with the credentials) in the env vars.
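
A quick way to double-check what the check pod actually runs with, assuming you can catch it before it terminates (the pod name below is just a placeholder):

# Show which service account the check pod runs under and whether any AWS_* env vars are set.
kubectl get pod destination-databricks-check-xxxxx -o jsonpath='{.spec.serviceAccountName}'
kubectl get pod destination-databricks-check-xxxxx -o yaml | grep -i 'AWS_'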

ext-gvillafane commented 1 year ago

Also, if I leave the key ID and access key empty, it throws an error as if they were mandatory even though they are marked Optional (The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential.).

I would expect it not to send an AKID when it's empty, so it would just attempt to authenticate through credentials configured in the environment or an attached role.
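
For comparison, that fallback is exactly what the default AWS credential chain does outside Airbyte; a small sketch, assuming AWS_* env vars or an attached role are available and no static keys are configured anywhere:

# No access key is passed explicitly; the default chain resolves env vars, shared config, or the attached role.
aws sts get-caller-identity
aws s3api head-bucket --bucket my-cool-bucket-name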

(screenshots attached)

natalyjazzviolin commented 1 year ago

@johnsmclay @ext-gvillafane this has been brought to the attention of the engineering team!

luancaarvalho commented 1 year ago

Hello @natalyjazzviolin! Any news from the engineering team about this?

natalyjazzviolin commented 1 year ago

@luancaarvalho Not yet! The on-call engineer is aware of the issue though!

luancaarvalho commented 1 year ago

Thank you

grishick commented 1 year ago

@ext-gvillafane Airbyte connectors currently do not support reading IAM credentials from the environment.

azeezat commented 1 year ago

Any update on this issue?

evantahler commented 1 month ago

Closing: this destination has been updated significantly since this issue was posted. We rely on Unity Catalog now.