airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Change AWS lib to GCS one when using GCS connector or logs #13438

Closed marcosmarxm closed 1 month ago

marcosmarxm commented 2 years ago

Tell us about the problem you're trying to solve

Today Airbyte uses the AWS-compatible (S3) library for the GCS connector and for logs. https://discuss.airbyte.io/t/gcs-destinaltion-connection-failed/1196/3

This may be good practice from a developer-experience standpoint, but it is not good for the user experience: errors are reported in Amazon S3 terms even though the user is connecting to GCS. The cost of moving to the native GCS library may not be huge, and it could bring a better UX.
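For context, the current setup reaches GCS through its S3-interoperable XML API, which is why the AWS SDK appears in the stack traces and error messages below. A minimal sketch of that pattern (hypothetical bucket and credential values, not the exact Airbyte code):

```java
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class GcsViaS3InteropSketch {
  public static void main(String[] args) {
    // GCS "interoperability" HMAC key pair, not a Google service-account JSON key.
    BasicAWSCredentials hmacCreds = new BasicAWSCredentials("GOOG1E_HMAC_ACCESS_ID", "HMAC_SECRET");

    // The AWS client is simply pointed at GCS's S3-compatible endpoint, so any failure
    // surfaces as an Amazon S3 exception (403 InvalidSecurity, 404 NoSuchKey, ...).
    AmazonS3 gcsViaS3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(
            new AwsClientBuilder.EndpointConfiguration("https://storage.googleapis.com", "us-east-1"))
        .withCredentials(new AWSStaticCredentialsProvider(hmacCreds))
        .build();

    // A connection check boils down to writing a small test object through the S3 API.
    gcsViaS3.putObject("my-gcs-bucket", "_airbyte_connection_test", "hello");
  }
}
```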

One example:

2022-05-24 11:44:40 INFO i.a.w.t.TemporalAttemptExecution(get):105 - Docker volume job log path: /tmp/workspace/f3b8162d-dfd1-408e-92f5-e48a70088e17/0/logs.log
2022-05-24 11:44:40 INFO i.a.w.t.TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.35.65-alpha
2022-05-24 11:44:40 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/destination-gcs:0.2.0 exists...
2022-05-24 11:44:40 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/destination-gcs:0.2.0 was found locally.
2022-05-24 11:44:40 INFO i.a.w.p.DockerProcessFactory(create):106 - Creating docker job ID: f3b8162d-dfd1-408e-92f5-e48a70088e17
2022-05-24 11:44:40 INFO i.a.w.p.DockerProcessFactory(create):158 - Preparing command: docker run --rm --init -i -w /data/f3b8162d-dfd1-408e-92f5-e48a70088e17/0 --log-driver none --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/destination-gcs:0.2.0 -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_VERSION=0.35.65-alpha -e WORKER_JOB_ID=f3b8162d-dfd1-408e-92f5-e48a70088e17 airbyte/destination-gcs:0.2.0 check --config source_config.json
2022-05-24 11:44:42 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Class path contains multiple SLF4J bindings.
2022-05-24 11:44:42 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Found binding in [jar:file:/airbyte/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-05-24 11:44:42 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Found binding in [jar:file:/airbyte/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-05-24 11:44:42 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2022-05-24 11:44:42 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-05-24 11:44:43 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:43 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - integration args: {check=null, config=source_config.json}
2022-05-24 11:44:43 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:43 INFO i.a.i.b.IntegrationRunner(runInternal):121 - Running integration: io.airbyte.integrations.destination.gcs.GcsDestination
2022-05-24 11:44:43 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:43 INFO i.a.i.b.IntegrationRunner(runInternal):122 - Command: CHECK
2022-05-24 11:44:43 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:43 INFO i.a.i.b.IntegrationRunner(runInternal):123 - Integration config: IntegrationConfig{command=CHECK, configPath='source_config.json', catalogPath='null', statePath='null'}
2022-05-24 11:44:43 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:43 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-05-24 11:44:43 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:43 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-05-24 11:44:43 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:43 INFO i.a.i.d.s.S3FormatConfigs(getS3FormatConfig):22 - S3 format config: {"part_size_mb":5,"format_type":"CSV","flattening":"No flattening"}
2022-05-24 11:44:44 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:44 INFO i.a.i.d.s.S3Destination(testSingleUpload):81 - Started testing if all required credentials assigned to user for single file uploading
2022-05-24 11:44:46 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:46 ERROR i.a.i.d.g.GcsDestination(check):56 - Exception attempting to access the Gcs bucket: The provided security credentials are not valid. (Service: Amazon S3; Status Code: 403; Error Code: InvalidSecurity; Request ID: null; S3 Extended Request ID: null; Proxy: null)
2022-05-24 11:44:46 INFO i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-05-24 11:44:46 ERROR i.a.i.d.g.GcsDestination(check):57 - Please make sure you account has all of these roles: storage.multipartUploads.abort, storage.multipartUploads.create, storage.objects.create, storage.objects.delete, storage.objects.get, storage.objects.list
2022-05-24 11:44:46 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword existingJavaType - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-05-24 11:44:46 INFO i.a.w.t.TemporalAttemptExecution(get):131 - Stopping cancellation check scheduling...
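The request, then, is for the connection check (and log storage) to go through the native GCS client instead, so that a failure like the 403 above is reported in GCS terms. A rough sketch of what the same check could look like with the google-cloud-storage library (hypothetical bucket, object, and key-file names, not the actual Airbyte implementation):

```java
import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageException;
import com.google.cloud.storage.StorageOptions;
import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;

public class GcsNativeCheckSketch {
  public static void main(String[] args) throws Exception {
    // Authenticate with a regular service-account JSON key instead of HMAC interop keys.
    Storage storage = StorageOptions.newBuilder()
        .setCredentials(ServiceAccountCredentials.fromStream(new FileInputStream("service-account.json")))
        .build()
        .getService();

    BlobId testBlob = BlobId.of("my-gcs-bucket", "_airbyte_connection_test");
    try {
      // Write and delete a small test object, mirroring what a CHECK operation verifies.
      storage.create(BlobInfo.newBuilder(testBlob).build(), "test".getBytes(StandardCharsets.UTF_8));
      storage.delete(testBlob);
      System.out.println("GCS connection check succeeded");
    } catch (StorageException e) {
      // Errors now come back as GCS errors (e.g. a 403 naming the missing storage.objects.create
      // permission) rather than "Amazon S3 ... InvalidSecurity".
      System.err.println("GCS connection check failed: " + e.getMessage());
    }
  }
}
```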

Describe the solution you’d like


Describe the alternative you’ve considered or used


Additional context


Are you willing to submit a PR?


frans-k commented 1 year ago

So does this mean that when I get:

Stack Trace: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: null; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null

The issue isn't that it's trying to call S3, but that it's using an S3 library?