airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Destination GCS: Support buckets using customer-managed encryption key #18195

Open edgao opened 1 year ago

edgao commented 1 year ago

As seen in https://github.com/airbytehq/oncall/issues/790#issuecomment-1284600852

GCS buckets are always encrypted at rest. If a bucket is configured to use a customer-managed encryption key (rather than the default Google-managed key), the connector check fails with the error "Could not connect to the Gcs bucket with the provided configuration. Invalid base 16 character: 'J'", where the J changes randomly every time you run check. Switching the bucket back to a Google-managed key allows the connector check operation to succeed.

Affected destinations:

destination-snowflake with GCS staging is somehow not affected, which is really interesting. What's it doing differently?

Example bucket - https://console.cloud.google.com/storage/browser/airbyte-edgao-test;tab=configuration?project=airbyte-edgao-test-proj&prefix=&forceOnObjectsSortingFiltering=false

Specifically, the ETag metadata returned for an uploaded part is formatted differently when the bucket uses a customer-managed encryption key, and AmazonS3Client errors out when it tries to parse that ETag. We might want to just disable the hash check entirely (https://github.com/aws/aws-sdk-java/blob/5177b5ad08a1ba1d4c71dde1f61a3b29d57295c9/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/SkipMd5CheckStrategy.java#L32-L42).
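To make the failure mode and the proposed workaround concrete, here is a minimal sketch using only the JDK. It assumes the client treats the ETag as a hex-encoded MD5 digest; `decodeHex` is a hypothetical stand-in for that parsing step (not the SDK's own code), and the non-hex ETag value is made up for illustration. The two property names are the ones I believe `SkipMd5CheckStrategy` in the AWS SDK for Java v1 reads; as far as I can tell the SDK only checks that they are set, not what value they hold, so treat that detail as an assumption to verify against the linked source.

```java
public class EtagMd5CheckSketch {

    // System properties consulted by SkipMd5CheckStrategy (AWS SDK for Java v1).
    static final String DISABLE_PUT_MD5 =
            "com.amazonaws.services.s3.disablePutObjectMD5Validation";
    static final String DISABLE_GET_MD5 =
            "com.amazonaws.services.s3.disableGetObjectMD5Validation";

    /** Strict hex decoder: a stand-in for the client's ETag parsing, not the SDK's code. */
    static byte[] decodeHex(String s) {
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex characters");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < s.length(); i += 2) {
            int hi = Character.digit(s.charAt(i), 16);
            int lo = Character.digit(s.charAt(i + 1), 16);
            if (hi < 0 || lo < 0) {
                char bad = hi < 0 ? s.charAt(i) : s.charAt(i + 1);
                throw new IllegalArgumentException(
                        "Invalid base 16 character: '" + bad + "'");
            }
            out[i / 2] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** True only if the ETag decodes as 16 bytes of hex, i.e. it could be an MD5 digest. */
    static boolean isHexMd5(String etag) {
        try {
            return decodeHex(etag).length == 16;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Sets both properties so the SDK skips client-side MD5 validation. */
    static void disableMd5Validation() {
        System.setProperty(DISABLE_PUT_MD5, "true");
        System.setProperty(DISABLE_GET_MD5, "true");
    }

    public static void main(String[] args) {
        String md5Etag = "d41d8cd98f00b204e9800998ecf8427e"; // MD5 of empty content
        String otherEtag = "CJ7v2Kq5+PoCEAE=";               // hypothetical non-MD5 ETag

        System.out.println(isHexMd5(md5Etag));
        System.out.println(isHexMd5(otherEtag));
        try {
            decodeHex(otherEtag);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Must run before the S3 client performs any upload or download.
        disableMd5Validation();
    }
}
```

The same effect should be achievable without code changes by passing the properties as JVM flags (`-Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true`). Note that this only skips the integrity check; it does not change how the ETag is produced.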

edgao commented 1 year ago

for now, docs update in https://github.com/airbytehq/airbyte/pull/18315

armsepehr commented 1 year ago

@edgao

I tried to fix the bug by only setting the system property, as you mentioned in your description; however, that gave me another similar error in the write command. I then changed the S3Config for the GCS class, and it works.

Note that I have included one required field in the sample secret file, since the compilation error misled me for several hours while I was tracking down the problem. The same change may be needed for other similar destination connectors as well.