Aiven-Open / gcs-connector-for-apache-kafka

Aiven's GCS Sink Connector for Apache Kafka®
Apache License 2.0

feat: Add support for setting object metadata `Content-Encoding` #359

Closed jclarysse closed 3 months ago

jclarysse commented 5 months ago

Users who want to leverage the GCS capability of decompressing gzip objects server-side when accessing them through the Storage API have requested that the Content-Encoding object metadata, currently fixed (default: null), become configurable, so that its value can be set (e.g. to gzip) when the connector uploads a new file to the bucket. https://cloud.google.com/storage/docs/metadata#content-encoding
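For illustration, a minimal sketch of setting this metadata with the google-cloud-storage Java client (the bucket and object names are made up, and the connector's actual upload path differs):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipUploadSketch {
    public static void main(final String[] args) throws Exception {
        final Storage storage = StorageOptions.getDefaultInstance().getService();

        // Gzip the payload on the client side before uploading.
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write("{\"key\":\"value\"}\n".getBytes(StandardCharsets.UTF_8));
        }

        // Setting Content-Encoding: gzip tells GCS the stored bytes are compressed,
        // which enables decompressive transcoding when the object is downloaded.
        final BlobInfo blobInfo = BlobInfo
                .newBuilder(BlobId.of("my-bucket", "topic-0-0000000000.gz")) // hypothetical names
                .setContentEncoding("gzip")
                .setContentType("application/json")
                .build();
        storage.create(blobInfo, buffer.toByteArray());
    }
}
```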

jclarysse commented 5 months ago

I wasn't able to run my integration test locally, and it failed here. I'll go back to my local environment and hopefully fix it.

jclarysse commented 3 months ago

@jjaakola-aiven shared that the integration test passed on his local machine.

jclarysse commented 3 months ago

The expected behaviour is that for compressed blobs with the metadata Content-Encoding: gzip, downloading the object should return uncompressed data. This can easily be verified with the GCP sample code downloadFile.js.
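A rough Java-client equivalent of that check (placeholder bucket/object names; transcoding defaults may vary by client version, so the option is passed explicitly):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.nio.charset.StandardCharsets;

public class DownloadCheckSketch {
    public static void main(final String[] args) {
        final Storage storage = StorageOptions.getDefaultInstance().getService();

        // shouldReturnRawInputStream(false) requests decompressive transcoding,
        // so an object stored with Content-Encoding: gzip should come back uncompressed.
        final byte[] bytes = storage.readAllBytes(
                BlobId.of("my-bucket", "topic-0-0000000000.gz"), // hypothetical names
                Storage.BlobSourceOption.shouldReturnRawInputStream(false));

        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```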

Since the GCS connector previously only had tests based on object reads, I had to add some boilerplate code to make reading from a download possible.
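The boilerplate added in this PR may look different, but as a rough sketch, a download-based read in a test could go through Blob.downloadTo (the helper and its names below are hypothetical):

```java
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical test helper: reads an object back through the download path
// (rather than a plain read), which is where Content-Encoding-aware
// transcoding applies, and splits the payload into lines.
final class DownloadTestHelper {

    static List<String> downloadLines(final Storage storage, final String bucket, final String name) {
        final Blob blob = storage.get(bucket, name);
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        blob.downloadTo(out);
        final String content = new String(out.toByteArray(), StandardCharsets.UTF_8);
        return Arrays.asList(content.split("\n"));
    }
}
```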

The new test contentEncodingAwareDownload() passes with the parameters compression=none and content-encoding=none. Unfortunately, with compression=gzip and content-encoding=gzip it fails to decode required fields, as the downloaded bytes do not appear to have been uncompressed:

java.lang.IllegalArgumentException: Illegal base64 character 1f

Since 0x1f is the first byte of the gzip magic number, the test appears to receive raw gzip bytes rather than decompressed data. I wonder if this is a limitation of Testcontainers' DatastoreEmulator.

jclarysse commented 3 months ago

@jjaakola-aiven Thanks for your help with fixing the test so that both compression and encoding work as expected. I pushed again using your patch. Please review.