getindata / kafka-connect-iceberg-sink

Apache License 2.0

Connector ignores fs.s3a.impl #22

Closed majidazimi closed 1 year ago

majidazimi commented 1 year ago

I'm trying to route s3a requests through Alluxio (the connector pod already has the Alluxio jar files), which then writes to S3 asynchronously. According to the documentation, the following configuration should do this:

table.write-format: "parquet"
iceberg.catalog-impl: "org.apache.iceberg.aws.glue.GlueCatalog"
iceberg.warehouse: "s3a://my_bucket/ufs"
iceberg.fs.defaultFS: "s3a://my_bucket/ufs"
iceberg.fs.s3a.aws.credentials.provider: com.amazonaws.auth.DefaultAWSCredentialsProviderChain
iceberg.fs.s3a.path.style.access: true
iceberg.fs.s3a.impl: alluxio.hadoop.ShimFileSystem
iceberg.fs.AbstractFileSystem.s3a.impl: alluxio.hadoop.AlluxioShimFileSystem

I can still see that S3 is accessed directly. Setting iceberg.fs.s3a.impl to an arbitrary string (not even a valid class name) makes no difference: the connector, or the underlying Iceberg library, appears to ignore it, because data is still written to S3 successfully without crashing.

According to the docs, iceberg.* properties should be forwarded to the catalog implementation.
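
To make the expectation concrete, here is a minimal sketch of what "forwarding iceberg.* properties" would mean if the prefix were simply stripped before the properties reach the catalog. This is an assumption about the intended behaviour, not the connector's actual code; `strip_prefix` is a hypothetical helper written only for illustration.

```python
def strip_prefix(config, prefix="iceberg."):
    """Keep only the keys that start with `prefix` and drop the prefix.

    Hypothetical helper: it illustrates the expected forwarding,
    it is not taken from the connector or from Iceberg.
    """
    return {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)
    }


# A subset of the connector configuration from this issue.
connector_config = {
    "table.write-format": "parquet",
    "iceberg.warehouse": "s3a://my_bucket/ufs",
    "iceberg.fs.s3a.impl": "alluxio.hadoop.ShimFileSystem",
}

# Properties the catalog would receive under this assumption.
forwarded = strip_prefix(connector_config)
print(forwarded)
```

Under that assumption the catalog would see fs.s3a.impl set to alluxio.hadoop.ShimFileSystem, so an invalid class name there should cause a failure rather than a successful write.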