databricks / iceberg-kafka-connect


Can iceberg.catalog.warehouse be overridden from the connector config? #274

Open chethan-bs opened 3 months ago

chethan-bs commented 3 months ago

I am trying to use the REST catalog URL for ADLS with multiple storage containers from the connector config, but I am unable to override the CATALOG_WAREHOUSE env var set on the iceberg-rest service.

"iceberg.catalog.warehouse": "abfss://storage-container-name@storageaccount.dfs.core.windows.net/warehouse", is not taking any action change from connector config and aways going with default from the rest.

Could you please help with this configuration?

"iceberg.catalog.io-impl": "org.apache.iceberg.azure.adlsv2.ADLSFileIO" works fine from connector config

kevingomez93 commented 3 weeks ago

Hello, I am experiencing a similar issue, but with the S3 implementation instead of Azure. My connector configuration includes the following:

connector.class=io.tabular.iceberg.connect.IcebergSinkConnector
tasks.max=1
topics=some_topic

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

transforms=ExtractField,CopyId,DeriveEventDate
transforms.ExtractField.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.ExtractField.field=event
transforms.CopyId.type=io.tabular.iceberg.connect.transforms.CopyValue
transforms.CopyId.source.field=datetime
transforms.CopyId.target.field=event_date
transforms.DeriveEventDate.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.DeriveEventDate.field=event_date
transforms.DeriveEventDate.target.type=Date
transforms.DeriveEventDate.input.format=UNIX_MS
transforms.DeriveEventDate.format=yyyy-MM-dd

iceberg.catalog.type=hive
iceberg.catalog.uri=thrift://some-hive-connection:9083
iceberg.catalog.warehouse=s3a://some-bucket/
iceberg.catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
iceberg.catalog.client.region=us-east-1
iceberg.catalog.s3.path-style-access=true

iceberg.tables=schema.some_topic
iceberg.tables.auto-create-enabled=true
iceberg.tables.evolve-schema-enabled=true
iceberg.tables.default-partition-by=event_date
iceberg.tables.schema-override.event_date.type=Date
iceberg.control.commit.interval-ms=120000
iceberg.control.commit.timeout-ms=1480000

Despite specifying iceberg.catalog.warehouse=s3a://some-bucket/, the connector does not override the default S3 bucket; table locations still come from the default warehouse configured on the Hive service.
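One thing worth checking: with iceberg.catalog.type=hive, the warehouse property is passed through to the HiveCatalog, but as far as I can tell it is only a fallback. When a table is auto-created, its location is resolved from the location URI that the Hive metastore stores for the database, and that wins over anything set client-side. A hedged sketch of the precedence, reusing the values from the config above:

# Passed through to the HiveCatalog at initialization, but only used
# when the database ("schema" here) has no location URI recorded in
# the metastore:
iceberg.catalog.type=hive
iceberg.catalog.uri=thrift://some-hive-connection:9083
iceberg.catalog.warehouse=s3a://some-bucket/
# If the metastore already stores a location for the database,
# auto-created tables land under it regardless of the value above, so
# the fix would be to update the database location in the metastore.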

Has anyone found a solution to this problem, or is there something we might be missing?