databricks / iceberg-kafka-connect

Apache License 2.0

Glue configurations for iceberg connector #264

Open arshahmad1 opened 5 months ago

arshahmad1 commented 5 months ago

Hi Team, I'm trying to configure this connector on Confluent Cloud. Can someone please guide me on how to configure the connector to use AWS Glue as the catalog? I already went through the Iceberg Glue catalog documentation, but there's no related configuration there. I have an IAM role in AWS with the required access to S3 and the Glue catalog, but I can't find any configuration that links the connector to AWS Glue. I've already tried iceberg.catalog.client.assume-role.arn and client.assume-role.arn.

Also, there's no configuration for the Glue catalog database and table names where the data should land. Can anyone please take a look and help me with this 🙂
Thanks!

arshahmad1 commented 5 months ago

Here's my current configuration (I'm not sure about the iceberg.tables property):

{
  "topics": "kafka_topic_name",
  "iceberg.tables": "s3_bucket_name.s3_folder_name",
  "iceberg.catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
  "iceberg.catalog.warehouse": "s3://s3_bucket_name/s3_folder_name",
  "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
  "iceberg.catalog.client.assume-role.arn": "arn:aws:iam:::role/",
  "client.assume-role.arn": "arn:aws:iam:::role/",
  "value.converter.schemas.enable": "false",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter"
}
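A minimal sketch of how the same settings might be assembled, assuming (not confirmed in this thread) that iceberg.tables takes Iceberg table identifiers of the form <glue_database>.<glue_table>, because Iceberg's GlueCatalog maps a single-level namespace to a Glue database. The database, table, bucket and role values are hypothetical placeholders. The optional assume-role branch assumes Iceberg's AssumeRoleAwsClientFactory, selected through client.factory, which also expects client.assume-role.region and still needs base credentials to call STS, so on its own it may not solve the Confluent Cloud problem.

```python
import json

# Hypothetical placeholders (not taken from the thread): replace with your own values.
GLUE_DATABASE = "my_glue_db"
GLUE_TABLE = "my_table"
WAREHOUSE = "s3://my-bucket/warehouse"


def glue_sink_config(topic, database, table, warehouse, role_arn=None, region=None):
    """Sketch of a Glue-backed sink config; key names follow the config above."""
    if "." in database or "." in table:
        # Glue supports only a single-level namespace, so dotted names are rejected.
        raise ValueError("database and table names must not contain '.'")
    config = {
        "topics": topic,
        # Assumption: the namespace part of "<database>.<table>" is the Glue database.
        "iceberg.tables": f"{database}.{table}",
        "iceberg.catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        "iceberg.catalog.warehouse": warehouse,
        "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    }
    if role_arn is not None:
        if region is None:
            raise ValueError("region is required together with role_arn")
        # Assumption: AssumeRoleAwsClientFactory is chosen via client.factory and reads
        # client.assume-role.*; it still needs base credentials to call STS.
        config["iceberg.catalog.client.factory"] = "org.apache.iceberg.aws.AssumeRoleAwsClientFactory"
        config["iceberg.catalog.client.assume-role.arn"] = role_arn
        config["iceberg.catalog.client.assume-role.region"] = region
    return config


if __name__ == "__main__":
    print(json.dumps(glue_sink_config("kafka_topic_name", GLUE_DATABASE, GLUE_TABLE, WAREHOUSE), indent=2))
```

With this layout the Glue database and table names live in iceberg.tables, while iceberg.catalog.warehouse points at the S3 location.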

arshahmad1 commented 5 months ago

Hey @tabmatfournier, sorry to ping you directly. Can you please help me here?

sharpsoul commented 5 months ago

You need to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables, but I know you can't do that in Confluent. Look for an option to set AWS configuration in Confluent.

arshahmad1 commented 5 months ago

Thanks @sharpsoul, really appreciate your help. Let me check that.

braislchao commented 3 months ago

Hey @arshahmad1, have you found a working config? Thanks in advance.

braislchao commented 3 months ago

I'm trying these config properties for AWS access on a custom Confluent Cloud connector:

"iceberg.catalog.s3.access-key-id": "***********",
"iceberg.catalog.s3.secret-access-key": "**********",
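A minimal sketch of what the catalog would receive from these keys, assuming the connector forwards every iceberg.catalog.-prefixed key to Iceberg with the prefix stripped. As I understand it, s3.access-key-id and s3.secret-access-key are S3FileIO settings and only cover the S3 client, so the Glue client may still fall back to the default credentials chain; treating client.credentials-provider as the setting that covers non-S3 clients is an assumption that depends on the bundled Iceberg version.

```python
# Assumption: the connector hands every key starting with "iceberg.catalog." to the
# Iceberg catalog with that prefix removed.
CATALOG_PREFIX = "iceberg.catalog."


def catalog_properties(connector_config):
    """Return the properties the Iceberg catalog would receive (prefix stripped)."""
    return {
        key[len(CATALOG_PREFIX):]: value
        for key, value in connector_config.items()
        if key.startswith(CATALOG_PREFIX)
    }


def only_s3_credentials(props):
    """True when static keys are set for S3FileIO but nothing configures other AWS clients."""
    has_s3_keys = "s3.access-key-id" in props and "s3.secret-access-key" in props
    # Assumption: client.credentials-provider is the Iceberg property for non-S3 clients.
    has_client_credentials = "client.credentials-provider" in props
    return has_s3_keys and not has_client_credentials


example = {
    "topics": "kafka_topic_name",
    "iceberg.catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "iceberg.catalog.s3.access-key-id": "***",
    "iceberg.catalog.s3.secret-access-key": "***",
}
props = catalog_properties(example)
print(sorted(props))  # ['catalog-impl', 's3.access-key-id', 's3.secret-access-key']
print(only_s3_credentials(props))  # True
```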

However, I'm facing issues with the connector's transactional producer (Transactional Id), even though I'm using an API key with full permissions over the cluster:

Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted Transactional Id authorization failed.

Have you tried those properties? Did you manage to get the connector running?

Thanks!
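For context, "Transactional Id authorization failed." is an error returned by Kafka's authorizer, not by AWS: a transactional producer is rejected when its principal lacks WRITE on the TransactionalId resource it uses. A minimal, library-independent sketch of the ACLs such a producer typically needs is below; the principal and the transactional ID prefix are hypothetical, since the IDs the connector actually uses are not stated in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclSpec:
    """Plain description of one Kafka ACL; not tied to any client library."""
    principal: str
    resource_type: str
    resource_name: str
    pattern_type: str
    operation: str


def transactional_id_acls(principal, transactional_id_prefix):
    # Kafka rejects a transactional producer with "Transactional Id authorization failed."
    # when its principal lacks WRITE on the TransactionalId resource. DESCRIBE is listed
    # explicitly for clarity even though Kafka implies it from WRITE.
    return [
        AclSpec(principal, "TRANSACTIONAL_ID", transactional_id_prefix, "PREFIXED", operation)
        for operation in ("WRITE", "DESCRIBE")
    ]


# Hypothetical principal and prefix: the real transactional IDs depend on the connector.
for acl in transactional_id_acls("User:sa-example", "my-iceberg-sink"):
    print(acl)
```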

mkegelCognism commented 1 month ago

@braislchao I am running into the same issue as you. Have you found any solutions?

braislchao commented 1 month ago

Hey @mkegelCognism, I deployed it on EKS using the Glue catalog:

"iceberg.catalog.io-impl":"org.apache.iceberg.aws.s3.S3FileIO",
"iceberg.catalog.catalog-impl":"org.apache.iceberg.aws.glue.GlueCatalog",

EKS exposes the ID and KEY as environment variables, and the connector reads them through the default credentials provider chain. If you are deploying it somewhere else, I think you will need to set them manually on the connector machine:

AWS_ACCESS_KEY_ID=***
AWS_SECRET_ACCESS_KEY=***
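A simplified model of the lookup order described above, assuming the connector relies on the AWS SDK for Java v2 DefaultCredentialsProvider. Its first step reads the JVM system properties aws.accessKeyId and aws.secretAccessKey (set with -D flags on the worker JVM, not through connector config), and its second step reads the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. Later steps are intentionally left out, and the function name is hypothetical.

```python
import os


def resolve_static_credentials(system_props, environ):
    """Simplified model of the first two lookups in the AWS SDK for Java v2 default chain.

    Returns (source, access_key_id) or None; the secret is deliberately not returned.
    """
    # Step 1: JVM system properties, e.g. -Daws.accessKeyId=... -Daws.secretAccessKey=...
    if system_props.get("aws.accessKeyId") and system_props.get("aws.secretAccessKey"):
        return ("system-properties", system_props["aws.accessKeyId"])
    # Step 2: process environment variables.
    if environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY"):
        return ("environment", environ["AWS_ACCESS_KEY_ID"])
    # Not modeled: web identity token, profile file, container and instance metadata.
    return None


if __name__ == "__main__":
    print(resolve_static_credentials({}, dict(os.environ)))
```

This is why a managed platform that exposes neither JVM flags nor environment variables leaves only the later chain steps, or a catalog-level setting, as options.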

mkegelCognism commented 1 month ago

@braislchao thanks for the quick answer!

So you would say it's related to the AWS credentials not being read correctly, even though the error message says "Transactional Id authorization failed"?

I am locked in to Confluent Cloud and haven't found a way to define environment variables for connectors.

braislchao commented 4 weeks ago

Hi @mkegelCognism,

Unfortunately, I wasn't able to make it work on Confluent Cloud. I suspect the main problem is the AWS credentials provider: in Confluent Cloud you don't control the environment variables, so you would need to pass credentials through connector properties such as aws.accessKeyId.

I have tried different properties without luck. One option might be to use a different implementation of the provider chain to replace the default AwsCredentialsProviderChain. Let me know if you find a solution; I moved to an EKS deployment because of this limitation.
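A minimal sketch of the "different provider" idea, assuming the Iceberg version bundled with the connector supports the client.credentials-provider property (a class implementing the AWS SDK v2 AwsCredentialsProvider interface with a static create() or create(Map) method) and forwards client.credentials-provider.* keys to it. The provider class name and its property names are hypothetical; such a class would have to be written and packaged with the custom connector.

```python
# Hypothetical provider class: you would have to write, package, and upload it yourself.
PROVIDER_CLASS = "com.example.StaticKeysCredentialsProvider"


def with_credentials_provider(config, provider_class, provider_props):
    """Return a copy of config that asks Iceberg to load provider_class for its AWS clients."""
    updated = dict(config)
    # Assumption: Iceberg versions that support it read client.credentials-provider.
    updated["iceberg.catalog.client.credentials-provider"] = provider_class
    for name, value in provider_props.items():
        # Assumption: keys under client.credentials-provider. reach the provider's create(Map).
        updated["iceberg.catalog.client.credentials-provider." + name] = value
    return updated


base = {"iceberg.catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog"}
print(with_credentials_provider(base, PROVIDER_CLASS, {"access-key-id": "***", "secret-access-key": "***"}))
```

Returning a copy keeps the base config unchanged, so the same base can be reused across environments with different providers.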