databricks / iceberg-kafka-connect


local docker minio setting fails #176

Open utkanbir opened 10 months ago

utkanbir commented 10 months ago

Hi, I am trying to set up the Kafka Iceberg sink, but I am stuck after spending hours trying the same things again and again. Can you please help? I have attached my docker-compose.yml file below. I put Dremio, MinIO, and Confluent in the same network in order to avoid network issues.

I created a source Postgres JDBC connector and it works fine. MinIO is up and running at 192.168.0.10:9000. To test it, I also created an S3 sink, and I can successfully write data to MinIO with it.

This is the working S3 sink config:

{ "name": "miniosink", "connector.class": "io.confluent.connect.s3.S3SinkConnector", "errors.log.enable": "true", "errors.log.include.messages": "true", "topics": [ "customer" ], "format.class": "io.confluent.connect.s3.format.json.JsonFormat", "flush.size": "1", "s3.bucket.name": "tolga", "s3.region": "us-east-1", "aws.secret.access.key": "***", "s3.proxy.user": "", "storage.class": "io.confluent.connect.s3.storage.S3Storage", "store.url": "http://192.168.0.12:9000" }

I installed the Iceberg sink from this folder: iceberg-kafka-connect-runtime-hive-0.6.5. I also added the AWS and Hadoop client libraries inside it: aws-java-sdk-core-1.12.524, aws-java-sdk-s3-1.12.524, hadoop-aws-3.3.6, etc.

This is my connector config:

{ "iceberg.catalog.s3a.endpoint": "http://192.168.0.12:9000", "iceberg.catalog.s3.endpoint": "http://192.168.0.12:9000", "iceberg.catalog.s3.secret-access-key": "8UisQraRly2Lxmykeyids.......................", "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO", "iceberg.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider", "iceberg.fs.defaultFS": "s3a://lakehouse", "iceberg.catalog.client.region": "us-east-1", "iceberg.catalog.uri": "http://192.168.0.12:9000", "iceberg.hadoop.fs.s3a.path.style.access": "true", "iceberg.catalog.s3a.secret-access-key": "8UisQraRly2Lxdmykeys....................", "iceberg.catalog.s3a.access-key-id": "8rmhsD4I9JCYKRMYPU4v", "iceberg.catalog.warehouse": "s3a://lakehouse", "iceberg.catalog.type": "hadoop", "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false", "iceberg.catalog.s3.access-key-id": "8rmhsD4I9JCYKRMYPU4v", "name": "icebergsink1", "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector", "errors.log.enable": "true", "errors.log.include.messages": "true", "topics": [ "customer" ], "iceberg.tables": [ "customer" ], "iceberg.tables.auto-create-enabled": "true" }

I also added the AWS environment variables to the containers:

  AWS_REGION: us-east-1
  AWS_ACCESS_KEY_ID: 8rmhsD4I9JCYKRMYPU4v
  AWS_SECRET_ACCESS_KEY: 8UisQraRly2LxdHuhv22Dh35FOJ5z52iLjGnEaEe
  AWS_S3_ENDPOINT: http://192.168.0.12:9000
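
In the compose file these sit under the Connect worker's environment block, roughly like this (the service name connect below is just illustrative):

```yaml
  connect:
    # ... image, depends_on, ports, etc. ...
    environment:
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: 8rmhsD4I9JCYKRMYPU4v
      AWS_SECRET_ACCESS_KEY: 8UisQraRly2LxdHuhv22Dh35FOJ5z52iLjGnEaEe
      AWS_S3_ENDPOINT: http://192.168.0.12:9000
```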

I can also successfully reach and query the Iceberg tables using Spark and Dremio.

But no matter what I try in Kafka Connect, I get this error:

```
java.nio.file.AccessDeniedException: s3a://lakehouse/customer/metadata/version-hint.text: org.apache.hadoop.fs.s3a.auth.NoAwsCredentialsException: SimpleAWSCredentialsProvider: No AWS credentials in the Hadoop configuration
```
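
If I read the exception correctly, SimpleAWSCredentialsProvider only picks up the plain Hadoop S3A properties (fs.s3a.access.key / fs.s3a.secret.key), not the iceberg.catalog.s3a.* keys, so I guess the Hadoop catalog part of the config would need something like this (just a sketch on my side, not verified, and assuming the iceberg.hadoop. prefix really is passed through to the Hadoop configuration):

```json
{
  "iceberg.catalog.type": "hadoop",
  "iceberg.catalog.warehouse": "s3a://lakehouse",
  "iceberg.hadoop.fs.s3a.endpoint": "http://192.168.0.12:9000",
  "iceberg.hadoop.fs.s3a.access.key": "8rmhsD4I9JCYKRMYPU4v",
  "iceberg.hadoop.fs.s3a.secret.key": "***",
  "iceberg.hadoop.fs.s3a.path.style.access": "true",
  "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false",
  "iceberg.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
}
```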

I have checked all the env variables and the network (nodes in the cluster can telnet to the MinIO 9000 port, etc.), and these are OK. I think Kafka Connect is still trying to reach global AWS instead of my local MinIO server. How can I solve it? Thanks, tolga

docker-compose.yml

```yaml
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.1
    hostname: zookeeper
    container_name: zookeeper
    ports:

networks:
  network:
    driver: bridge
    ipam:
      config:
```

bryanck commented 10 months ago

Take a look at the integration tests; the setup there uses MinIO, so that might help you. Also, in the #kafka-connect channel in the Iceberg Slack workspace, there was a recent thread that includes a working Docker Compose setup.
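
For local testing, a minimal MinIO service in Docker Compose generally looks something like this (a generic sketch, not the exact setup from the integration tests; the credentials and ports are placeholders):

```yaml
  minio:
    image: minio/minio
    container_name: minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    ports:
      - "9000:9000"
      - "9001:9001"
```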