Open utkanbir opened 10 months ago
Take a look at the integration tests, the setup uses MinIO, so that might help you. Also, in the #kafka-connect channel in the Iceberg Slack workspace, there was a recent thread that includes a working Docker Compose setup.
Hi, I am trying to set up the Kafka Iceberg sink, but I am stuck after spending hours (trying the same things again and again). Can you please help? I have attached my docker-compose.yml file below. I put Dremio, MinIO, and Confluent in the same network in order to avoid network issues.
I created a Postgres JDBC source connector, and it works fine. MinIO is up and running at 192.168.0.10:9000. To test it, I also created an S3 sink, and I can successfully write data to MinIO using it.
This is the working S3 sink config:
{ "name": "miniosink", "connector.class": "io.confluent.connect.s3.S3SinkConnector", "errors.log.enable": "true", "errors.log.include.messages": "true", "topics": [ "customer" ], "format.class": "io.confluent.connect.s3.format.json.JsonFormat", "flush.size": "1", "s3.bucket.name": "tolga", "s3.region": "us-east-1", "aws.secret.access.key": "***", "s3.proxy.user": "", "storage.class": "io.confluent.connect.s3.storage.S3Storage", "store.url": "http://192.168.0.12:9000" }
I have installed the Iceberg sink using this folder: iceberg-kafka-connect-runtime-hive-0.6.5. I also added the AWS and Hadoop client libraries inside it: aws-java-sdk-core-1.12.524, aws-java-sdk-s3-1.12.524, hadoop-aws-3.3.6, etc.
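For reference, here is a minimal sketch of how such a folder can be wired into the Connect container so that it lands on the plugin path; the host path and mount point below are assumptions, not taken from the compose file further down:

```yaml
# Hypothetical fragment for the connect service: mount the unpacked runtime folder
# (with the extra AWS/Hadoop jars copied into its lib directory) under one of the
# directories already listed in CONNECT_PLUGIN_PATH.
connect:
  volumes:
    - ./iceberg-kafka-connect-runtime-hive-0.6.5:/usr/share/confluent-hub-components/iceberg-kafka-connect-runtime-hive-0.6.5
```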
This is my connector config:
{ "iceberg.catalog.s3a.endpoint": "http://192.168.0.12:9000", "iceberg.catalog.s3.endpoint": "http://192.168.0.12:9000", "iceberg.catalog.s3.secret-access-key": "8UisQraRly2Lxmykeyids.......................", "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO", "iceberg.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider", "iceberg.fs.defaultFS": "s3a://lakehouse", "iceberg.catalog.client.region": "us-east-1", "iceberg.catalog.uri": "http://192.168.0.12:9000", "iceberg.hadoop.fs.s3a.path.style.access": "true", "iceberg.catalog.s3a.secret-access-key": "8UisQraRly2Lxdmykeys....................", "iceberg.catalog.s3a.access-key-id": "8rmhsD4I9JCYKRMYPU4v", "iceberg.catalog.warehouse": "s3a://lakehouse", "iceberg.catalog.type": "hadoop", "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false", "iceberg.catalog.s3.access-key-id": "8rmhsD4I9JCYKRMYPU4v", "name": "icebergsink1", "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector", "errors.log.enable": "true", "errors.log.include.messages": "true", "topics": [ "customer" ], "iceberg.tables": [ "customer" ], "iceberg.tables.auto-create-enabled": "true" }
I also added the AWS environment variables to the containers (see the docker-compose.yml below).
I can also successfully reach and query the Iceberg tables using Spark and Dremio.
But no matter what I try in Kafka Connect, I get this error:
```
java.nio.file.AccessDeniedException: s3a://lakehouse/customer/metadata/version-hint.text: org.apache.hadoop.fs.s3a.auth.NoAwsCredentialsException: SimpleAWSCredentialsProvider: No AWS credentials in the Hadoop configuration
```
I have checked all the env variables and the network (nodes in the cluster can telnet to the MinIO 9000 port, etc.), and these are OK. I think Kafka Connect still tries to reach global AWS instead of my local MinIO server. How can I solve it? Thanks, Tolga
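For context on what this error is saying: the failure comes from the Hadoop S3A path (`org.apache.hadoop.fs.s3a`), and `SimpleAWSCredentialsProvider` only reads credentials from the Hadoop configuration keys `fs.s3a.access.key` and `fs.s3a.secret.key`; it does not look at the `iceberg.catalog.s3.*` properties or the `AWS_*` environment variables. A minimal, untested sketch of the catalog-related part of the connector config, assuming the sink forwards `iceberg.hadoop.*` properties into the Hadoop configuration (the endpoint and placeholder keys below are illustrative, not values confirmed to work in this setup):

```json
{
  "iceberg.catalog.type": "hadoop",
  "iceberg.catalog.warehouse": "s3a://lakehouse",
  "iceberg.hadoop.fs.s3a.endpoint": "http://192.168.0.12:9000",
  "iceberg.hadoop.fs.s3a.path.style.access": "true",
  "iceberg.hadoop.fs.s3a.connection.ssl.enabled": "false",
  "iceberg.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
  "iceberg.hadoop.fs.s3a.access.key": "<minio-access-key>",
  "iceberg.hadoop.fs.s3a.secret.key": "<minio-secret-key>"
}
```

Only the Hadoop/S3A side is shown here; the rest of the connector settings (name, connector.class, topics, iceberg.tables, etc.) would stay as above.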
docker-compose.yml
```yaml
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.1
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    networks:
      network:
        ipv4_address: 192.168.0.11
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  broker:
    image: confluentinc/cp-server:6.0.1
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "9101:9101"
    networks:
      network:
        ipv4_address: 192.168.0.9
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_CONFLUENT_BALANCER_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_JMX_PORT: 9101
      KAFKA_JMX_HOSTNAME: localhost
      KAFKA_CONFLUENT_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: broker:29092
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'true'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: 8rmhsD4I9JCYKRMYPU4v
      AWS_SECRET_ACCESS_KEY: 8UisQraRly2LxdHuhv22Dh35FOJ5z52iLjGnEaEe
      AWS_S3_ENDPOINT: http://192.168.0.12:9000

  schema-registry:
    image: confluentinc/cp-schema-registry:6.0.1
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - broker
    ports:
      - "8081:8081"
    networks:
      network:
        ipv4_address: 192.168.0.8
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'broker:29092'
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: 8rmhsD4I9JCYKRMYPU4v
      AWS_SECRET_ACCESS_KEY: 8UisQraRly2LxdHuhv22Dh35FOJ5z52iLjGnEaEe
      AWS_S3_ENDPOINT: http://192.168.0.12:9000

  connect:
    image: confluentinc/cp-kafka-connect-base:6.0.1
    hostname: connect
    container_name: kafka-connect
    depends_on:
      - broker
      - schema-registry
    ports:
      - "8083:8083"
    networks:
      network:
        ipv4_address: 192.168.0.7
    environment:
      CONNECT_BOOTSTRAP_SERVERS: 'broker:29092'
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: kafka-connect
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      # CLASSPATH required due to CC-2422
      CLASSPATH: /usr/share/java/monitoring-interceptors/monitoring-interceptors-6.0.1.jar
      CONNECT_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      CONNECT_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
      CONNECT_LOG4J_LOGGERS: org.apache.zookeeper=ERROR,org.I0Itec.zkclient=ERROR,org.reflections=ERROR
      CONNECT_PLUGIN_PATH: /usr/share/java, /usr/share/confluent-hub-components, /data/connect-jars
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: 8rmhsD4I9JCYKRMYPU4v
      AWS_SECRET_ACCESS_KEY: 8UisQraRly2LxdHuhv22Dh35FOJ5z52iLjGnEaEe
      AWS_S3_ENDPOINT: http://192.168.0.12:9000
    volumes:

  control-center:
    image: confluentinc/cp-enterprise-control-center:6.0.1
    hostname: control-center
    container_name: control-center
    depends_on:
      - broker
      - schema-registry
      - connect
      - ksqldb-server
    ports:
      - "9021:9021"
    networks:
      network:
        ipv4_address: 192.168.0.6
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'broker:29092'
      CONTROL_CENTER_CONNECT_CLUSTER: 'connect:8083'
      CONTROL_CENTER_KSQL_KSQLDB1_URL: "http://ksqldb-server:8088"
      CONTROL_CENTER_KSQL_KSQLDB1_ADVERTISED_URL: "http://localhost:8088"
      CONTROL_CENTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
      CONTROL_CENTER_REPLICATION_FACTOR: 1
      CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
      CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
      CONFLUENT_METRICS_TOPIC_REPLICATION: 1
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: 8rmhsD4I9JCYKRMYPU4v
      AWS_SECRET_ACCESS_KEY: 8UisQraRly2LxdHuhv22Dh35FOJ5z52iLjGnEaEe
      AWS_S3_ENDPOINT: http://192.168.0.12:9000
      PORT: 9021

  ksqldb-server:
    image: confluentinc/cp-ksqldb-server:6.0.1
    hostname: ksqldb-server
    container_name: ksqldb-server
    depends_on:
      - broker
      - connect
    ports:
      - "8088:8088"
    networks:
      network:
        ipv4_address: 192.168.0.5
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      KSQL_BOOTSTRAP_SERVERS: "broker:29092"
      KSQL_HOST_NAME: ksqldb-server
      KSQL_LISTENERS: "http://0.0.0.0:8088"
      KSQL_CACHE_MAX_BYTES_BUFFERING: 0
      KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
      KSQL_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      KSQL_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
      KSQL_KSQL_CONNECT_URL: "http://connect:8083"
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_REPLICATION_FACTOR: 1
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: 'true'
      KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: 'true'
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: 8rmhsD4I9JCYKRMYPU4v
      AWS_SECRET_ACCESS_KEY: 8UisQraRly2LxdHuhv22Dh35FOJ5z52iLjGnEaEe
      AWS_S3_ENDPOINT: http://192.168.0.12:9000

  ksqldb-cli:
    image: confluentinc/cp-ksqldb-cli:6.0.1
    container_name: ksqldb-cli
    networks:
      network:
        ipv4_address: 192.168.0.4
    depends_on:
      - ksqldb-server
    entrypoint: /bin/sh
    tty: true

  ksql-datagen:
    image: confluentinc/ksqldb-examples:6.0.1
    hostname: ksql-datagen
    container_name: ksql-datagen
    networks:
      network:
        ipv4_address: 192.168.0.3
    depends_on:
      - connect
    command: "bash -c 'echo Waiting for Kafka to be ready... && \
              cub kafka-ready -b broker:29092 1 40 && \
              echo Waiting for Confluent Schema Registry to be ready... && \
              cub sr-ready schema-registry 8081 40 && \
              echo Waiting a few seconds for topic creation to finish... && \
              sleep 11 && \
              tail -f /dev/null'"
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      STREAMS_BOOTSTRAP_SERVERS: broker:29092
      STREAMS_SCHEMA_REGISTRY_HOST: schema-registry
      STREAMS_SCHEMA_REGISTRY_PORT: 8081

  rest-proxy:
    image: confluentinc/cp-kafka-rest:6.0.1
    depends_on:
      - broker
      - schema-registry
    ports:
      - 8082:8082
    networks:
      network:
        ipv4_address: 192.168.0.2
    hostname: rest-proxy
    container_name: rest-proxy
    environment:
      KAFKA_REST_HOST_NAME: rest-proxy
      KAFKA_REST_BOOTSTRAP_SERVERS: 'broker:29092'
      KAFKA_REST_LISTENERS: "http://0.0.0.0:8082"
      KAFKA_REST_SCHEMA_REGISTRY_URL: 'http://schema-registry:8081'
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: 8rmhsD4I9JCYKRMYPU4v
      AWS_SECRET_ACCESS_KEY: 8UisQraRly2LxdHuhv22Dh35FOJ5z52iLjGnEaEe
      AWS_S3_ENDPOINT: http://192.168.0.12:9000

  postgres:
    container_name: postgres_container
    image: postgres
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
      PGDATA: /data/postgres
    volumes:
    ports:
      - "5432:5432"
    networks:
      network:
        ipv4_address: 192.168.0.10
    restart: unless-stopped

  dremio:
    platform: linux/x86_64
    image: dremio/dremio-oss:latest
    ports:

  minioserver:
    image: minio/minio
    ports:

  spark_notebook:
    image: alexmerced/spark33-notebook
    ports:

networks:
  network:
    driver: bridge
    ipam:
      config:
```