getindata / kafka-connect-iceberg-sink

Apache License 2.0

Memory Usage Issue #44

Open · farbodahm opened this issue 1 year ago

farbodahm commented 1 year ago

While stress-testing the connector, I ran into a memory usage issue: the connector's memory usage keeps growing with no apparent upper bound. During the tests, I write 1 message per second to each of 15 different topics. The attached screenshot shows the memory usage in the ECS cluster.

Could you please investigate this issue? Thanks.

[Screenshot: memory usage in the ECS cluster (Screen Shot 1402-05-30 at 17 17 10)]
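For anyone trying to reproduce this, a load of 1 message per second to each of 15 topics can be generated with something like the sketch below. It is only an illustration under stated assumptions: the topic names (`test_0` … `test_14`), the JSON payload, and the `run_load` helper are all hypothetical, since the report does not name the topics or describe the message format. The Kafka client is injected as a plain `send(topic, value)` callable so the scheduling logic does not depend on any particular library.

```python
import json
import time
from typing import Callable, List

# Hypothetical topic names: the report mentions 15 topics but does not name them.
TOPICS: List[str] = [f"test_{i}" for i in range(15)]


def run_load(
    send: Callable[[str, bytes], None],
    topics: List[str],
    rate_per_topic: float = 1.0,
    duration_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send `rate_per_topic` messages per second to each topic for `duration_s` seconds.

    Returns the total number of messages handed to `send`.
    """
    interval = 1.0 / rate_per_topic
    ticks = int(duration_s * rate_per_topic)
    start = time.monotonic()
    sent = 0
    for tick in range(ticks):
        for topic in topics:
            # Placeholder payload (hypothetical); adapt it to the record format the sink expects.
            payload = json.dumps({"id": tick, "topic": topic}).encode("utf-8")
            send(topic, payload)
            sent += 1
        # Sleep until this tick's deadline so that slow sends do not
        # accumulate drift over a long-running test.
        next_deadline = start + (tick + 1) * interval
        sleep(max(0.0, next_deadline - time.monotonic()))
    return sent
```

To drive a real broker, `send` could wrap a client such as the `confluent-kafka` package, e.g. `lambda topic, value: producer.produce(topic, value=value)` on a `Producer`, followed by `producer.flush()` once `run_load` returns. Deadline-based sleeping keeps the rate close to the target even when individual sends are slow, which matters for multi-hour memory tests.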

Task config:

{
    "connector.class": "com.getindata.kafka.connect.iceberg.sink.IcebergSink",
    "table.write-format": "parquet",
    "iceberg.fs.s3a.path.style.access": "true",
    "table.namespace": "test_iceberg_database",
    "topics": "test",
    "iceberg.warehouse": "s3a://kafka-connect-landing-zone/sink",
    "iceberg.fs.defaultFS": "s3a://kafka-connect-landing-zone/sink",
    "upsert": "true",
    "iceberg.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    "iceberg.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "iceberg.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "upsert.keep-deletes": "true",
    "iceberg.com.amazonaws.services.s3a.enableV4": "true",
    "name": "test_iceberg_database",
    "table.prefix": "kc_",
    "iceberg.com.amazonaws.services.s3.enableV4": "true",
    "table.auto-create": "true"
}

Number of tasks: 1