Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage
Apache License 2.0

Setting multiple retention configs results in unexpected behavior #547

Closed bingkunyangvungle closed 4 months ago

bingkunyangvungle commented 4 months ago

What happened?

When I set both "retention.ms" = "86400000" and "local.retention.bytes" = 500000000 in the topic configuration, the cluster does delete the local segments and keeps the local storage at the expected size. However, it does not copy any segments to remote storage (no metrics and no data).

Here is the complete topic config:

      partitions                = 3
      replication_factor        = 2
      config                    = {
        "retention.ms"          = "86400000"
        "local.retention.bytes" = 500000000 # 500MB and 3GB in total
        "segment.bytes"         = 100000000 # 100 MB per segment
        "remote.storage.enable" = true
      }
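
For reference, the same settings can also be applied programmatically. The sketch below uses Kafka's AdminClient with incrementalAlterConfigs; the bootstrap address (localhost:9092) and topic name (my-topic) are placeholders, not values from the setup above.

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class TopicRetentionConfig {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder bootstrap address; point this at the target cluster.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (Admin admin = Admin.create(props)) {
                // Placeholder topic name.
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");

                // Same retention/tiering settings as in the topic config above.
                List<AlterConfigOp> ops = List.of(
                        new AlterConfigOp(new ConfigEntry("retention.ms", "86400000"), AlterConfigOp.OpType.SET),
                        new AlterConfigOp(new ConfigEntry("local.retention.bytes", "500000000"), AlterConfigOp.OpType.SET),
                        new AlterConfigOp(new ConfigEntry("segment.bytes", "100000000"), AlterConfigOp.OpType.SET),
                        new AlterConfigOp(new ConfigEntry("remote.storage.enable", "true"), AlterConfigOp.OpType.SET));

                admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
            }
        }
    }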

What did you expect to happen?

Data should be copied to remote storage, and the metric kafka_server_brokertopicmetrics_remotecopybytes_total should show non-zero values (currently it is 0).
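
To rule out a metrics-pipeline issue, the counter can also be read directly over JMX. This is a minimal sketch, assuming the Prometheus metric above is exported from the MBean kafka.server:type=BrokerTopicMetrics,name=RemoteCopyBytesPerSec and that the broker exposes JMX on a hypothetical host and port.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RemoteCopyBytesCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical JMX endpoint; the broker must be started with JMX enabled (e.g. JMX_PORT=9999).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");

            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Assumed MBean behind kafka_server_brokertopicmetrics_remotecopybytes_total.
                ObjectName remoteCopyBytes = new ObjectName(
                        "kafka.server:type=BrokerTopicMetrics,name=RemoteCopyBytesPerSec");
                // A non-zero Count means the broker has actually uploaded segment bytes.
                Object count = mbs.getAttribute(remoteCopyBytes, "Count");
                System.out.println("RemoteCopyBytesPerSec Count = " + count);
            }
        }
    }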

What else do we need to know?

Here are some interesting logs from the brokers:

[2024-05-16 09:59:34,624] INFO [UnifiedLog partition=<topic_name>, dir=/data/kafka] Deleting segment LogSegment(baseOffset=41135079, size=773355898, lastModifiedTime=1715853574616, largestRecordTimestamp=Some(1715767067396)) due to log retention time 86400000ms breach based on the largest record timestamp in the segment (kafka.log.UnifiedLog)
[2024-05-16 09:59:34,629] INFO [UnifiedLog partition=<topic_name>, dir=/data/kafka] Incremented log start offset to 45678675 due to segment deletion (kafka.log.UnifiedLog)
[2024-05-16 09:59:34,644] INFO [UnifiedLog partition=<topic_name>, dir=/data/kafka] Deleting segment LogSegment(baseOffset=38543520, size=1073741803, lastModifiedTime=1715853521300, largestRecordTimestamp=Some(1715767012938)) due to log retention time 86400000ms breach based on the largest record timestamp in the segment (kafka.log.UnifiedLog)
[2024-05-16 09:59:34,647] INFO [UnifiedLog partition=<topic_name>, dir=/data/kafka] Incremented log start offset to 44844234 due to segment deletion (kafka.log.UnifiedLog)
[2024-05-16 10:00:34,587] INFO [LocalLog partition=<topic_name>, dir=/data/kafka] Deleting segment files LogSegment(baseOffset=43214128, size=594474247, lastModifiedTime=1715853574576, largestRecordTimestamp=Some(1715767063989)) (kafka.log.LocalLog$)

I don't think this is a bug, but I still can't explain what exactly happened here. Please take a look.

bingkunyangvungle commented 4 months ago

The above scenario happens only when I use MirrorMaker2 to sync data from the source cluster to the target cluster. I also ran other tests that wrote data directly to the target cluster, and the plugin successfully persisted the data to S3. So I think the issue is with MirrorMaker2, not the cluster itself.
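
For what it's worth, the successful "write directly to the target cluster" test can be reproduced with a plain producer like the sketch below (broker address, topic name, and record counts are placeholders). One difference between this path and MirrorMaker2 may be relevant to the logs above: freshly produced records carry current timestamps, whereas mirrored records keep their source timestamps by default, and the deletions shown were triggered by the largest record timestamp in each segment.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class DirectWriteTest {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder address of the target (tiered-storage) cluster.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "target-broker:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            String payload = "x".repeat(1024); // ~1 KiB per record

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Roughly 1 GB in total: enough to roll several 100 MB segments and exceed
                // the 500 MB local.retention.bytes, so the copy-to-remote path is exercised.
                for (int i = 0; i < 1_000_000; i++) {
                    producer.send(new ProducerRecord<>("tiered-test-topic", "key-" + i, payload));
                }
                producer.flush();
            }
        }
    }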