aiven / kafka

Mirror of Apache Kafka
Apache License 2.0

Segments may be cleaned before uploading to remote tier #16

Closed ivanyu closed 1 year ago

ivanyu commented 1 year ago

Creating a topic with the config

~/kafka/kafka_2.13-3.4.0/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
    --create --topic topic1 \
    --config remote.storage.enable=true \
    --config segment.bytes=512000 \
    --config local.retention.bytes=1 \
    --config retention.bytes=10000000000000

and filling it with

~/kafka/kafka_2.13-3.4.0/bin/kafka-producer-perf-test.sh \
    --topic topic1 --num-records=10000 --throughput -1 --record-size 1000 \
    --producer-props acks=1 batch.size=16384 bootstrap.servers=localhost:9092

results in local segments being deleted but not uploaded to the remote tier. Probably the log cleaner lacks a condition that prevents it from deleting a segment that is eligible for the remote tier but hasn't been uploaded yet.
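A quick way to observe the symptom (a sketch, assuming the broker's default log.dirs=/tmp/kafka-logs and a single-partition topic; adjust the path for your setup) is to look at which segment files remain locally and then try to read the topic from the earliest offset. With the bug, records from the deleted segments are simply gone instead of being served from the remote tier:

# list the segment files that are still present locally
ls -l /tmp/kafka-logs/topic1-0/

# count how many of the 10000 produced records are still readable
~/kafka/kafka_2.13-3.4.0/bin/kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 --topic topic1 \
    --from-beginning --timeout-ms 10000 | wc -l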

mdedetrich commented 1 year ago

I assume this occurs for 3.3?

ivanyu commented 1 year ago

Reproduces on 3.3-2022-10-06-tiered-storage, but works as expected on 3.0-li.

mdedetrich commented 1 year ago

Providing an update on what's going on here. This doesn't appear to be a regression introduced when cherry-picking the changes from 3.0-li, but rather comes from changes upstream. This can be verified by the fact that none of the cherry-picked commits actually touch LogCleaner.scala.

Rather, this upstream change https://github.com/aiven/kafka/commit/6c806430093c152b916d8e8f1fa542980c03a181 affected how log cleanup works in a fairly fundamental way, and I believe it is the root cause (or, more accurately, additional changes are necessary to get this upstream change working with TS). I do have changes done locally that "fix" the problem, but I am being cautious because of how large that change is.

Because of this, the better course of action would be to make a change to Kafka upstream and, once it's merged, backport it into Kafka 3.3 -> 3.5.

mdedetrich commented 1 year ago

@ivanyu @jeqo I have pushed a fix at https://github.com/aiven/kafka/commit/0adaef725acde77c5f7676dd3d4fb91b66dbba0e . I have tested it within Kafka as well as with https://github.com/ivanyu/kafka-tiered-storage-demo and it appears to work (the Kafka version with the fix cleans up far fewer records than the version without it).
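For reference, a rough way to try the fix locally (a sketch; it assumes a checkout of aiven/kafka and uses the commit hash above) is to build a distribution from that commit and rerun the reproduction steps from the issue description:

git fetch origin && git checkout 0adaef725acde77c5f7676dd3d4fb91b66dbba0e
# build a release tarball (lands under core/build/distributions/)
./gradlew clean releaseTarGz
# then repeat the kafka-topics.sh / kafka-producer-perf-test.sh steps above
# and check that reading topic1 from the beginning returns all 10000 records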

I do have a nagging feeling that this "fix" is really just a workaround; from reading the code, it seems that LogCleaner needs to be properly adapted to handle TS, rather than simply forcing the old legacy behaviour, which is what the fix does.

mdedetrich commented 1 year ago

@ivanyu Can you confirm that this works on your end? If so I can close this ticket and make a new one to investigate a more principled solution.

ivanyu commented 1 year ago

Tried a couple of times and it worked. Considering the issue used to reproduce rather consistently, let's declare it fixed (or "fixed enough" if it's a workaround :)) @mdedetrich.