Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage
Apache License 2.0

Disk space not released #513

Closed: funky-eyes closed this issue 6 months ago

funky-eyes commented 6 months ago

What can we help you with?

I found that after running for a while, the disk space is not actually released. I am using a KRaft-mode cluster and I don't know how to troubleshoot it. [screenshot] Why is this happening? I found that after I restarted the node, the disk space was freed up.

funky-eyes commented 6 months ago

[screenshot] I found a very large number of files that were deleted but are still referenced, so their space is not freed.
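
A minimal Java sketch (illustrative only, and Linux-specific because it reads /proc) of how to check whether a process still holds descriptors to deleted files, which is the same condition lsof reports as "(deleted)":

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Lists file descriptors of a process that still point at deleted files.
// Their blocks cannot be reclaimed until the descriptors are closed.
public class DeletedFdScanner {
    public static void main(String[] args) throws IOException {
        // Pass the broker's PID as an argument, or scan the current process.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : ProcessHandle.current().pid();
        Path fdDir = Paths.get("/proc/" + pid + "/fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                Path target = Files.readSymbolicLink(fd);
                if (target.toString().endsWith("(deleted)")) {
                    System.out.println(fd + " -> " + target);
                }
            }
        }
    }
}
```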

funky-eyes commented 6 months ago

I can clearly see this file in S3, but locally it still hasn't been cleaned up!

ivanyu commented 6 months ago

Hi @funky-eyes. By "not cleaned up", do you mean they exist as ".deleted" files?

funky-eyes commented 6 months ago

Hi @funky-eyes. By "not cleaned up", do you mean they exist as ".deleted" files?

They no longer exist in the directory. Can you see the picture I sent? The deleted files are still being referenced, so the disk space is not released.

funky-eyes commented 6 months ago

Hi @funky-eyes. By "not cleaned up", do you mean they exist as ".deleted" files?

Could this be due to the operating system? I've noticed that the disk space is eventually freed, but it takes minutes or even tens of minutes before that happens!

funky-eyes commented 6 months ago

I created another topic with remote.storage.enable=false and retention.ms=180000, and when its segments are cleaned up on disk, the disk space is freed almost in real time.
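
For reference, a sketch of how such a comparison topic could be created with the standard Kafka Admin client (the topic name and bootstrap address are placeholders):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

// Creates a topic with tiered storage disabled and short retention,
// so local segment deletion can be observed in isolation.
public class CreateComparisonTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("retention-test", 3, (short) 1)
                    .configs(Map.of(
                            "remote.storage.enable", "false",
                            "retention.ms", "180000"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```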

funky-eyes commented 6 months ago

[screenshot] I waited for more than ten hours and the space still hasn't been freed. This is significantly different from the behaviour of a topic without tiered storage: some files that no longer exist on disk are still being referenced by Kafka.

funky-eyes commented 6 months ago

I also reported this issue to the Kafka community: https://issues.apache.org/jira/browse/KAFKA-16378

ivanyu commented 6 months ago

We're looking into this, trying first to understand whether it's the plugin's problem or the broker's.

funky-eyes commented 6 months ago

We're looking into this, trying first to understand whether it's the plugin's problem or the broker's.

Thank you very much for looking into this. The cluster I deployed is in KRaft mode, running the latest main branch, compiled and deployed locally. I found that as long as I run jcmd <pid> GC.run, the occupied disk space is released immediately, but when there is no GC, some files are not released, and there is no error-level output in the logs.
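
That behaviour would be consistent with unclosed streams: an InputStream that is never closed keeps its file descriptor open until the garbage collector runs the stream's cleaner. A minimal sketch of the suspected mechanism (my own illustration, not the plugin's code):

```java
import java.io.File;
import java.io.FileInputStream;
import java.nio.file.Files;

// Illustrates why deleted files can keep occupying disk space:
// an unclosed FileInputStream holds its file descriptor until GC
// runs the stream's cleaner, which finally closes the descriptor.
public class LeakedStreamDemo {
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("segment", ".log");
        Files.write(f.toPath(), new byte[64 * 1024 * 1024]); // 64 MiB

        FileInputStream leaked = new FileInputStream(f); // never closed
        leaked = null; // the reference is gone, but the FD stays open

        Files.delete(f.toPath()); // gone from the directory listing,
                                  // yet the blocks are still referenced

        System.gc(); // analogous to `jcmd <pid> GC.run`
        Thread.sleep(1_000); // give the cleaner thread time to close the FD
    }
}
```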

funky-eyes commented 6 months ago

And I'm using the S3 tiered storage implementation.

ivanyu commented 6 months ago

It really does seem to be an issue in the plugin. It will be fixed in https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/pull/516

funky-eyes commented 6 months ago

It really does seem to be an issue in the plugin. It will be fixed in #516

Thanks, I'll pull it later and recompile it locally for testing.

funky-eyes commented 6 months ago

It really does seem to be an issue in the plugin. It will be fixed in #516

I understand that the purpose of this PR is to introduce a ClosableInputStreamHolder, which uniformly handles the closing of all InputStreams generated during the copyLogSegmentData phase, ensuring that the streams are correctly closed. Is my understanding correct?

funky-eyes commented 6 months ago

@ivanyu I have confirmed that this issue has been fixed by #516

ivanyu commented 6 months ago

I understand that the purpose of this PR is to introduce a ClosableInputStreamHolder, which uniformly handles the closing of all InputStreams generated during the copyLogSegmentData phase, ensuring that the streams are correctly closed. Is my understanding correct?

Yeah, that's correct. We forgot to close those streams and, through them, the underlying open files. They lingered open until Java's internal cleanup machinery kicked in and closed the files.
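
A hedged sketch of that pattern (not the actual class from the PR, just an illustration of the idea): every stream opened while assembling the upload is registered with a holder, and closing the holder closes them all, even if some individual close calls fail.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Collects every InputStream opened during an operation and closes
// them all at the end, keeping the first failure and suppressing the rest.
public class ClosableInputStreamHolder implements Closeable {
    private final List<InputStream> streams = new ArrayList<>();

    public InputStream register(InputStream in) {
        streams.add(in);
        return in;
    }

    @Override
    public void close() throws IOException {
        IOException first = null;
        for (InputStream in : streams) {
            try {
                in.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Wrapped in a try-with-resources block around the segment upload, every registered stream is closed as soon as the copy finishes, instead of waiting for GC.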

funky-eyes commented 6 months ago

I understand that the purpose of this PR is to introduce a ClosableInputStreamHolder, which uniformly handles the closing of all InputStreams generated during the copyLogSegmentData phase, ensuring that the streams are correctly closed. Is my understanding correct?

Yeah, that's correct. We forgot to close those streams and, through them, the underlying open files. They lingered open until Java's internal cleanup machinery kicked in and closed the files.

Thank you for your readiness to help.