aiven / kafka

Mirror of Apache Kafka

Repeated Exceptions when topic is being deleted #35

Closed HenryCaiHaiying closed 1 year ago

HenryCaiHaiying commented 1 year ago

Another repeated exception seen when a topic (tier1) is deleted.

The file no longer exists because the topic is being deleted; this is a race condition between two deletion cleanup activities. The exception is not fatal, but the ERROR and its stack trace are repeated hundreds of times in the log. The problem eventually resolves itself once the topic's removal is detected.

[2023-07-19 22:11:10,921] ERROR Error encountered while writing committed offsets to a local file (org.apache.kafka.server.log.remote.metadata.storage.PrimaryConsumerTask)
java.nio.file.NoSuchFileException: /mnt/kafka/partitions/tier1-0/remote_log_snapshot.tmp

HenryCaiHaiying commented 1 year ago

Stack Trace

[2023-07-19 22:11:10,921] ERROR Error encountered while writing committed offsets to a local file (org.apache.kafka.server.log.remote.metadata.storage.PrimaryConsumerTask)
java.nio.file.NoSuchFileException: /mnt/kafka/partitions/tier1-0/remote_log_snapshot.tmp
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
        at org.apache.kafka.server.log.remote.metadata.storage.RemoteLogMetadataSnapshotFile.write(RemoteLogMetadataSnapshotFile.java:92)
        at org.apache.kafka.server.log.remote.metadata.storage.FileBasedRemoteLogMetadataCache.flushToFile(FileBasedRemoteLogMetadataCache.java:107)
        at org.apache.kafka.server.log.remote.metadata.storage.RemotePartitionMetadataStore.syncLogMetadataSnapshot(RemotePartitionMetadataStore.java:124)
        at org.apache.kafka.server.log.remote.metadata.storage.PrimaryConsumerTask.syncCommittedDataAndOffsets(PrimaryConsumerTask.java:215)
        at org.apache.kafka.server.log.remote.metadata.storage.PrimaryConsumerTask.consumeFromPrimaryConsumer(PrimaryConsumerTask.java:193)
        at org.apache.kafka.server.log.remote.metadata.storage.PrimaryConsumerTask.run(PrimaryConsumerTask.java:172)

HenryCaiHaiying commented 1 year ago

A sleep/wait needs to be added after the exception is thrown, so that the failing write is not retried immediately: https://github.com/aiven/kafka/blob/3.3-2022-10-06-tiered-storage/storage/src/main[…]fka/server/log/remote/metadata/storage/PrimaryConsumerTask.java
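
To illustrate the suggestion, here is a minimal sketch of a retry loop that pauses after a failed sync instead of retrying right away. It is not the actual PrimaryConsumerTask code and not necessarily how the upstream fix works; `BackoffSyncLoop`, `SyncStep`, `runWithBackoff`, `maxAttempts` and `backoffMs` are hypothetical names chosen for this example.

```java
import java.io.IOException;

// Hypothetical sketch: none of these names exist in Kafka.
final class BackoffSyncLoop {

    /** Stand-in for the step that writes committed offsets to a local file. */
    interface SyncStep {
        void sync() throws IOException;
    }

    /**
     * Calls the step until it succeeds or maxAttempts is reached.
     * After every failure it logs once and sleeps for backoffMs,
     * so a missing file does not flood the log with stack traces.
     *
     * @return the number of failed attempts
     */
    static int runWithBackoff(SyncStep step, int maxAttempts, long backoffMs)
            throws InterruptedException {
        int failures = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                step.sync();
                return failures; // success: stop retrying
            } catch (IOException e) {
                failures++;
                System.err.println("Sync failed (attempt " + failures + "): " + e);
                Thread.sleep(backoffMs); // wait before the next attempt
            }
        }
        return failures; // gave up after maxAttempts
    }
}
```

With a pause of, say, a few seconds between attempts, a file that disappears during topic deletion would produce a handful of log entries rather than hundreds before the deletion is noticed.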

mdedetrich commented 1 year ago

Resolved upstream at https://github.com/apache/kafka/pull/13947