aiven / kafka

Mirror of Apache Kafka
Apache License 2.0

Error building remote log auxiliary state when the follower broker restarts #29

Closed HenryCaiHaiying closed 2 months ago

HenryCaiHaiying commented 1 year ago

1. Set up 2 brokers.
2. Create a topic with replication factor 2.
3. Produce some data and confirm that it is replicated between the 2 brokers.
4. Bring down the follower broker.
5. Produce some more data to the topic.
6. Bring up the follower broker and observe the exception below:

[2023-06-09 17:20:14,509] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error building remote log auxiliary state for topic2-0 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: No resource found for partition: ZCvPWUTbQ8SMOQT33lUqcA:topic2-0
HenryCaiHaiying commented 1 year ago

Complete error message:

[2023-06-09 17:20:14,509] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error building remote log auxiliary state for topic2-0 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: No resource found for partition: ZCvPWUTbQ8SMOQT33lUqcA:topic2-0
    at org.apache.kafka.server.log.remote.metadata.storage.RemotePartitionMetadataStore.getRemoteLogMetadataCache(RemotePartitionMetadataStore.java:152)
    at org.apache.kafka.server.log.remote.metadata.storage.RemotePartitionMetadataStore.remoteLogSegmentMetadata(RemotePartitionMetadataStore.java:164)
    at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.remoteLogSegmentMetadata(TopicBasedRemoteLogMetadataManager.java:211)
    at kafka.log.remote.RemoteLogManager.fetchRemoteLogSegmentMetadata(RemoteLogManager.scala:790)
    at kafka.server.ReplicaFetcherThread.$anonfun$buildRemoteLogAuxState$2(ReplicaFetcherThread.scala:192)
    at kafka.server.ReplicaFetcherThread.$anonfun$buildRemoteLogAuxState$2$adapted(ReplicaFetcherThread.scala:188)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.ReplicaFetcherThread.$anonfun$buildRemoteLogAuxState$1(ReplicaFetcherThread.scala:188)
    at kafka.server.ReplicaFetcherThread.$anonfun$buildRemoteLogAuxState$1$adapted(ReplicaFetcherThread.scala:186)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.ReplicaFetcherThread.buildRemoteLogAuxState(ReplicaFetcherThread.scala:186)
    at kafka.server.AbstractFetcherThread.$anonfun$fetchOffsetAndBuildRemoteLogAuxState$2(AbstractFetcherThread.scala:734)
    at kafka.server.AbstractFetcherThread.fetchOffsetAndApplyFun(AbstractFetcherThread.scala:707)
    at kafka.server.AbstractFetcherThread.fetchOffsetAndBuildRemoteLogAuxState(AbstractFetcherThread.scala:733)
    at kafka.server.AbstractFetcherThread.handleOffsetMovedToTieredStorage(AbstractFetcherThread.scala:748)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:393)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:329)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:328)
    at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)
    at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:359)
    at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:355)
    at scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:309)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:328)
    at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:128)
    at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:127)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:127)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:108)
HenryCaiHaiying commented 1 year ago

The error occurs on the handling path for LI_OFFSET_MOVED_TO_TIERED_STORAGE in AbstractFetcherThread.scala:

    case Errors.LI_OFFSET_MOVED_TO_TIERED_STORAGE =>
      // no need to retry this as it indicates that the requested offset is moved to tiered storage.
      if (handleOffsetMovedToTieredStorage(topicPartition, currentFetchState,
        fetchPartitionData.currentLeaderEpoch, partitionData.logStartOffset()))
HenryCaiHaiying commented 1 year ago

This might be a race condition: it looks like the metadata cache is populated when TopicBasedRemoteLogMetadataManager.onPartitionLeadershipChanges is called, so the replica fetcher may query the cache before the partition has been registered.
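
To make the suspected ordering problem concrete, here is a minimal sketch in Java. It is a simplified model, not Kafka's actual code: the class `MetadataStoreRaceSketch`, the exception type, and the use of plain strings for partitions are all hypothetical, and only the method names echo those in the stack trace. It assumes the cache entry for a partition is created in the leadership-change callback and that a lookup made before that callback fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the suspected race; not Kafka's actual classes.
class MetadataStoreRaceSketch {
    // Stand-in for RemoteResourceNotFoundException.
    static class ResourceNotFoundException extends RuntimeException {
        ResourceNotFoundException(String message) {
            super(message);
        }
    }

    // One cache per registered partition; a partition has no entry until the callback runs.
    private final Map<String, Map<Long, String>> cacheByPartition = new ConcurrentHashMap<>();

    // Models the leadership-change callback that registers the partition.
    void onPartitionLeadershipChanges(String partition) {
        cacheByPartition.putIfAbsent(partition, new ConcurrentHashMap<>());
    }

    // Models the lookup performed while building the remote log auxiliary state.
    Map<Long, String> getRemoteLogMetadataCache(String partition) {
        Map<Long, String> cache = cacheByPartition.get(partition);
        if (cache == null) {
            throw new ResourceNotFoundException("No resource found for partition: " + partition);
        }
        return cache;
    }

    public static void main(String[] args) {
        MetadataStoreRaceSketch store = new MetadataStoreRaceSketch();
        try {
            // The fetcher asks for metadata before the partition is registered.
            store.getRemoteLogMetadataCache("topic2-0");
        } catch (ResourceNotFoundException e) {
            System.out.println("Lookup before registration failed: " + e.getMessage());
        }
        // The callback arrives later; the same lookup now finds a cache.
        store.onPartitionLeadershipChanges("topic2-0");
        store.getRemoteLogMetadataCache("topic2-0");
        System.out.println("Lookup after registration found a cache");
    }
}
```

If this model matches the real behavior, the error depends only on which of the two calls happens first after the follower restarts.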

HenryCaiHaiying commented 1 year ago

This problem seems to have gone away after I applied this setting to shorten the initialization retry interval:

rlmm.config.remote.log.metadata.initialization.retry.interval.ms=500
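
As a rough illustration of why a shorter retry interval could narrow the window, the Java sketch below polls a readiness check at a fixed interval until it succeeds or a timeout elapses. The helper `awaitReady` and its parameters are hypothetical and do not reproduce Kafka's initialization code; the sketch only assumes that initialization is retried at the configured interval, so a smaller interval means readiness is noticed sooner.

```java
import java.util.function.BooleanSupplier;

// Hypothetical retry helper; illustrates the retry-interval trade-off, not Kafka code.
class RetryIntervalSketch {
    // Polls `ready` every `retryIntervalMs` until it returns true or `maxTimeoutMs` elapses.
    // Returns the number of attempts made, or -1 on timeout or interruption.
    static int awaitReady(BooleanSupplier ready, long retryIntervalMs, long maxTimeoutMs) {
        long deadline = System.currentTimeMillis() + maxTimeoutMs;
        int attempts = 0;
        while (true) {
            attempts++;
            if (ready.getAsBoolean()) {
                return attempts;
            }
            // Give up if the next sleep would run past the deadline.
            if (System.currentTimeMillis() + retryIntervalMs > deadline) {
                return -1;
            }
            try {
                Thread.sleep(retryIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Pretend the resource becomes available 300 ms after startup.
        BooleanSupplier readyAfter300Ms = () -> System.currentTimeMillis() - start >= 300;
        int attempts = awaitReady(readyAfter300Ms, 500, 5_000);
        long waited = System.currentTimeMillis() - start;
        System.out.println("attempts=" + attempts + ", waited about " + waited + " ms");
    }
}
```

In this model the time spent waiting is rounded up to a multiple of the retry interval, which is consistent with the observation that lowering the interval made the error stop appearing. It would not remove the underlying ordering problem.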