Closed HenryCaiHaiying closed 2 months ago
Complete error message:
[2023-06-09 17:20:14,509] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error building remote log auxiliary state for topic2-0 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: No resource found for partition: ZCvPWUTbQ8SMOQT33lUqcA:topic2-0
at org.apache.kafka.server.log.remote.metadata.storage.RemotePartitionMetadataStore.getRemoteLogMetadataCache(RemotePartitionMetadataStore.java:152)
at org.apache.kafka.server.log.remote.metadata.storage.RemotePartitionMetadataStore.remoteLogSegmentMetadata(RemotePartitionMetadataStore.java:164)
at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.remoteLogSegmentMetadata(TopicBasedRemoteLogMetadataManager.java:211)
at kafka.log.remote.RemoteLogManager.fetchRemoteLogSegmentMetadata(RemoteLogManager.scala:790)
at kafka.server.ReplicaFetcherThread.$anonfun$buildRemoteLogAuxState$2(ReplicaFetcherThread.scala:192)
at kafka.server.ReplicaFetcherThread.$anonfun$buildRemoteLogAuxState$2$adapted(ReplicaFetcherThread.scala:188)
at scala.Option.foreach(Option.scala:437)
at kafka.server.ReplicaFetcherThread.$anonfun$buildRemoteLogAuxState$1(ReplicaFetcherThread.scala:188)
at kafka.server.ReplicaFetcherThread.$anonfun$buildRemoteLogAuxState$1$adapted(ReplicaFetcherThread.scala:186)
at scala.Option.foreach(Option.scala:437)
at kafka.server.ReplicaFetcherThread.buildRemoteLogAuxState(ReplicaFetcherThread.scala:186)
at kafka.server.AbstractFetcherThread.$anonfun$fetchOffsetAndBuildRemoteLogAuxState$2(AbstractFetcherThread.scala:734)
at kafka.server.AbstractFetcherThread.fetchOffsetAndApplyFun(AbstractFetcherThread.scala:707)
at kafka.server.AbstractFetcherThread.fetchOffsetAndBuildRemoteLogAuxState(AbstractFetcherThread.scala:733)
at kafka.server.AbstractFetcherThread.handleOffsetMovedToTieredStorage(AbstractFetcherThread.scala:748)
at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:393)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:329)
at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:328)
at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)
at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:359)
at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:355)
at scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:309)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:328)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:128)
at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:127)
at scala.Option.foreach(Option.scala:437)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:127)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:108)
This is also on the handling path of LI_OFFSET_MOVED_TO_TIERED_STORAGE in AbstractFetcherThread.scala:
case Errors.LI_OFFSET_MOVED_TO_TIERED_STORAGE =>
// no need to retry this as it indicates that the requested offset is moved to tiered storage.
if (handleOffsetMovedToTieredStorage(topicPartition, currentFetchState,
fetchPartitionData.currentLeaderEpoch, partitionData.logStartOffset()))
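This might be a race condition: it looks like the metadata cache is only populated when TopicBasedRemoteLogMetadataManager.onPartitionLeadershipChanges is called, so the follower's fetcher thread can ask for segment metadata before the cache entry for the partition exists.
The problem seems to have gone away after I added this setting to shorten the initialization retry interval: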
rlmm.config.remote.log.metadata.initialization.retry.interval.ms=500
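To illustrate the suspected race, here is a minimal, self-contained sketch. It is not Kafka code: `MetadataRaceSketch`, `Store`, `onPartitionAssigned`, `lookup` and `lookupWithRetry` are hypothetical names, and the store is just a map standing in for the per-partition metadata cache. It only shows that a lookup made before the partition is registered fails with a "No resource found" style error, and that a caller who retries with a short backoff succeeds once registration has happened.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names exist in Kafka.
class MetadataRaceSketch {

    // Stand-in for a per-partition metadata cache that is filled lazily,
    // e.g. when a leadership-change callback registers the partition.
    static final class Store {
        private final Map<String, Long> cache = new ConcurrentHashMap<>();

        // Registers the partition (assumed to happen on a separate thread).
        void onPartitionAssigned(String partition, long highestRemoteOffset) {
            cache.put(partition, highestRemoteOffset);
        }

        // Fails if the partition has not been registered yet.
        long lookup(String partition) {
            Long value = cache.get(partition);
            if (value == null) {
                throw new IllegalStateException("No resource found for partition: " + partition);
            }
            return value;
        }
    }

    // Retries the lookup with a fixed backoff instead of failing on the first miss.
    static long lookupWithRetry(Store store, String partition, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IllegalStateException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return store.lookup(partition);
            } catch (IllegalStateException e) {
                last = e;
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for " + partition, ie);
                }
            }
        }
        throw last; // still unregistered after all attempts
    }
}
```

In this sketch a shorter backoff simply narrows the window in which the caller sees the failure, which would be consistent with the observation that lowering the retry interval made the error disappear, but that link is an assumption rather than something confirmed in the Kafka code.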
Steps to reproduce:
1. Set up 2 brokers.
2. Create a topic with replication factor 2.
3. Produce some data and confirm that it is replicated between the 2 brokers.
4. Bring down the follower broker.
5. Produce some more data to the topic.
6. Bring the follower broker back up and observe the exception shown in the complete error message.