aiven / kafka

Mirror of Apache Kafka
Apache License 2.0

Consumption from Tiered Storage is broken #15

Closed jeqo closed 1 year ago

jeqo commented 1 year ago

Testing the 3.3-2022-10-06-tiered-storage branch, consumption seems to be broken compared to 3.0-2022-03-31-tiered-storage.

Test harness cases are failing when trying to consume, e.g. DeleteTopicWithSecondaryStorageTest:

org.opentest4j.AssertionFailedError: Could not consume 3 records of topicA-1 from offset 0 in 60000 ms. 0 message(s) consumed:

    at app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:39)
    at app//org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
    at app//kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:1061)
    at app//kafka.tiered.storage.TieredStorageTestContext.consume(TieredStorageTestContext.scala:151)
    at app//kafka.tiered.storage.ProduceAction.doExecute(TieredStorageTestSpec.scala:282)
    at app//kafka.tiered.storage.TieredStorageTestAction.execute(TieredStorageTestSpec.scala:110)
    at app//kafka.tiered.storage.TieredStorageTestAction.execute$(TieredStorageTestSpec.scala:108)
    at app//kafka.tiered.storage.ProduceAction.execute(TieredStorageTestSpec.scala:216)

I haven't dug into what may be causing this yet, but I'm adding it here to keep track.

mdedetrich commented 1 year ago

Many thanks to @ivanyu, who managed to find the core issue. While integrating changes, a lot of places moved from TopicPartition to TopicIdPartition, and I missed a line where an equality check had not been adjusted: https://github.com/aiven/kafka/blob/db59be39d489db8cdf5f97f3e667c3d52662e3cd/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L94 . I applied the fix and force-pushed to the branch ( https://github.com/aiven/kafka/blob/3.3-2022-10-06-tiered-storage/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L94 ), and I can confirm that the various Fetcher tests are now passing.
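For anyone reading later, here is a rough sketch of the general class of bug, not the actual code at DelayedRemoteFetch.scala#L94. It assumes the stale check compared a TopicIdPartition directly against a TopicPartition. The two classes below are simplified, hypothetical stand-ins for the Kafka types (so the snippet runs without Kafka on the classpath), and the names `staleCheck` and `adjustedCheck` are made up for illustration.

```java
import java.util.Objects;
import java.util.UUID;

class PartitionEqualitySketch {

    // Hypothetical stand-in for Kafka's TopicPartition: topic name plus partition number.
    static final class TopicPartition {
        final String topic;
        final int partition;

        TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TopicPartition)) return false;
            TopicPartition other = (TopicPartition) o;
            return partition == other.partition && topic.equals(other.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition);
        }
    }

    // Hypothetical stand-in for Kafka's TopicIdPartition: a topic ID wrapping a TopicPartition.
    static final class TopicIdPartition {
        final UUID topicId;
        final TopicPartition topicPartition;

        TopicIdPartition(UUID topicId, TopicPartition topicPartition) {
            this.topicId = topicId;
            this.topicPartition = topicPartition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TopicIdPartition)) return false;
            TopicIdPartition other = (TopicIdPartition) o;
            return topicId.equals(other.topicId) && topicPartition.equals(other.topicPartition);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topicId, topicPartition);
        }
    }

    // Stale check: compares objects of two different types. It compiles,
    // because equals() accepts any Object, but it can never return true.
    static boolean staleCheck(TopicIdPartition fetched, TopicPartition requested) {
        return fetched.equals(requested);
    }

    // Adjusted check: unwrap the TopicPartition first so both sides have the same type.
    static boolean adjustedCheck(TopicIdPartition fetched, TopicPartition requested) {
        return fetched.topicPartition.equals(requested);
    }

    public static void main(String[] args) {
        TopicPartition requested = new TopicPartition("topicA", 1);
        TopicIdPartition fetched =
            new TopicIdPartition(UUID.randomUUID(), new TopicPartition("topicA", 1));

        System.out.println("stale check:    " + staleCheck(fetched, requested));
        System.out.println("adjusted check: " + adjustedCheck(fetched, requested));
    }
}
```

A comparison like this fails silently rather than throwing. If the real check behaved like the stale one, that would fit the symptom in the report above: the fetch never matches its partition and the consumer times out with 0 records.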

@jeqo Can you confirm that it also passes on your end? You might still need the Thread.sleep workaround in RemoteLogManager.onEndpointCreated. Once you confirm this, I will close the ticket.

mdedetrich commented 1 year ago

Closing this as I believe it's fixed; re-open if that's not the case.

jeqo commented 1 year ago

@mdedetrich sorry for the late reply. Yes, I can confirm that fetching on 3.3 is working for me. Thank you!