faust-streaming / faust

Python Stream Processing. A Faust fork
https://faust-streaming.github.io/faust/

Cannot recover when any changelog topic partition becomes empty (as a result of some retention policy) #597

Open cristianmatache opened 10 months ago

cristianmatache commented 10 months ago

Checklist

Steps to reproduce

A changelog topic can become empty as a result of a Kafka cleanup policy (i.e., time- or size-based retention). Faust recovery does not handle the empty-topic case properly.

The recovery service needs to replay the messages from the low watermark (earliest offset) to high watermark - 1 (latest offset). Faust does this for both the active and the standby partitions. Afterwards, it runs some consistency checks.
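To make the replay range concrete, here is a minimal sketch of how the bounds could be computed, including the empty-partition case. `replay_bounds` is a hypothetical helper written for this issue, not a function from Faust. It assumes the usual Kafka convention that the high watermark is the offset the next produced message would receive, so a partition emptied by retention has its earliest offset equal to its high watermark.

```python
from typing import Optional, Tuple


def replay_bounds(earliest: int, highwater: int) -> Optional[Tuple[int, int]]:
    """Return the (first, last) offsets to replay, or None if there are none.

    Hypothetical helper for illustration only; not part of Faust.
    `earliest` is the low watermark. `highwater` is the offset the next
    produced message would get, so the last existing message sits at
    highwater - 1.
    """
    if highwater < earliest:
        raise ValueError("highwater cannot be below the earliest offset")
    if earliest == highwater:
        # Nothing to replay: retention removed every record, or nothing
        # was ever written to this partition.
        return None
    return earliest, highwater - 1


print(replay_bounds(0, 5))    # (0, 4)
print(replay_bounds(42, 42))  # None
```

The second call is the case this issue is about: the offsets are non-zero because messages existed at some point, but there is nothing left to replay.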

Active partitions

Let's start with the active partitions:

Standby partitions

Moreover, recovering standby partitions has a separate issue in the consistency checks. First, let's look at the sequence of steps for active partitions so that we can draw a parallel.

Active:

Standby:

The problem is that, after seeking, the offsets may be updated asynchronously, so by the time the consistency checks run, the conditions they assert may no longer hold.
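The kind of race described above can be reproduced in isolation with plain `asyncio`. This is only a toy model under stated assumptions: `FakeConsumer`, `advance`, and `demo` are hypothetical names invented for this sketch and do not correspond to Faust's consumer or recovery code. It shows a position that is correct immediately after the seek but has already moved by the time the check reads it.

```python
import asyncio


class FakeConsumer:
    """Hypothetical stand-in for a consumer; not Faust's API."""

    def __init__(self) -> None:
        self.position = 0

    def seek(self, offset: int) -> None:
        self.position = offset

    async def advance(self, steps: int) -> None:
        # Models a background task that moves the offset forward
        # whenever it gets a turn on the event loop.
        for _ in range(steps):
            self.position += 1
            await asyncio.sleep(0)


async def demo() -> bool:
    consumer = FakeConsumer()
    target = 10
    consumer.seek(target)  # position == target at this point
    updater = asyncio.create_task(consumer.advance(3))
    await asyncio.sleep(0)  # any await lets the background task run
    check_holds = consumer.position == target  # the "consistency check"
    await updater
    return check_holds


print(asyncio.run(demo()))
```

In this model the check is evaluated after the background task has had a turn, so it compares against a position that is no longer the one that was sought.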