TrivadisPF / kafka-backup

Kafka Backup to S3

Implement offset translation index to map from original offset to "recovered" offset #4

Status: Open. gschmutz opened this issue 5 years ago.

gschmutz commented 5 years ago

On restore, an offset mapping index should be provided. We could store the original offset of each message (by topic and partition) with the backup. When the data is restored, each message receives a new offset, which can then be mapped back to the original one. This lets a consumer move to the right offset after the restore and continue from its last committed offset.
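To make the idea concrete, here is a minimal in-memory sketch of such a mapping, assuming the restore calls a hypothetical `record()` once per written message. One detail matters when translating a consumer's position: a committed offset points at the *next* message to read, and that exact old offset may not exist in the backup (for example after compaction). The sketch therefore resumes at the first restored message whose old offset is greater than or equal to the committed one. `OffsetIndex`, `record` and `translate` are illustrative names, not part of kafka-backup.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Optional


class OffsetIndex:
    """Hypothetical in-memory map from original ("old") to restored ("new")
    offsets, kept separately per (topic, partition)."""

    def __init__(self) -> None:
        # (topic, partition) -> sorted old offsets, and the matching new
        # offsets stored at the same positions.
        self._old: dict[tuple[str, int], list[int]] = defaultdict(list)
        self._new: dict[tuple[str, int], list[int]] = defaultdict(list)

    def record(self, topic: str, partition: int, old: int, new: int) -> None:
        """Register one restored message. A restore normally writes in
        offset order, so the insert position is usually the end of the list."""
        olds = self._old[(topic, partition)]
        news = self._new[(topic, partition)]
        i = bisect_left(olds, old)
        olds.insert(i, old)
        news.insert(i, new)

    def translate(self, topic: str, partition: int, committed_old: int) -> Optional[int]:
        """Return the new offset to resume from, given the offset committed
        before the restore, or None if nothing is left to read."""
        olds = self._old.get((topic, partition), [])
        i = bisect_left(olds, committed_old)  # first old offset >= committed
        if i == len(olds):
            return None
        return self._new[(topic, partition)][i]
```

For example, if old offsets 100, 101, 105 and 106 of `orders`/0 were restored as 0, 1, 2 and 3, a consumer that had committed 102 would resume at new offset 2 (old offset 105), since 102 to 104 are not in the backup.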

The index can get very big (we have to store one entry per message), and only a few of its entries will ever be needed, yet a lookup by topic/partition/old offset should be as fast as possible. That's why a database will be necessary for storing the index. This database should have a small footprint and be easy to deploy and use, either as a container or "in-process" with the restore.
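As one illustration of the "in-process" option, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is an assumption chosen for the example, not necessarily one of the candidates considered in this issue. A composite primary key on (topic, partition, old offset) serves as the lookup index, and `WITHOUT ROWID` stores the rows clustered by that key, so translating a committed offset is a single range seek. The table and function names are hypothetical; the column is called `kafka_partition` to avoid clashing with SQL's `PARTITION` keyword.

```python
import sqlite3
from typing import Iterable, Optional


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical offset index database."""
    conn = sqlite3.connect(path)
    # The composite primary key is the lookup index; WITHOUT ROWID keeps
    # rows physically ordered by (topic, kafka_partition, old_offset).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS offset_index (
               topic           TEXT    NOT NULL,
               kafka_partition INTEGER NOT NULL,
               old_offset      INTEGER NOT NULL,
               new_offset      INTEGER NOT NULL,
               PRIMARY KEY (topic, kafka_partition, old_offset)
           ) WITHOUT ROWID"""
    )
    return conn


def add_entries(conn: sqlite3.Connection,
                rows: Iterable[tuple[str, int, int, int]]) -> None:
    """Insert (topic, partition, old_offset, new_offset) rows in one
    transaction; batching keeps bulk writes during a restore fast."""
    with conn:
        conn.executemany("INSERT INTO offset_index VALUES (?, ?, ?, ?)", rows)


def translate(conn: sqlite3.Connection, topic: str, partition: int,
              committed_old: int) -> Optional[int]:
    """New offset of the first restored message whose old offset is
    >= committed_old, or None if there is none."""
    row = conn.execute(
        """SELECT new_offset FROM offset_index
           WHERE topic = ? AND kafka_partition = ? AND old_offset >= ?
           ORDER BY old_offset LIMIT 1""",
        (topic, partition, committed_old),
    ).fetchone()
    return row[0] if row else None
```

Passing a file path instead of `":memory:"` would keep the index on disk next to the restore, which matches the low-footprint, easily deployable requirement without running a separate server.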

So far we can think of either using

antonioiorfino commented 5 years ago

I have built a utility to reset the consumer offsets after the restore operation. The offsets can be restored in three different ways:
1) Manually, by providing the offsets (a map of partition:offset).
2) From the broker, by reading the old offset there. This assumes the broker is still up and running (e.g. when a topic was deleted by mistake).
3) From the consumer group offsets stored on S3. This mechanism assumes you have backed up the data of the __consumer_offsets topic.
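For option 1, the manual map has to be turned into per-partition offsets before they can be applied. The sketch below assumes a hypothetical `partition:offset,partition:offset` string syntax; the actual input format of the utility is not described here. It rejects malformed entries, negative values and duplicate partitions rather than silently overwriting them.

```python
def parse_offset_map(spec: str) -> dict[int, int]:
    """Parse a hypothetical "0:42,1:17" string into {partition: offset}."""
    result: dict[int, int] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        partition_str, sep, offset_str = item.partition(":")
        if not sep:
            raise ValueError(f"expected partition:offset, got {item!r}")
        partition, offset = int(partition_str), int(offset_str)
        if partition < 0 or offset < 0:
            raise ValueError(f"negative value in {item!r}")
        if partition in result:
            raise ValueError(f"partition {partition} given twice")
        result[partition] = offset
    return result
```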

The association between the old and the new offset is stored in the restored record itself: the old offset is kept in the record's "x-old-offset" metadata.
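Because each restored record carries its original offset, the mapping can also be rebuilt by reading the restored topic back. The sketch below assumes the "x-old-offset" metadata is a Kafka record header whose value is the old offset as a UTF-8 decimal string; both the encoding and the `RestoredRecord` type are assumptions standing in for whatever a real consumer client returns.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional

OLD_OFFSET_HEADER = "x-old-offset"


@dataclass
class RestoredRecord:
    """Hypothetical stand-in for a record consumed from a restored topic."""
    topic: str
    partition: int
    offset: int  # new offset assigned when the record was restored
    headers: list[tuple[str, bytes]] = field(default_factory=list)


def old_offset_of(record: RestoredRecord) -> Optional[int]:
    for key, value in record.headers:
        if key == OLD_OFFSET_HEADER:
            # Assumption: the value is a UTF-8 encoded decimal number.
            return int(value.decode("utf-8"))
    return None


def build_mapping(records: Iterable[RestoredRecord]) -> dict[tuple[str, int, int], int]:
    """Map (topic, partition, old offset) -> new offset, skipping records
    that carry no old-offset header."""
    mapping: dict[tuple[str, int, int], int] = {}
    for rec in records:
        old = old_offset_of(rec)
        if old is not None:
            mapping[(rec.topic, rec.partition, old)] = rec.offset
    return mapping
```

Keeping the old offset on the record makes the restored data self-describing, at the cost of scanning the topic whenever a lookup is needed; a separate index like the one discussed in the issue avoids that scan.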