TrivadisPF / kafka-backup

Kafka Backup to S3

Backup of Compacted Log Topics #30

Closed: gschmutz closed this issue 5 years ago

gschmutz commented 5 years ago

Compacted log topics need a slightly different backup and restore approach than the "normal" topics with time- or size-based retention. The reason is that for a compacted log we always want a full backup of the complete log/topic.
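A toy model may help show why a partial backup is not enough for a compacted topic. It assumes a simplified view in which compaction keeps only the latest value per key and a `None` value acts as a tombstone. This is not Kafka's actual log cleaner, which compacts lazily and retains tombstones for a while. Restoring only the tail of the log loses keys whose latest value was written earlier:

```python
from typing import Dict, List, Optional, Tuple

# A record is (key, value); a value of None is a tombstone (delete marker).
Record = Tuple[str, Optional[str]]


def compact(log: List[Record]) -> Dict[str, str]:
    """State a fully compacted topic converges to: the latest value per key."""
    state: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)  # a tombstone removes the key
        else:
            state[key] = value
    return state


log = [("a", "1"), ("b", "1"), ("a", "2"), ("c", "1"), ("b", None), ("c", "2")]

full = compact(log)           # restore from the complete log
tail_only = compact(log[3:])  # restore from only the most recent records

print(full)       # {'a': '2', 'c': '2'}
print(tail_only)  # {'c': '2'}  (key 'a' is lost)
```

This is why a time-based backup that eventually deletes old S3 objects cannot reproduce a compacted topic on its own.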

Two possible mechanisms have been discussed:

  1. Snapshot and "normal" backup: Continuously perform the same backup as for the "normal" topics and, at specified time intervals, additionally take a full backup (snapshot). Once a snapshot is taken, the "normal" backup objects in S3 up to the snapshot can be removed. On restore, first restore the snapshot and then restore the remainder from the "normal" backup.
  2. Active/Passive backup: Perform a normal backup from the beginning of the log (active path) and, after a specified time interval, start a second backup (passive path) in parallel, which also starts from the beginning of the log. As soon as the second (passive) backup catches up with the first (active) one, the first one can be stopped (it becomes passive) and its objects can be deleted. The second backup is now the active one, and the whole procedure repeats.
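The active/passive switchover in option 2 can be sketched as a small in-memory simulation. Everything here is hypothetical and not taken from the connector: the `BackupRun` class, the batch size, the object-key format, and the per-partition "caught up" rule (passive offset >= active offset on every partition) are assumptions for illustration. The log end is held fixed so that the example terminates; in a live topic the passive run chases a moving target.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BackupRun:
    """One backup path (active or passive) over a compacted topic (hypothetical model)."""
    name: str
    positions: Dict[int, int]  # partition -> next offset this run will back up
    objects: List[str] = field(default_factory=list)  # hypothetical S3 object keys

    def step(self, partition: int, end_offset: int, batch: int = 100) -> None:
        """Back up the next batch of a partition and record one object key for it."""
        start = self.positions[partition]
        stop = min(start + batch, end_offset)
        if stop > start:
            self.objects.append(f"{self.name}/p{partition}/{start}-{stop - 1}")
            self.positions[partition] = stop


def caught_up(passive: BackupRun, active: BackupRun) -> bool:
    """True when the passive run has reached the active run on every partition."""
    return all(passive.positions[p] >= off for p, off in active.positions.items())


def maybe_switch(active: BackupRun, passive: BackupRun,
                 delete: Callable[[str], None]) -> BackupRun:
    """Promote the passive run once it has caught up and delete the old run's objects."""
    if caught_up(passive, active):
        for key in active.objects:
            delete(key)
        active.objects.clear()
        return passive
    return active


# Toy scenario: one partition whose log end is fixed at offset 250.
end_offset = 250
active = BackupRun("run-1", {0: 0})
while active.positions[0] < end_offset:
    active.step(0, end_offset)

# After the configured interval, a second run starts from the beginning of the log.
passive = BackupRun("run-2", {0: 0})
deleted: List[str] = []
while active is not passive:
    active.step(0, end_offset)   # the active run keeps going in parallel
    passive.step(0, end_offset)
    active = maybe_switch(active, passive, deleted.append)

print(active.name)  # run-2
print(deleted)
```

Because the switch decision only compares the two runs' positions, it can in principle be made inside the connector itself rather than by a separate coordinator.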

After some discussion between Andrea, Antonio and Guido, we decided to go forward with the 2nd solution. Its advantage is that it is closer to the "normal" backup and that the restore is easier. The downside is that more coordination is needed while doing the backup. A goal is therefore to avoid an extra coordinator and let the connector itself perform the coordination.

The design of the 2nd solution is covered here.

gschmutz commented 5 years ago

There is an ongoing discussion about whether we should send the offset/partition together with the ACTIVATE message. Point 4 has been extended and Point 5 added to the documentation to discuss the pros and cons of the two options.

antonioiorfino commented 5 years ago

I propose renaming the interval parameter "compacted.log.backup.length.days" to "compacted.log.backup.length.hours". I think it would be better to define the interval in hours.
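A minimal sketch of how an hour-based interval could be evaluated, assuming the property name proposed above and assuming it means "start the passive backup once this many hours have passed since the active backup started". The function `passive_start_due` is hypothetical. The example uses 36 hours, a value that an integer `.days` parameter could not express:

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping

INTERVAL_KEY = "compacted.log.backup.length.hours"  # name as proposed in this thread


def passive_start_due(active_started: datetime, now: datetime,
                      props: Mapping[str, str]) -> bool:
    """True once the configured interval has elapsed since the active backup started."""
    interval = timedelta(hours=int(props[INTERVAL_KEY]))
    return now - active_started >= interval


started = datetime(2020, 1, 1, tzinfo=timezone.utc)
props = {INTERVAL_KEY: "36"}
print(passive_start_due(started, started + timedelta(hours=35), props))  # False
print(passive_start_due(started, started + timedelta(hours=36), props))  # True
```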

antonioiorfino commented 5 years ago

The compacted topic is managed using a specific connector (ch.tbd.kafka.backuprestore.backup.kafkaconnect.compact.CompactBackupSinkConnector). All information is available in the README.md file.
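A connector like this is typically registered by sending a JSON body with `name` and `config` to Kafka Connect's `POST /connectors` REST endpoint. The sketch below only builds that body. It uses the connector class named above plus the generic sink settings `connector.class`, `tasks.max` and `topics`. The connector name and topic name are hypothetical placeholders, and the connector-specific keys (S3 bucket, intervals, etc.) are deliberately left out because they are documented in README.md:

```python
import json
from typing import List

CONNECTOR_CLASS = (
    "ch.tbd.kafka.backuprestore.backup.kafkaconnect.compact.CompactBackupSinkConnector"
)


def connector_payload(name: str, topics: List[str], tasks: int = 1) -> str:
    """Build a JSON body for Kafka Connect's POST /connectors endpoint."""
    config = {
        "connector.class": CONNECTOR_CLASS,
        "tasks.max": str(tasks),
        "topics": ",".join(topics),
        # Connector-specific keys (S3 bucket, backup interval, ...) go here; see README.md.
    }
    return json.dumps({"name": name, "config": config})


# Hypothetical connector and topic names, for illustration only.
body = connector_payload("compacted-backup", ["customers-compacted"])
print(body)
```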