Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage
Apache License 2.0

Thinking about disaster recovery #534

Open bingkunyangvungle opened 3 months ago

bingkunyangvungle commented 3 months ago

What is currently missing?

Since all the required data is already in S3 (both the log data and the metadata), it would be a great help to be able to recover from a cluster shutdown by launching a new cluster that uses the same topic folder (prefix_randomstring) in S3. On the new cluster, we could then consume from the oldest offset in the old topic folder. This would give us full disaster recovery.

How could this be improved?

Some suggestions:
1. Allow a topic to be created in the new cluster with a designated folder (including its random string) in S3.
2. Store new messages for the topic in that existing folder.

Is this a feature you would work on yourself?

I haven't taken a look at the actual code yet, but if someone can point me to the relevant part of the code, I'd be happy to start looking.

ivanyu commented 3 months ago

Hi @bingkunyangvungle. Such restoration is certainly possible. Do you see particular changes that need to be made to this RSM plugin?

ivanyu commented 3 months ago

Kafka doesn't really support this out of the box, and quite a bit of work, some of it hacky, needs to be done on a fresh cluster to make it correctly pick up an "attached" remote storage. To start with, topic IDs must match between the remote storage and the cluster, but the user doesn't control the IDs of topics being created.

bingkunyangvungle commented 3 months ago

This is the part I suspected would be tricky. One proposal I can think of is an internal mapping between the Kafka-created topic ID and a plugin-managed topic ID. The plugin would then manage the remote storage folder using the plugin-managed topic ID. The user could also provide or configure the plugin-managed topic ID when provisioning a topic. Of course, the plugin would also need to check whether the remote storage folder exists.
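A rough sketch of what such a mapping might look like is below. This is not code from the plugin: the class `StorageTopicIdMapping`, its methods, and the `topicName-storageId/` folder layout are hypothetical and used for illustration only, `java.util.UUID` stands in for Kafka's own topic ID type, and the existence check is an injected predicate rather than a real S3 call.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch: maps Kafka-created topic IDs to plugin-managed storage IDs. */
final class StorageTopicIdMapping {
    // Kafka-created topic ID -> plugin-managed ID used in the remote folder name.
    private final Map<UUID, String> storageIdByKafkaId = new ConcurrentHashMap<>();
    // Assumption: stands in for a "does this folder exist in remote storage?" lookup.
    private final Predicate<String> remoteFolderExists;

    StorageTopicIdMapping(Predicate<String> remoteFolderExists) {
        this.remoteFolderExists = remoteFolderExists;
    }

    /** Attaches a topic to an existing remote folder identified by a user-provided ID. */
    void attach(UUID kafkaTopicId, String topicName, String storageTopicId) {
        String folder = topicName + "-" + storageTopicId + "/";
        if (!remoteFolderExists.test(folder)) {
            throw new IllegalArgumentException("No remote folder to attach: " + folder);
        }
        String previous = storageIdByKafkaId.putIfAbsent(kafkaTopicId, storageTopicId);
        if (previous != null && !previous.equals(storageTopicId)) {
            throw new IllegalStateException("Topic already mapped to " + previous);
        }
    }

    /** Returns the remote folder; falls back to the Kafka topic ID when nothing is mapped. */
    String folderFor(UUID kafkaTopicId, String topicName) {
        String storageId =
            storageIdByKafkaId.getOrDefault(kafkaTopicId, kafkaTopicId.toString());
        return topicName + "-" + storageId + "/";
    }
}
```

In this sketch the mapping lives only in memory. A real implementation would have to persist it somewhere every broker can read, which is part of what makes this approach non-trivial.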

Just tossing out some ideas to share.

funky-eyes commented 1 month ago

> Kafka doesn't really support this out of the box, and quite a bit of work, some of it hacky, needs to be done on a fresh cluster to make it correctly pick up an "attached" remote storage. To start with, topic IDs must match between the remote storage and the cluster, but the user doesn't control the IDs of topics being created.

Perhaps we could ignore the topic ID in ObjectKeyFactory#mainPath, with some configuration option added to enable this?
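To make the suggestion concrete, here is a minimal sketch of a path builder with such a switch. It is not the plugin's actual ObjectKeyFactory: the class `MainPathSketch`, the `includeTopicId` flag, and the `prefix + topic[-id] + "/" + partition` layout are assumptions made for illustration, and `java.util.UUID` stands in for Kafka's topic ID type.

```java
import java.util.UUID;

/** Hypothetical sketch of a main-path builder with an optional topic ID component. */
final class MainPathSketch {
    private final String prefix;
    // Assumption: a made-up configuration flag, not an existing plugin setting.
    private final boolean includeTopicId;

    MainPathSketch(String prefix, boolean includeTopicId) {
        this.prefix = prefix;
        this.includeTopicId = includeTopicId;
    }

    /** Builds the folder path for one topic partition. */
    String mainPath(String topicName, UUID topicId, int partition) {
        // With the ID included, a recreated topic gets a different folder.
        // Without it, the folder depends only on the topic name.
        String topicDir = includeTopicId ? topicName + "-" + topicId : topicName;
        return prefix + topicDir + "/" + partition;
    }
}
```

With the flag off, a topic created on a fresh cluster resolves to the same folder regardless of its new ID. The trade-off is that a topic deleted and recreated under the same name would also resolve to the old folder, which is the collision that topic IDs otherwise prevent.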

ivanyu commented 1 month ago

This probably doesn't have much to do with the plugin either. The plugin is a fairly passive component that performs only a limited number of operations at the broker's request. I think supporting restoration like this would first require a separate tool and/or certain broker modifications.

We're exploring this idea at Aiven, but for the reason mentioned above, I don't think the plugin will change much in the course of this.