canonical / zookeeper-operator

Source for Zookeeper VM Charm
Apache License 2.0

[DPE-5218] Implement restore flow #162

Closed Batalex closed 1 month ago

Batalex commented 1 month ago

This PR implements the flow for restoring a ZooKeeper snapshot.

Other changes

Use cases

This flow can be used to bootstrap a new cluster from seeded data. When ZooKeeper is then related to a new client application, say Kafka, the following happens:

With a minimal change in the Kafka charm, we can also restore a snapshot on an already related ZK application. The chain of events will be as follows:

Here is the patch for Kafka:

diff --git a/src/events/zookeeper.py b/src/events/zookeeper.py
index 4b4f5a5..5a1adc0 100644
--- a/src/events/zookeeper.py
+++ b/src/events/zookeeper.py
@@ -71,7 +71,7 @@ class ZooKeeperHandler(Object):
             event.defer()
             return

-        if not self.charm.state.cluster.internal_user_credentials and self.model.unit.is_leader():
+        if self.model.unit.is_leader():
             # loading the minimum config needed to authenticate to zookeeper
             self.dependent.config_manager.set_zk_jaas_config()
             self.dependent.config_manager.set_server_properties()
@@ -87,6 +87,12 @@ class ZooKeeperHandler(Object):
             for username, password in internal_user_credentials:
                 self.charm.state.cluster.update({f"{username}-password": password})

+        # Kafka keeps a meta.properties file in every log.dir, holding a unique cluster ID.
+        # That ID is provided by ZK, so removing the file on relation-changed lets Kafka
+        # re-join another ZK cluster while restoring.
+        for storage in self.charm.model.storages["data"]:
+            self.charm.workload.exec(["rm", f"{storage.location}/meta.properties"])
+
         # attempt re-start of Kafka for all units on zookeeper-changed
         # avoids relying on deferred events elsewhere that may not exist after cluster init
         if not self.dependent.healthy:
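
The cleanup step in the patch above shells out to `rm`, which exits non-zero when `meta.properties` is absent (for instance if a log directory has not been initialised yet). As a rough illustration only, and not the code used by either charm, the same idea can be sketched with plain `pathlib` on a local filesystem. The function name `remove_meta_properties` and its argument are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def remove_meta_properties(log_dirs: Iterable[Path]) -> List[Path]:
    """Delete meta.properties from each log dir and return the paths removed.

    Hypothetical helper: directories without a meta.properties file are
    skipped instead of raising, so the call is safe to repeat.
    """
    removed: List[Path] = []
    for log_dir in log_dirs:
        meta = Path(log_dir) / "meta.properties"
        if meta.is_file():
            # Only this one file is deleted; log segments are left untouched.
            meta.unlink()
            removed.append(meta)
    return removed
```

Calling it twice on the same directories is harmless: the second call finds nothing to delete and returns an empty list. In the charm itself the equivalent would have to go through the workload abstraction rather than the local filesystem.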

About the restore flow

TODO

Demo

https://github.com/user-attachments/assets/07a5d880-1b18-4668-9eea-fbc18287fb04