Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License

Guidance on replication of data between 2 keydb clusters deployed in kubernetes #643

Open viju2008 opened 1 year ago

viju2008 commented 1 year ago

Dear Sir

I need to set up asynchronous replication between two KeyDB clusters deployed in different Kubernetes clusters.

We have a 5-pod KeyDB cluster in k8scluster1 and a 5-pod KeyDB cluster in k8scluster2. Both KeyDB clusters run in Kubernetes.

How can we replicate data between the clusters, specifically for Kubernetes deployments?

Are there any links or documentation available?

frankfil commented 1 year ago

I've not seen any official docs on this, but I manage a similar setup for one of my clients: two K8S clusters, each running the KeyDB Helm chart in active-active replication.

What I did was first create an additional Service in each K8S cluster that exposes the keydb-0 Pod of the StatefulSet. In my case that is via a NodePort plus external HAProxy servers, as these are bare-metal K8S clusters on a private WAN; do whatever you need to open access and protect it (firewalls, etc.).
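For reference, a NodePort Service that selects only the keydb-0 Pod could look roughly like the manifest printed by the function below. This is a minimal sketch, not the exact Service used here: it assumes the `keydb` namespace from the CronJob further down and relies on the `statefulset.kubernetes.io/pod-name` label that the StatefulSet controller adds to every Pod, while the Service name `keydb-0-external` and node port `30379` are hypothetical placeholders.

```bash
#!/bin/bash
# Print a NodePort Service manifest that targets a single StatefulSet Pod.
# All defaults are hypothetical placeholders: adjust them to your setup.
render_keydb_service() {
  local name="${1:-keydb-0-external}"   # hypothetical Service name
  local namespace="${2:-keydb}"
  local pod="${3:-keydb-0}"
  local node_port="${4:-30379}"         # hypothetical; must be inside the NodePort range
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  type: NodePort
  selector:
    # Label set automatically by the StatefulSet controller on each Pod
    statefulset.kubernetes.io/pod-name: ${pod}
  ports:
  - name: keydb
    port: 6379
    targetPort: 6379
    nodePort: ${node_port}
EOF
}
```

Something like `render_keydb_service | kubectl apply -f -` would create it. A fixed `nodePort` has to fall within the cluster's NodePort range (30000-32767 by default).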

Then I created a custom container image from the KeyDB one and added a simple bash script that just called the keydb-cli REPLICAOF command pointing to the other cluster's public access address and port.

Something like this:

```bash
#!/bin/bash

# Report and exit non-zero if the Job's Pod is interrupted or terminated
err_message() {
  echo "Caught SIGINT or SIGTERM - exiting"
  exit 1
}

trap err_message SIGINT SIGTERM

# Local KeyDB instance to reconfigure, and the remote cluster to replicate from
KEYDB_SYNC_KEYDB_HOST=${KEYDB_SYNC_KEYDB_HOST:-}
KEYDB_SYNC_KEYDB_PORT=${KEYDB_SYNC_KEYDB_PORT:-}
KEYDB_SYNC_REMOTE_KEYDB_HOST=${KEYDB_SYNC_REMOTE_KEYDB_HOST:-}
KEYDB_SYNC_REMOTE_KEYDB_PORT=${KEYDB_SYNC_REMOTE_KEYDB_PORT:-}

# All four variables are required
if [[ -z "$KEYDB_SYNC_KEYDB_HOST" || -z "$KEYDB_SYNC_KEYDB_PORT" || -z "$KEYDB_SYNC_REMOTE_KEYDB_HOST" || -z "$KEYDB_SYNC_REMOTE_KEYDB_PORT" ]]; then
  echo "Required ENV VARs not set" >&2
  exit 1
fi

# Point the local instance at the remote cluster's exposed address and port
echo "Calling $KEYDB_SYNC_KEYDB_HOST:$KEYDB_SYNC_KEYDB_PORT to issue command REPLICAOF $KEYDB_SYNC_REMOTE_KEYDB_HOST $KEYDB_SYNC_REMOTE_KEYDB_PORT"
/usr/local/bin/keydb-cli -h "$KEYDB_SYNC_KEYDB_HOST" -p "$KEYDB_SYNC_KEYDB_PORT" REPLICAOF "$KEYDB_SYNC_REMOTE_KEYDB_HOST" "$KEYDB_SYNC_REMOTE_KEYDB_PORT"
```
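Since this script ends up being run repeatedly, it can help to check whether the replication link is actually up. The helper below is a minimal sketch that reads `INFO replication` text on stdin; it assumes the Redis-style `master_host:` and `master_link_status:` fields (KeyDB's multi-master mode may lay this section out differently, so compare against your own output first), and the function name `replication_link_up` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the INFO replication text on stdin lists
# the given master host and reports a replication link that is "up".
# Assumes Redis-style "master_host:" and "master_link_status:" fields.
replication_link_up() {
  local remote_host="$1"
  local info
  info="$(tr -d '\r')"   # INFO replies use CRLF line endings
  # -x: whole-line match, -F: treat the host name literally (dots are not regex)
  grep -qxF "master_host:${remote_host}" <<<"$info" &&
    grep -qxF "master_link_status:up" <<<"$info"
}
```

It could be fed from the host and port the script already uses, e.g. `keydb-cli -h "$KEYDB_SYNC_KEYDB_HOST" -p "$KEYDB_SYNC_KEYDB_PORT" INFO replication | replication_link_up "$KEYDB_SYNC_REMOTE_KEYDB_HOST"`, and used to log the link state (or skip the REPLICAOF call) when the link is already up.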

Then I created a K8S CronJob that runs this script every 5 minutes, setting the ENV vars via the CronJob Pod spec, like this:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: keydb-sync
  namespace: keydb
spec:
  # Run every 5 minutes
  schedule: "*/5 * * * *"
  startingDeadlineSeconds: 30
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1

  jobTemplate:
    spec:
      backoffLimit: 0
      ttlSecondsAfterFinished: 3600
      template:
        spec:
          restartPolicy: Never

          containers:
          - name: keydb-sync
            image: custom/keydb-sync:1.0
            env:
            - name: KEYDB_SYNC_KEYDB_HOST
              value: "keydb-0.keydb-headless"
            - name: KEYDB_SYNC_KEYDB_PORT
              value: "6379"
            - name: KEYDB_SYNC_REMOTE_KEYDB_HOST
              value: "keydb.remote.k8s.cluster"
            - name: KEYDB_SYNC_REMOTE_KEYDB_PORT
              value: "6379"

            resources:
              requests:
                cpu: "10m"
                memory: "10M"
              limits:
                cpu: "100m"
                memory: "100M"
```
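To test the CronJob without waiting for the schedule, a one-off Job can be created from it. The helper below is a sketch that wraps standard kubectl commands (`create job --from=cronjob/...`, `wait`, `logs`); the Job name `keydb-sync-manual` and the 120 s timeout are arbitrary choices, and setting `KUBECTL=echo` prints the commands instead of running them.

```bash
#!/bin/bash
# Create a one-off Job from the keydb-sync CronJob, wait for it, print its logs.
# Set KUBECTL=echo to print the kubectl commands instead of running them.
run_keydb_sync_once() {
  local kubectl="${KUBECTL:-kubectl}"
  local ns="${1:-keydb}"
  local job="${2:-keydb-sync-manual}"   # arbitrary one-off Job name

  "$kubectl" create job "$job" --from=cronjob/keydb-sync --namespace="$ns" || return 1
  # With backoffLimit: 0 a failed run never becomes "complete", so this times out.
  "$kubectl" wait --for=condition=complete "job/$job" --timeout=120s --namespace="$ns" || return 1
  "$kubectl" logs "job/$job" --namespace="$ns"
}
```

Re-running it needs a new Job name (or deleting the previous Job first), since `kubectl create job` refuses a name that already exists.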

I do plan to replace this with something less "clunky" by changing the Helm chart to support adding command-line options to individual Pods, which would eliminate the need for the CronJob. But as this setup has actually worked without issue, it's fallen down my list of things to do (though I really should get around to it).
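One possible shape for that Helm-chart change, sketched under assumptions: the Pod's start script appends an extra `--replicaof` argument only when it runs as ordinal 0, on top of whatever arguments the chart already passes to `keydb-server`. This assumes `keydb-server` accepts config directives as `--name value` arguments (as `redis-server` does) and that the Pods run with KeyDB's active-replica/multi-master options so an additional master is accepted; the function name `extra_keydb_args` and the `REMOTE_KEYDB_HOST`/`REMOTE_KEYDB_PORT` variables are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the extra keydb-server arguments (one per line)
# so that only the StatefulSet Pod with ordinal 0 replicates from the remote cluster.
extra_keydb_args() {
  local pod_name="$1"               # e.g. keydb-0; normally "$HOSTNAME" inside the Pod
  local remote_host="$2"
  local remote_port="${3:-6379}"
  local ordinal="${pod_name##*-}"   # StatefulSet Pods are named <name>-<ordinal>
  if [[ "$ordinal" == "0" && -n "$remote_host" ]]; then
    printf '%s\n' --replicaof "$remote_host" "$remote_port"
  fi
}
```

In the start script, the result could be collected with `mapfile -t EXTRA < <(extra_keydb_args "$HOSTNAME" "$REMOTE_KEYDB_HOST" "$REMOTE_KEYDB_PORT")` and appended to the existing `keydb-server` command as `"${EXTRA[@]}"`. Passing it at startup means keydb-0 would come back up already pointing at the remote cluster after a restart, which is presumably the main thing the CronJob guards against.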

Hope this helps!