dragonflydb / dragonfly-operator

A Kubernetes operator to install and manage Dragonfly instances.
https://www.dragonflydb.io/docs/managing-dragonfly/operator/installation
Apache License 2.0

Redis Sentinel-like Client Experience #149

Closed sav116 closed 7 months ago

sav116 commented 8 months ago

Hello, I'm very interested in your product! I'm new to Dragonfly, but I need to set up a highly available Dragonfly installation on a Kubernetes cluster.

I installed the operator: kubectl apply -f https://raw.githubusercontent.com/dragonflydb/dragonfly-operator/main/manifests/dragonfly-operator.yaml

And installed dragonflydb:

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  namespace: dragonfly
  labels:
    app.kubernetes.io/name: dragonfly
    app.kubernetes.io/instance: dragonfly-sample
    app.kubernetes.io/part-of: dragonfly-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: dragonfly-operator
  name: dragonfly-sample
spec:
  image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.14.1
  args:
  - "--cluster_mode=emulated"
  replicas: 3
  resources:
    requests:
      cpu: 2
      memory: 4Gi
    limits:
      cpu: 4
      memory: 8Gi
  serviceSpec:
    type: LoadBalancer

But when I check the installation by killing the master:

kubectl -n dragonfly get pods -l role=master
NAME                 READY   STATUS    RESTARTS   AGE
dragonfly-sample-1   1/1     Running   0          5m15s
kubectl -n dragonfly delete pod dragonfly-sample-1
pod "dragonfly-sample-1" deleted

After this I get an error on the client: redis.exceptions.ConnectionError: Error 61 connecting to dragonfly-sample:6379. Connection refused

My python verification code looks like this:

import redis

import random
import string

r = redis.Redis(host='dragonfly-sample', port=6379, db=15, decode_responses=True)

def get_random_string(length):
    return ''.join(random.choice(string.ascii_letters) for i in range(length))

for i in range(0,100_000_000):
    key = get_random_string(5)
    r.set(key, get_random_string(30))

Dependency versions: Kubernetes v1.28.5, Python redis client 5.0.1

Please help me with this problem. Maybe my installation is not correct? The goal is to get rid of the error and increase reliability.

Pothulapati commented 8 months ago

@sav116 This is weird. Does the connection from the Python application work initially, before the master is deleted?

sav116 commented 8 months ago

@Pothulapati, yes, it works great.

Pothulapati commented 8 months ago

@sav116 When a master is deleted, it can take a while for Kubernetes to route the Service to the new master. Is this persistent? We have some tests where the connection should work, with some transient errors during the rollout. 🤔

sav116 commented 8 months ago

@Pothulapati, yes, this is persistent. Most likely the connection is broken while Kubernetes reroutes the Service to the new master, which happens with a slight delay, but it seems that this should be taken into account in Dragonfly installations in several replicas.

Pothulapati commented 8 months ago

We have always expected a slight delay, and we also lock the old master before a rollout is done so that no new data is added. (There are still errors, but at least there is no data corruption.)

but it seems that this should be taken into account in Dragonfly installations in several replicas.

Any ideas on what we should do here?

sav116 commented 8 months ago

@Pothulapati, for example, add native support for Redis Sentinel to the Dragonfly Operator.

Pothulapati commented 8 months ago

@sav116 Dragonfly already works with Redis Sentinel, but implementing Redis Sentinel inside the Dragonfly Operator would be a bigger undertaking.

With the Dragonfly Operator, we tried to do things as much the Kubernetes way as possible (through Services). Errors during the transition in a rollout are expected and are common to any kind of database or application on Kubernetes, and we set the same expectation.

We don't yet have any plans to build a Sentinel-like mechanism inside the Dragonfly Operator or to integrate with Sentinel :/

sav116 commented 8 months ago

@Pothulapati, I see, thank you very much for the clarity and openness.

Pothulapati commented 8 months ago

I will still keep the issue open and update the title, so that if any more users want this, we can have the discussion here!

sav116 commented 7 months ago

@Pothulapati, I solved this problem on the client side by using redis.ConnectionPool from the Python redis package. Now the code looks like this:

import redis
from redis import RedisError

import random
import string

# Connection options (db, decode_responses, timeouts) are set on the pool:
# redis.Redis does not apply them when an explicit connection_pool is passed.
redis_pool = redis.ConnectionPool(host='dragonfly-sample', port=6379, db=15,
                                  decode_responses=True,
                                  socket_timeout=5,
                                  socket_connect_timeout=5,
                                  retry_on_timeout=True,
                                  health_check_interval=30)

r = redis.Redis(connection_pool=redis_pool)

def get_random_string(length):
    return ''.join(random.choice(string.ascii_letters) for _ in range(length))

for i in range(100_000_000):
    try:
        key = get_random_string(5)
        r.set(key, get_random_string(30))
    except RedisError as e:
        # Log the failure and keep going; the client reconnects on a later call.
        print("Redis error occurred:", e)

Sentinel is not needed here and the issue can be closed =)
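
One caveat for anyone reusing this: the loop above logs a failed write and moves on, so writes attempted while the Service is still switching to the new master are dropped. If those writes matter, they can be retried with a short backoff. Below is a minimal sketch of that idea using only the standard library. The helper name call_with_retry, its parameters, and the flaky_set stand-in are hypothetical and not part of redis-py or the operator; the defaults are guesses, not tuned values.

```python
import random
import time


def call_with_retry(operation, retry_on=(ConnectionError,), attempts=5,
                    base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Run operation(), retrying with capped exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles on each attempt, capped at max_delay; jitter
            # keeps many clients from reconnecting in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


# Hypothetical stand-in for r.set(...) during a failover:
# it refuses the first two calls, then succeeds.
calls = {"count": 0}


def flaky_set():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("Connection refused")
    return True


# sleep is replaced with a no-op here so the demo finishes immediately.
result = call_with_retry(flaky_set, sleep=lambda seconds: None)
print(result, calls["count"])  # prints: True 3
```

With redis-py, the call would look something like call_with_retry(lambda: r.set(key, value), retry_on=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)). The redis exceptions have to be passed explicitly because redis.exceptions.ConnectionError does not derive from Python's built-in ConnectionError.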