dragonflydb / dragonfly-operator

A Kubernetes operator to install and manage Dragonfly instances.
https://www.dragonflydb.io/docs/managing-dragonfly/operator/installation
Apache License 2.0

rollout: wait until replication #48

Closed Pothulapati closed 1 year ago

Pothulapati commented 1 year ago

Rollout should make sure replication is ready before it moves to the next pod. Should start from master.

ashotland commented 1 year ago

A couple of hopefully helpful notes:

dranikpg commented 1 year ago

is this exposed somehow with any command?

Not that I know of. There is WAIT in Redis, but it's for all the replicas.

In replication tests, we just keep sending write traffic as long as we get an 'instance is in loading state' error back. But we shouldn't do that with the operator; we can invent a new command for this.
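For illustration, a minimal sketch of that polling pattern, assuming a go-redis client and that the loading state surfaces as an error containing "loading" (both the library choice and the error text are assumptions, not what the tests or the operator actually use):

```go
package rollout

import (
	"context"
	"fmt"
	"strings"
	"time"

	"github.com/redis/go-redis/v9"
)

// waitUntilLoaded polls the instance until it stops answering with a
// loading-style error, or the context expires.
func waitUntilLoaded(ctx context.Context, rdb *redis.Client) error {
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	for {
		// Hypothetical probe: any cheap command that fails while loading.
		err := rdb.Get(ctx, "operator:probe").Err()
		if err == nil || err == redis.Nil {
			return nil // command accepted, instance is past the loading stage
		}
		if !strings.Contains(strings.ToLower(err.Error()), "loading") {
			return fmt.Errorf("unexpected error while probing: %w", err)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```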

Rollout should make sure replication is ready before it moves to the next pod.

What's crucial about it? DF supports multiple full sync stages in parallel.

ashotland commented 1 year ago

@dranikpg

What's crucial about it? DF supports multiple full sync stages in parallel.

In a rolling update we want to avoid updating the master too quickly, before the replicas are in some sensible state. Makes sense?

Pothulapati commented 1 year ago

To give context from the Kubernetes side: since we create an underlying StatefulSet for the Dragonfly instance, we follow the same method that is used whenever a StatefulSet is updated, i.e. the update starts with the highest-numbered pod and goes down to the last one. There is no way for the StatefulSet to know which pod is the master and do something different about it.

What I would propose is: since rolling out a pod is essentially terminating the current one and creating a new one (with the latest configuration) in its place, we should make sure a Dragonfly pod termination happens gracefully, i.e. if it's a master, it should make sure replicas are up to date before it can terminate fully. We could also increase the termination grace period to allow this to complete.
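For illustration, roughly what that could look like on the pod template the operator generates; both the 120-second grace period and the preStop script path are hypothetical placeholders:

```go
package rollout

import (
	corev1 "k8s.io/api/core/v1"
)

// withGracefulShutdown gives the Dragonfly container extra time and a
// preStop hook so a master can finish syncing replicas before it exits.
func withGracefulShutdown(pod *corev1.PodSpec) {
	grace := int64(120)
	pod.TerminationGracePeriodSeconds = &grace
	pod.Containers[0].Lifecycle = &corev1.Lifecycle{
		PreStop: &corev1.LifecycleHandler{
			Exec: &corev1.ExecAction{
				// hypothetical helper that blocks until replicas catch up
				Command: []string{"/usr/local/bin/wait-for-replicas.sh"},
			},
		},
	}
}
```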

@ashotland Are there any reasons why you think we should start from the replicas and then move to the master? I'm not yet sure we can choose the order of the rollout with Kubernetes. If we start with the master, the lifecycle controller recognises that the current master is going down and selects a new one automatically, so there should not be any downtime.

dranikpg commented 1 year ago

If it's a master, it should make sure replicas are up to date before it can terminate fully

Redis has WAIT for this, and its shutdown by default waits for replicas to sync; we don't. I honestly don't know how you achieve consistency reliably without it 😅
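For reference, this is roughly what WAIT gives you in Redis (the go-redis client and the 2-second timeout here are illustrative):

```go
package rollout

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// acked reports whether at least `replicas` replicas acknowledged all prior
// writes within the timeout. Dragonfly had no equivalent at the time of this
// discussion, which is the gap being pointed out.
func acked(ctx context.Context, rdb *redis.Client, replicas int) (bool, error) {
	n, err := rdb.Wait(ctx, replicas, 2*time.Second).Result()
	if err != nil {
		return false, err
	}
	return int(n) >= replicas, nil
}
```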

In a rolling update we want to avoid updating the master too quickly, before the replicas are in some sensible state. Makes sense?

It does. Nonetheless, making the sensible state less sensible also sounds reliable.

ashotland commented 1 year ago

Having the master WAIT for replica sync would be great, of course.

Meanwhile, I think that achieving stable state is a good enough heuristic, as we don't provide read-from-replica-after-write consistency anyhow.

It will also simplify new master election, as we don't need to worry about electing a replica whose full sync was stopped in the middle.

I actually see that the "role" command exposes sync state "connecting"/"full sync"/"stable state"
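For illustration, a client-side check based on that observation; the reply layout is assumed to match Redis's replica ROLE reply, and the state strings are taken from the comment above rather than verified against Dragonfly:

```go
package rollout

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// isStableState asks the instance for its ROLE and reports whether a replica
// claims to be in the stable sync state.
func isStableState(ctx context.Context, rdb *redis.Client) (bool, error) {
	reply, err := rdb.Do(ctx, "ROLE").Slice()
	if err != nil {
		return false, err
	}
	// Assume a replica reply shaped like [role, master-host, master-port,
	// state, ...]; the state strings come from the comment above.
	if len(reply) < 4 {
		return false, nil
	}
	state, _ := reply[3].(string)
	return state == "stable state" || state == "stable_sync", nil
}
```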

dranikpg commented 1 year ago

I actually see that the "role" command exposes sync state "connecting"/"full sync"/"stable state"

Yes, but it's equivalent to the send-command-and-get-a-loading-error approach in the sense that it doesn't block, so you need to do it in a tight loop.

I think we shouldn't be discussing which of the available approaches is the most acceptable - it would be strange to develop the operator around imperfect approaches, with all the fixes for corner cases and other issues, if a simple WAIT ALL or a blocking command for detecting state changes can be implemented fairly quickly.

romange commented 1 year ago

Two issues are orthogonal.

  1. Dragonfly should improve its consistency knowledge so that it can indicate when a replica is fully in sync (i.e. without losing the "last mile").
  2. The operator should not restart a master while its replicas are still in the full-sync state. The polling approach is perfectly fine for any control system in general, and for K8s specifically, because this is how they are modeled: the K8s reconciliation loop does polling. The system should, of course, work without polling anything in a tight loop (see the sketch after this list).
  3. @Pothulapati can you please double-check that there is really no control over the update order of pods? Seems strange to me. Maybe worth asking in forums? I am sure we are not the first ones who need to update stateful systems with master/slave setups.
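For illustration, the requeue-style polling from item 2 in a controller-runtime reconcile loop; the names here are illustrative, not the operator's actual code:

```go
package rollout

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

type DragonflyReconciler struct{}

func (r *DragonflyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... fetch the instance, check replica sync state (e.g. via ROLE) ...
	replicasStable := false // placeholder for the actual check

	if !replicasStable {
		// Not safe to roll the master yet: ask to be reconciled again soon
		// instead of busy-waiting.
		return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
	}
	return ctrl.Result{}, nil
}
```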

Given that these two efforts seem orthogonal to me, I think there is a place for operator improvements in this area.

Pothulapati commented 1 year ago

Regarding control over the update order, there seems to be a way to do this: the OnDelete update strategy.

Essentially, when this is set, the StatefulSet controller (not our DF controller) does not do anything until someone else deletes a pod, at which point it recreates that pod with the newer changes. So our operator can take over this part and delete pods one by one, starting from the replicas, while also making sure the recreated replica is ready before it goes on to delete the next pod. WDYT? 🤔
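For illustration, roughly what setting that strategy looks like with the Kubernetes API types (a sketch, not necessarily how our controller structures it):

```go
package rollout

import (
	appsv1 "k8s.io/api/apps/v1"
)

// withOnDeleteStrategy tells the StatefulSet controller to never roll pods
// itself; it only recreates a pod after someone (our operator) deletes it,
// which gives us full control over the order.
func withOnDeleteStrategy(sts *appsv1.StatefulSet) {
	sts.Spec.UpdateStrategy = appsv1.StatefulSetUpdateStrategy{
		Type: appsv1.OnDeleteStatefulSetStrategyType,
	}
}
```

The operator would then delete replica pods one at a time, wait for each recreated pod to become ready, and delete the master pod last.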

I'm trying this out now to see if there are any downsides to this approach.

ashotland commented 1 year ago

Do you intend to use OnDelete for controlling the update order or also for avoiding progress as long as full sync is in progress?

Pothulapati commented 1 year ago

Do you intend to use OnDelete for controlling the update order or also for avoiding progress as long as full sync is in progress?

For both. My reasoning: the readiness approach we discussed, where we make sure the pod isn't marked ready (by updating health checks) until it reaches stable sync, has a problem: until the pod lifecycle controller reacts and marks the pod as a replica/master, the pod can't definitively know what its state is. And failing readiness checks has other implications in Kubernetes around restarts. So tying the readiness of a pod to replication doesn't feel right.
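For concreteness, the readiness-based approach described above would look roughly like this; the probe script is hypothetical:

```go
package rollout

import (
	corev1 "k8s.io/api/core/v1"
)

// withSyncReadinessProbe is the approach argued against above: the pod only
// becomes Ready once a (hypothetical) script confirms stable sync.
func withSyncReadinessProbe(c *corev1.Container) {
	c.ReadinessProbe = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			Exec: &corev1.ExecAction{
				// hypothetical: exits 0 only when ROLE reports stable sync
				Command: []string{"/usr/local/bin/check-stable-sync.sh"},
			},
		},
		PeriodSeconds:    5,
		FailureThreshold: 3,
	}
}
```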

Now that we have a better option, i.e. OnDelete, my guess is that we will have more control over this, and it will also allow us to achieve last-mile data consistency once we have a primitive in DF to check for it. WDYT?

ashotland commented 1 year ago

Yeah, sounds like a good direction to explore.