Pothulapati closed this issue 1 year ago
A couple of hopefully helpful notes:
is this exposed somehow with any command?
Not that I know of. There is WAIT in Redis, but it's for all the replicas.
In replication tests, we just keep sending write traffic as long as we get an 'instance is in loading state' error back. But we shouldn't do that with the operator; we can invent a new command for this.
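For illustration, a minimal sketch of that probe loop in Go, assuming the go-redis client (github.com/redis/go-redis/v9); the key name, retry interval, and the exact error text matched are assumptions for the sketch, not operator code:

```go
import (
	"context"
	"fmt"
	"strings"
	"time"

	"github.com/redis/go-redis/v9"
)

// waitUntilLoaded keeps probing the instance with a trivial write and retries
// while it answers with a "loading"-style error, the way the replication tests
// described above do.
func waitUntilLoaded(ctx context.Context, addr string) error {
	rdb := redis.NewClient(&redis.Options{Addr: addr})
	defer rdb.Close()

	for {
		err := rdb.Set(ctx, "probe:key", "1", time.Minute).Err()
		if err == nil {
			return nil // the write was accepted, the instance is past loading
		}
		if !strings.Contains(strings.ToLower(err.Error()), "loading") {
			return fmt.Errorf("probe against %s failed: %w", addr, err)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(200 * time.Millisecond):
		}
	}
}
```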
should make sure replication is ready before it moves to the next pod.
What's crucial about it? DF supports multiple full sync stages in parallel.
@dranikpg
What's crucial about it? DF supports multiple full sync stages in parallel. In a rolling update we want to avoid updating the master too quickly, before the replicas are in some sensible state. Makes sense?
To give some context from the Kubernetes side: since we create an underlying StatefulSet for the Dragonfly instance, we follow the same method that is used whenever a StatefulSet is updated, i.e. the update starts with the highest-numbered pod and proceeds down to the last one. There is no way for the StatefulSet to know which pod is the master and do something different about it.
What I would propose is: since rolling out a pod essentially means terminating the current one and creating a new one (with the latest configuration) in its place, we should make sure a Dragonfly pod termination happens gracefully, i.e. if it's a master, it should make sure replicas are up to date before it terminates fully. We could also increase the time period here to allow that to complete fully.
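As a sketch of what "graceful, with a larger time window" could look like on the pod spec (not what the operator does today), assuming k8s.io/api v0.23+ where the hook type is corev1.LifecycleHandler; the preStop script path is purely hypothetical:

```go
import corev1 "k8s.io/api/core/v1"

// buildDragonflyPodSpec returns a pod spec that gives the container a longer
// window to drain: the (hypothetical) preStop script would block until the
// replicas report a stable sync state, and the grace period caps how long
// Kubernetes waits before force-killing the pod.
func buildDragonflyPodSpec() corev1.PodSpec {
	grace := int64(120) // seconds; the default is 30

	return corev1.PodSpec{
		TerminationGracePeriodSeconds: &grace,
		Containers: []corev1.Container{{
			Name:  "dragonfly",
			Image: "docker.dragonflydb.io/dragonflydb/dragonfly",
			Lifecycle: &corev1.Lifecycle{
				PreStop: &corev1.LifecycleHandler{
					Exec: &corev1.ExecAction{
						// Hypothetical helper: wait for replicas before SIGTERM is sent.
						Command: []string{"/bin/sh", "-c", "/scripts/wait-for-replicas.sh"},
					},
				},
			},
		}},
	}
}
```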
@ashotland Are there any reasons why you think we should start from the replicas and then move to the master? I'm not yet sure if we can choose the order of the rollout with Kubernetes. If we start with the master, the lifecycle controller recognises that the current master is going down and selects a new one automatically, so there should not be any downtime.
i.e. if it's a master, it should make sure replicas are up to date before it can terminate fully
Redis has WAIT for this, and shutdown by default waits for replicas to sync - we don't. I honestly don't know how you achieve consistency reliably without it 😅
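For reference, this is roughly what leaning on WAIT would look like from a Go client (go-redis). WAIT exists in Redis today; the equivalent DF primitive is what this thread is asking for, so treat this purely as a sketch:

```go
import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// waitForReplicaAck blocks on the master until numReplicas acknowledge the
// current replication offset, or until timeoutMs milliseconds pass.
func waitForReplicaAck(ctx context.Context, rdb *redis.Client, numReplicas, timeoutMs int) error {
	acked, err := rdb.Do(ctx, "WAIT", numReplicas, timeoutMs).Int64()
	if err != nil {
		return err
	}
	if acked < int64(numReplicas) {
		return fmt.Errorf("only %d of %d replicas acknowledged in time", acked, numReplicas)
	}
	return nil
}
```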
In a rolling update we want to avoid updating the master too quickly, before the replicas are in some sensible state. Makes sense?
It does. Nonetheless, making the sensible state less sensible also sounds reliable.
Having the master WAITing for replica sync would be great, of course.
Meanwhile, I think that reaching stable state is a good enough heuristic, as we don't provide read-from-replica-after-write consistency anyhow.
It will also simplify new master election, as we won't need to worry about electing a replica whose full sync was stopped in the middle.
I actually see that the "role" command exposes sync state "connecting"/"full sync"/"stable state"
I actually see that the "role" command exposes sync state "connecting"/"full sync"/"stable state"
Yes, but it's equivalent to the send-command-and-check-for-loading-error approach in the sense that it doesn't block, so you need to do it in a tight loop.
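A hedged sketch of that loop in Go, assuming go-redis and that the sync state is the last element of the ROLE reply on a replica (the reply layout is inferred from this thread, not from DF documentation):

```go
import (
	"context"
	"strings"
	"time"

	"github.com/redis/go-redis/v9"
)

// waitForStableSync polls ROLE until the replica reports a stable sync state.
func waitForStableSync(ctx context.Context, rdb *redis.Client) error {
	for {
		reply, err := rdb.Do(ctx, "ROLE").Slice()
		if err != nil {
			return err
		}
		if n := len(reply); n > 0 {
			// Expected states per the comment above: "connecting", "full sync", "stable state".
			if state, ok := reply[n-1].(string); ok && strings.Contains(state, "stable") {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(100 * time.Millisecond):
		}
	}
}
```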
I think we shouldn't be discussing which of the available approaches is the most acceptable - it would be strange to develop the operator around imperfect approaches, with all the fixes for corner cases and other issues, if a simple WAIT ALL or a blocking command for detecting state changes can be implemented fairly quickly.
The two issues are orthogonal.
Given that these two efforts seem orthogonal to me, I think there is a place for operator improvements in this area.
Regarding control over the update order, there seems to be a way to do this.
Essentially, when this is set, the StatefulSet controller (not our DF controller) does not do anything until someone else deletes a pod, and only then recreates the pod with the newer changes. So our operator can take over this part and delete pods one by one, starting from the replicas, and also make sure the recreated replica is ready before it goes on and deletes the next pod. WDYT? 🤔
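Concretely, the StatefulSet the operator creates would carry the OnDelete strategy, something along these lines (standard k8s.io/api/apps/v1 types; everything except the update strategy is elided or illustrative):

```go
import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func dragonflyStatefulSet(name, namespace string) *appsv1.StatefulSet {
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: appsv1.StatefulSetSpec{
			// With OnDelete the StatefulSet controller stops rolling pods on its
			// own; a pod only picks up the new template when something (our
			// operator) deletes it, which lets us order replicas first, master last.
			UpdateStrategy: appsv1.StatefulSetUpdateStrategy{
				Type: appsv1.OnDeleteStatefulSetStrategyType,
			},
			// ... replicas, selector, pod template, etc. elided ...
		},
	}
}
```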
I'm trying this out now to see if there are any downsides to this approach.
Do you intend to use OnDelete for controlling the update order or also for avoiding progress as long as full sync is in progress?
Do you intend to use OnDelete for controlling the update order or also for avoiding progress as long as full sync is in progress?
For both. My reasoning: the readiness approach we discussed, where we make sure the pod isn't ready (by updating the healthchecks) until it is in stable sync, has a problem, i.e. until the pod lifecycle controller reacts and marks the pod as replica/master, the pod can't definitively know what its state is. And failing readiness checks has other implications in Kubernetes around restarts. So tying a pod's readiness to replication doesn't feel right.
Now that we have a better option, i.e. OnDelete, my guess is we will have more control over this, and it will also let us achieve the last-mile data sync once we have a primitive in DF to check for it. WDYT?
Yeah, sounds like a good direction to explore.
The rollout should make sure replication is ready before it moves to the next pod. It should start from the master.