cschockaert closed this issue 1 year ago.
It seems that the cluster trigger deployment was moved to the new node pool, but the cluster StatefulSet is still on the old node pool. If one StatefulSet pod needs to restart, it can be scheduled onto a node that is unknown to the bulletin board, which leaves it stuck in an infinite loop.
I'm not sure whether the cluster trigger pod or the operator generates the bulletin board, but if we use a node selector, I think we can end up in a situation where not all GKE VM nodes are added to the bulletin board, which causes serious trouble.
We reviewed this internally, and it seems that you were changing the node pool as a one-time change in a dev/test environment. We recommend approaching this with a new setup, as a hot change of the node pool is not a validated scenario. There are other approaches should you need to migrate a production system later. We can close this one. @laurentdroin, kindly reach out to the Redis teams as required.
Yep, thanks. I will never do that in production, since it's not working in dev :)
Hello
One of my cluster node pods is crashing because of:
time="2022-07-07T13:02:48Z" level=error msg="could not find node by name in bulletinboard: gke-development-cluste-main-n1std8-v1-3059a148-mahz"
I think it's because I changed the node selector and tolerations of the REC, so the bulletin board doesn't include the nodes of the previous node pool (the old pods are still on the previous node pool), and those nodes are not in the new bulletin board.
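To make the suspected failure mode concrete, here is a minimal Python model. It assumes (as suggested in this thread but not confirmed) that the bulletin board only lists nodes matching the REC's current node selector. The `Node` type, the helper functions and all node and pool names are hypothetical; only the `cloud.google.com/gke-nodepool` label is a real label that GKE sets on its nodes. This is a sketch of the reasoning, not the operator's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical node record. GKE sets the "cloud.google.com/gke-nodepool" label,
# but the node and pool names used below are made up for illustration.
@dataclass
class Node:
    name: str
    labels: dict[str, str]


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every key/value pair in the node selector is present on the node."""
    return all(labels.get(key) == value for key, value in selector.items())


def build_bulletin_board(nodes: list[Node], selector: dict[str, str]) -> set[str]:
    """Assumption: the board only lists nodes that match the *current* selector."""
    return {node.name for node in nodes if matches(selector, node.labels)}


def lookup_node(board: set[str], node_name: str) -> str:
    """Mimics the lookup a pod performs for the node it is running on."""
    if node_name not in board:
        raise LookupError(f"could not find node by name in bulletinboard: {node_name}")
    return node_name


POOL = "cloud.google.com/gke-nodepool"
nodes = [
    Node("old-pool-node-1", {POOL: "old-pool"}),
    Node("new-pool-node-1", {POOL: "new-pool"}),
]

# The REC selector now points at the new pool, but a StatefulSet pod
# is still running on a node from the old pool.
board = build_bulletin_board(nodes, {POOL: "new-pool"})
try:
    lookup_node(board, "old-pool-node-1")
except LookupError as err:
    print(err)  # prints: could not find node by name in bulletinboard: old-pool-node-1
```

In this model, any pod left on a node outside the new selector fails the lookup on every restart, which matches the crash loop described above.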