Open mikekuzak opened 2 years ago
I think a good solution would be to use before/after reboot annotations to simply wait some time before proceeding with the node reboot. Perhaps looking at #37 will give you an idea of the stage at which you want to produce the annotations.
I've also created #168 to make it more obvious how to implement some custom rebooting logic, as I don't think existing examples are good enough.
Let me know if you're able to implement it yourself. If not, I'll help you out.
Alternatively we could expose ReconciliationPeriod parameter in operator, which could be increased from default 30 seconds to let's say 10 minutes, so nodes reboot roughly every 20 minutes then (See #75) https://github.com/flatcar-linux/flatcar-linux-update-operator/blob/53f08043e320c853940ed7b4c126c7b72af1af00/pkg/operator/operator.go#L98
However, the operator CLI currently has no tests, so those should be added first. Also, ideally the operator will change its operating model to be event-based (#143), at which point such a delay won't be easy to implement anymore.
Alternatively we could expose ReconciliationPeriod parameter in operator, which could be increased from default 30 seconds to let's say 10 minutes, so nodes reboot roughly every 20 minutes then (See #75)
This would be a great feature to expose. How hard would this be to implement?
How hard would this be to implement?
Dead simple to implement right now, but hard to maintain, which is why I suggested using hooks instead.
Has anyone found a way to set some kind of simple delay between reboots? We are hitting this problem on every upgrade, and an easy workaround would be to add an extra 10-minute delay.
Current situation
Hi,
We have a K8ssandra cluster running on our K8s cluster. Flatcar reboots quite quickly, but some applications might take longer to start up and initialize. The Flatcar operator doesn't know anything about running apps, as it's not designed to do this.
Impact
Applications might lose quorum when a K8s cluster running on Flatcar bounces the nodes too fast.
Ideal future situation
It would be good to have some sort of mechanism that prevents the next node from rebooting too soon, even if Flatcar on the previous node is already up.
Implementation options
Maybe there is a way to add a simple time-based solution: add a delay of 10 minutes before the next eligible node reboots.
Thanks