dragonflydb / dragonfly-operator

A Kubernetes operator to install and manage Dragonfly instances.
https://www.dragonflydb.io/docs/managing-dragonfly/operator/installation
Apache License 2.0
144 stars 34 forks source link

controller stuck when rollout fails #138

Open Jean-Daniel opened 11 months ago

Jean-Daniel commented 11 months ago

When pushing a change on a DragonFly resource, a rollout, if an other update is pushed, the controller will wait until first rollout is done before applying any change.

This is an issue, as if the first change contains a typo (invalid image url for instance), there is no way to fix it, as the change with the right image url will never be applied.

To reproduce:

The controller wait for the first change to be fully applied but it never occurs as the pod are failing to start with ImagePullError.

Pothulapati commented 11 months ago

This is a good find! The fix would be to give up on the roll out if the first deleted pod isn't coming back and then accept more updates! @Jean-Daniel Do you want to take it up?

parera10 commented 9 months ago

I've just found this situation modifying dragonfly resource with not enough memory. Operator it's not able to rollback nor interrupt the current rollout with a new one applying valid settings.