Open Jean-Daniel opened 11 months ago
This is a good find! The fix would be to give up on the rollout if the first deleted pod doesn't come back, and then accept further updates. @Jean-Daniel Do you want to take it up?
I just ran into this situation after setting too little memory on a Dragonfly resource. The operator is unable to roll back, or to interrupt the current rollout with a new one that applies valid settings.
When a change pushed to a DragonFly resource triggers a rollout and another update is pushed in the meantime, the controller waits until the first rollout is done before applying any further change.
This is a problem: if the first change contains a typo (an invalid image URL, for instance), there is no way to fix it, because the change with the correct image URL will never be applied.
To reproduce:
The controller waits for the first change to be fully applied, but that never happens because the pods fail to start with ImagePullError.