bookingcom / shipper

Kubernetes native multi-cluster canary or blue-green rollouts using Helm
Apache License 2.0

Release StrategyExecutor aborts if successor release is progressing #282

Closed osdrv closed 4 years ago

osdrv commented 4 years ago

This is a fix for a problem with the new implementation of the reconciliation loop, which runs strategy execution for every release (the old implementation applied it to the contender-incumbent pair only). The problem we observed boils down to the following scenario:

  1. A contender runs through the strategy execution loop and bails out with an incomplete state.
  2. The incumbent enters its own independent strategy loop and derives its desired state from the desired state of the contender. This is where things broke: desired state != achieved state, so the incumbent can start progressing without waiting for its successor (the contender) to complete.

This optimistic triggering was not planned. It was introduced by mistake, because distinct release generations are now processed independently.

Preventing the strategy loop from progressing is one way to get around this problem now that releases are processed independently. An alternative would be to make an incumbent release smarter about figuring out its desired state itself (e.g. if its successor hasn't achieved its target step yet, try to reconcile on the last step in its own strategy). This behavior has pros and cons, of course. On the bright side, releases can behave much more independently, which also challenges the need for separate incumbent and contender strategy states. On the downside, it might provoke unwanted chatter and flapping among the releases. Chatty releases can be quite expensive in terms of converging to a stable state, since a change to a release object now provokes both a direct and a reverse chain reaction.
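To make the alternative concrete, here is a minimal sketch of an incumbent choosing its own desired step. The `Release` type, its fields, and `desiredStep` are hypothetical simplifications made up for this illustration, not shipper's actual types. The fallback of following the successor's achieved step once it has converged is also an assumption.

```go
package main

import "fmt"

// Release is a hypothetical, simplified stand-in for a release object.
type Release struct {
	Name          string
	StrategySteps int // number of steps in this release's own strategy
	TargetStep    int // step the release has been asked to reach
	AchievedStep  int // step the release has actually reached
}

// desiredStep returns the step the incumbent should reconcile on.
// While the successor has not achieved its target step, the incumbent
// stays on the last step of its own strategy. Once the successor has
// converged, the incumbent follows the successor's achieved step
// (an assumption made for this sketch).
func desiredStep(incumbent, successor Release) int {
	if successor.AchievedStep < successor.TargetStep {
		return incumbent.StrategySteps - 1
	}
	return successor.AchievedStep
}

func main() {
	incumbent := Release{Name: "incumbent", StrategySteps: 3, TargetStep: 2, AchievedStep: 2}
	contender := Release{Name: "contender", StrategySteps: 3, TargetStep: 1, AchievedStep: 0}

	// The contender is still progressing, so the incumbent holds its last step.
	fmt.Println(desiredStep(incumbent, contender))

	// The contender has converged, so the incumbent may follow it.
	contender.AchievedStep = 1
	fmt.Println(desiredStep(incumbent, contender))
}
```

Every release would re-evaluate this decision whenever its neighbour changes, which is where the chatter mentioned above would come from.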

For now, we settle on a simple patch that prevents the system from doing unwanted things. This approach may change in the future.

Signed-off-by: Oleg Sidorov <oleg.sidorov@booking.com>