fiaas / fiaas-deploy-daemon

fiaas-deploy-daemon is the core component of the FIAAS platform
https://fiaas.github.io/
Apache License 2.0

Avoid triggering redeploy of every application on fiaas-deploy-daemon startup #67

Closed: oyvindio closed this issue 4 years ago

oyvindio commented 4 years ago

When fiaas-deploy-daemon starts up, the CRD watcher will trigger updates in the managed resources (deployment, service, etc.) corresponding to each application resource. Usually this will be a no-op, since the resource will be "updated" with the same state that it already has, and the control loops within Kubernetes won't actually do anything. However, if a new version of fiaas-deploy-daemon with a slight change in behavior (e.g. a label is added on a resource) is deployed, it might trigger an actual update of all the managed resources. This means that we can get into situations where a fiaas-deploy-daemon upgrade triggers a rolling upgrade of all applications at roughly the same time, which can cause several issues that, coupled together, may affect service availability.

In our setup it would be ideal to replicate the behavior of the pipeline consumer, where deployments are only triggered by the deployment orchestrating system, so that fiaas-deploy-daemon does not potentially trigger a redeploy of already deployed applications when it starts up.

Some solutions for this can be to:

  1. Skip deploying if there already exists an ApplicationStatus whose fiaas/deployment_id matches that of the Application resource. fiaas/deployment_id should be unique for each externally triggered deployment. This should not be too difficult to implement, and it still leaves a way to trigger a full deploy of every application to propagate changes from new behavior: simply remove the latest ApplicationStatus resource for each Application before updating fiaas-deploy-daemon.
  2. Use the watch bookmark API (https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks) to try to only read Application resource updates that we haven't seen before. This isn't a huge change, but it requires that state (the most recent bookmark) is kept somewhere, which complicates things a bit. Additionally, this feature only became GA in Kubernetes 1.17, which is more recent than the version we're on, so it won't solve the issue for us right now.
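
To make option 1 concrete, here is a minimal sketch of the check, not actual fiaas-deploy-daemon code. It assumes the resources are available as plain dicts and that fiaas/deployment_id is stored under metadata.labels on both Application and ApplicationStatus; the function name `already_deployed` is made up for illustration.

```python
# Assumption: the deployment id lives in metadata.labels under this key.
DEPLOYMENT_ID_LABEL = "fiaas/deployment_id"


def _deployment_id(resource):
    """Read the deployment id from a resource dict, or None if it is absent."""
    return resource.get("metadata", {}).get("labels", {}).get(DEPLOYMENT_ID_LABEL)


def already_deployed(application, statuses):
    """Return True if any ApplicationStatus carries the Application's deployment id.

    `application` and `statuses` are plain dicts here (hypothetical shape),
    standing in for the real Application and ApplicationStatus resources.
    """
    deployment_id = _deployment_id(application)
    if deployment_id is None:
        # Nothing to compare against, so fall back to deploying as before.
        return False
    return any(_deployment_id(status) == deployment_id for status in statuses)
```

On startup, the watcher would then skip any Application for which `already_deployed` returns True. Deleting the matching ApplicationStatus makes the function return False again, which is what keeps the "force a full redeploy" escape hatch working.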
oyvindio commented 4 years ago

@gregjones @xavileon it would be useful to have input on how this works in your case, and on the suggested solutions. Personally, I think the first suggested solution looks like the most practical approach.

gregjones commented 4 years ago

Last time we did an update of fiaas it did 'disturb' some users who didn't understand why all their pods were restarting. Option 1 seems reasonable to me. But maybe it should re-deploy when the status isn't a success? Or when it's not 'pending' in case we shut down mid-deploy?

mortenlj commented 4 years ago

> But maybe it should re-deploy when the status isn't a success? Or when it's not 'pending' in case we shut down mid-deploy?

I was looking at this earlier today, and I came to the same conclusion. If the status object is anything other than SUCCESS, we should probably attempt a re-deploy, just in case it was aborted "mid-flight", or even better, if the reason fdd was restarted was to fix a bug that caused deployments to fail.
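
Roughly, the rule I have in mind looks like the sketch below. It is only an illustration: the flat dicts with `deployment_id` and `result` keys and the function name `should_deploy` are made up here, and the only result value it relies on is SUCCESS.

```python
from typing import Iterable, Mapping


def should_deploy(deployment_id: str, statuses: Iterable[Mapping[str, str]]) -> bool:
    """Deploy unless a status for this deployment id already reports SUCCESS.

    Each status is a simplified, hypothetical mapping with the keys
    "deployment_id" and "result". Any result other than "SUCCESS"
    (or no matching status at all) leads to a re-deploy.
    """
    succeeded = any(
        status.get("deployment_id") == deployment_id
        and status.get("result") == "SUCCESS"
        for status in statuses
    )
    return not succeeded
```

With this rule, a deployment that was aborted when fdd shut down, or one that failed because of a bug in the previous version, is picked up again after the restart, while successfully deployed applications are left alone.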

Just to mention it, the way to get the old behavior is to simply delete all status objects before restarting. Then it will re-deploy everything. This can also be used to re-deploy a selection of apps: just delete the relevant status objects before restarting.

xavileon commented 4 years ago

Nothing to add to what @gregjones and @mortenlj added. It seems a reasonable thing to do so 👍

mortenlj commented 4 years ago

I have done some exploratory coding on this, and I'll start on a proper implementation.

I think and hope this can be done in two days, since I only have two more days as a paid contributor :grin: (I'm leaving FINN at the end of the month).