cloudfoundry / korifi

Cloud Foundry on Kubernetes
Apache License 2.0
317 stars 65 forks

As a Korifi user I do not want my application to restart when I bind/unbind a service instance to it #3087

Open georgethebeatle opened 9 months ago

georgethebeatle commented 9 months ago

Background

In CF for VMs, when a user binds a service instance to an app, the app does not immediately see the new service binding; it needs to be restarted before the updated value of the VCAP_SERVICES env var is available to it. This is normal and expected, as the process environment is immutable. The cf CLI outputs a tip about this behaviour.

Binding service instance s to app dorifi in org org1 / space space2 as cf-admin...
TIP: Use 'cf restage dorifi' to ensure your env variable changes take effect

In Korifi binding a service to an app means two things:

What the servicebinding object does is inject new entries into the volumes and volumeMounts fields of the Pod or the PodTemplateSpec, which effectively restarts the pod. As a result, Korifi behaves slightly differently from classic CF by performing unexpected app restarts, which might mean downtime for the users of the app. Another inconvenience is that if the user wants to bind n service instances to the app, the app will be restarted n times.
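For illustration, the injection is driven by a servicebinding.io `ServiceBinding` resource roughly like the sketch below (all names are hypothetical, not the ones Korifi actually generates). The servicebinding.io webhook projects the referenced Secret as a volume into the workload's PodTemplateSpec, and that pod template change is what forces the rollout:

```yaml
# Hypothetical sketch of a servicebinding.io binding for the example app.
apiVersion: servicebinding.io/v1beta1
kind: ServiceBinding
metadata:
  name: dorifi-s-binding        # hypothetical name
  namespace: space2
spec:
  service:                      # the Secret holding the instance credentials
    apiVersion: v1
    kind: Secret
    name: s-credentials         # hypothetical name
  workload:                     # the resource whose PodTemplateSpec gets mutated
    apiVersion: apps/v1
    kind: StatefulSet
    name: dorifi-statefulset    # hypothetical name
```

Because the volume and volumeMount land in the StatefulSet's pod template, the StatefulSet controller treats it like any other template change and replaces the pods.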

Reproduction steps

❯ cf bind-service dorifi s
Binding service instance s to app dorifi in org org1 / space space2 as cf-admin...
OK

TIP: Use 'cf restage dorifi' to ensure your env variable changes take effect

❯ cf apps
Getting apps in org org1 / space space2 as cf-admin...

name     requested state   processes   routes
dorifi   started           web:0/1     dorifi.apps-127-0-0-1.nip.io

Acceptance

GIVEN I have deployed an app a to Korifi and have created a service instance si, WHEN I run cf bind-service a si, THEN I can continue using the app without experiencing any downtime

danail-branekov commented 9 months ago

For context, app restart does not cause downtime when the app is scaled to more than one instance.

When the statefulset gets updated by servicebinding.io, the statefulset controller restarts its pods one by one (similar to what we do on rolling restart)
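Concretely, the one-by-one behaviour comes from the StatefulSet's default update strategy, sketched below. On any pod template change the controller replaces pods one at a time, in reverse ordinal order, waiting for each replacement to become Ready before moving on:

```yaml
# Relevant fragment of a StatefulSet spec; RollingUpdate is the default,
# so Korifi does not need to set it explicitly.
spec:
  updateStrategy:
    type: RollingUpdate
```

With more than one ready replica, at least one pod keeps serving traffic throughout the rollout, which is why multi-instance apps see no downtime.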

georgethebeatle commented 9 months ago

We discussed this issue on the last working group sync and decided that it is not too critical, because

Given all of the above, we think it would be better to document the fact that Korifi restarts workloads when one binds services to them, in the section dealing with differences from CF.

georgethebeatle commented 9 months ago

Given that we decided to accept the fact that binding/unbinding a service instance to an app automatically restarts the app, should we also restart the app when the service instance credentials change? This way we could simply document that Korifi automatically restarts apps whenever service details change, so that the app can immediately see the change. Wouldn't such behaviour be more consistent than auto-restarting on bind/unbind but requiring a manual restart on credentials change?

We spiked a bit on this idea and pushed the relevant changes here

ggerla commented 8 months ago

the CF binding is composed of 2 phases:

  1. VCAP_SERVICES env var
  2. restage/restart

So my proposal here is

  1. on the cf bind-service command, generate the VCAP_SERVICES env vars and store them in a secret/configmap
  2. on the cf restage command, read the vars from the secret/configmap, then prepare and deploy the servicebinding.servicebinding.io object, which will automatically restart the app.

This way the restart is not done at bind time (where it is not expected) but at restage/restart time (where it is expected).
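The two phases of this proposal could be sketched as follows (all names and the credentials payload are hypothetical, for illustration only):

```yaml
# Phase 1 (cf bind-service): only record the computed VCAP_SERVICES entry;
# nothing touches the running pods yet.
apiVersion: v1
kind: Secret
metadata:
  name: dorifi-vcap-services    # hypothetical name
  namespace: space2
stringData:
  VCAP_SERVICES: |
    {"user-provided": [{"name": "s", "credentials": {"uri": "https://example.com"}}]}
---
# Phase 2 (cf restage): only now create the ServiceBinding, so the pod
# restart happens at the moment the user already expects one.
apiVersion: servicebinding.io/v1beta1
kind: ServiceBinding
metadata:
  name: dorifi-s-binding        # hypothetical name
  namespace: space2
spec:
  service:
    apiVersion: v1
    kind: Secret
    name: s-credentials         # hypothetical name
  workload:
    apiVersion: apps/v1
    kind: StatefulSet
    name: dorifi-statefulset    # hypothetical name
```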

danail-branekov commented 8 months ago

In order to achieve the behaviour you suggest, we would have to change the workload of the servicebinding.io binding. Currently the binding workload is the statefulset, so servicebinding.io updates the statefulset pod template via its webhook, which immediately triggers the statefulset controller to restart the pods.

We had an idea to change the binding workload to another resource (e.g. the AppWorkload), however: a) we need to spike on whether such an approach works at all; b) we have to ensure that existing bindings still work after a Korifi upgrade; c) if possible, we should avoid pod restarts during the upgrade.

Once/if we prove that such an approach is feasible, we could change the AppWorkload (or whatever target resource we choose) controller to update the statefulset pod template only on restart/restage.
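Under that idea, the binding's workload reference would point at the intermediate resource instead of the StatefulSet. A hypothetical sketch, assuming the AppWorkload exposes a pod template that servicebinding.io can mutate and that the AppWorkload controller only propagates to the StatefulSet on restart/restage:

```yaml
# Hypothetical: bind against the AppWorkload so that the webhook's pod
# template changes do not reach the StatefulSet until the next restage.
apiVersion: servicebinding.io/v1beta1
kind: ServiceBinding
metadata:
  name: dorifi-s-binding        # hypothetical name
  namespace: space2
spec:
  service:
    apiVersion: v1
    kind: Secret
    name: s-credentials         # hypothetical name
  workload:
    apiVersion: korifi.cloudfoundry.org/v1alpha1
    kind: AppWorkload
    name: dorifi-workload       # hypothetical name
```

Whether servicebinding.io can project into an arbitrary custom resource like this is exactly what the spike would need to establish.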

A bonus of such an approach would be that Korifi would make no assumptions that workloads are backed by statefulsets. That assumption is currently a problem for anyone who wants to bring their own workload runner (for example, one that uses deployments instead of statefulsets).