cloudfoundry / korifi

Cloud Foundry on Kubernetes
Apache License 2.0
317 stars 65 forks

As a Korifi user I do not want my application to restart when I bind/unbind a service instance to it #3087

Open georgethebeatle opened 9 months ago

georgethebeatle commented 9 months ago

Background

In CF for VMs, when a user binds a service instance to an app, the app does not immediately see the new service binding; it needs to be restarted before the updated value of the VCAP_SERVICES env var is available to it. This is normal and expected, as the process environment is immutable. The cf CLI outputs a tip about this behaviour.

Binding service instance s to app dorifi in org org1 / space space2 as cf-admin...
TIP: Use 'cf restage dorifi' to ensure your env variable changes take effect

In Korifi binding a service to an app means two things:

What the servicebinding object does is inject new entries into the volumes and volumeMounts fields of the Pod or the PodTemplateSpec, which effectively restarts the pod. As a result, Korifi behaves slightly differently from classic CF by performing unexpected app restarts, which might mean downtime for the users of the app. Another inconvenience is that if the user wants to bind n service instances to the app, the app will be restarted n times.
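For illustration, the injection is driven by a servicebinding.io `ServiceBinding` resource roughly like the sketch below (all names are hypothetical, not the ones Korifi actually generates). The servicebinding.io webhook projects the referenced Secret as a volume into the workload's PodTemplateSpec, and that pod template change is what forces the rollout:

```yaml
# Hypothetical sketch of a servicebinding.io binding for the example app.
apiVersion: servicebinding.io/v1beta1
kind: ServiceBinding
metadata:
  name: dorifi-s-binding        # hypothetical name
  namespace: space2
spec:
  service:                      # the Secret holding the instance credentials
    apiVersion: v1
    kind: Secret
    name: s-credentials         # hypothetical name
  workload:                     # the resource whose PodTemplateSpec gets mutated
    apiVersion: apps/v1
    kind: StatefulSet
    name: dorifi-statefulset    # hypothetical name
```

Because the volume and volumeMount land in the StatefulSet's pod template, the StatefulSet controller treats it like any other template change and replaces the pods.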

Reproduction steps

❯ cf bind-service dorifi s
Binding service instance s to app dorifi in org org1 / space space2 as cf-admin...
OK

TIP: Use 'cf restage dorifi' to ensure your env variable changes take effect

❯ cf apps
Getting apps in org org1 / space space2 as cf-admin...

name     requested state   processes   routes
dorifi   started           web:0/1     dorifi.apps-127-0-0-1.nip.io

Acceptance

GIVEN I have deployed an app a to Korifi and have created a service instance si, WHEN I run cf bind-service a si, THEN I can continue using the app without experiencing any downtime

danail-branekov commented 9 months ago

For context, app restart does not cause downtime when the app is scaled to more than one instance.

When the statefulset gets updated by servicebinding.io, the statefulset controller restarts its pods one by one (similar to what we do on rolling restart)
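Concretely, the one-by-one behaviour comes from the StatefulSet's default update strategy, sketched below. On any pod template change the controller replaces pods one at a time, in reverse ordinal order, waiting for each replacement to become Ready before moving on:

```yaml
# Relevant fragment of a StatefulSet spec; RollingUpdate is the default,
# so Korifi does not need to set it explicitly.
spec:
  updateStrategy:
    type: RollingUpdate
```

With more than one ready replica, at least one pod keeps serving traffic throughout the rollout, which is why multi-instance apps see no downtime.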

georgethebeatle commented 9 months ago

We discussed this issue on the last working group sync and decided that it is not too critical, because

Given all of the above, we think it would be better to document the fact that Korifi restarts workloads when one binds services to them, in the section dealing with differences from CF.

georgethebeatle commented 9 months ago

Given that we decided to accept the fact that binding/unbinding a service instance to an app automatically restarts the app, should we also restart the app when the service instance credentials change? This way we could simply document that Korifi automatically restarts apps whenever service details change, so that the app can immediately see the change. Wouldn't such behaviour be more consistent than auto-restarting on bind/unbind but requiring a manual restart on credentials change?

We spiked a bit on this idea and pushed the relevant changes here

ggerla commented 8 months ago

the CF binding is composed of 2 phases:

  1. VCAP_SERVICES env var
  2. restage/restart

So my proposal here is

  1. on the cf bind-service command, generate the VCAP_SERVICES env vars and store them in a secret/configmap
  2. on the cf restage command, read the vars from the secret/configmap, then prepare and deploy the servicebinding.servicebinding.io object, which will automatically restart the app.

This way the restart is not done at bind time (where it is not expected) but at restage/restart time (where it is expected).
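The two phases of this proposal could be sketched as follows (all names and the credentials payload are hypothetical, for illustration only):

```yaml
# Phase 1 (cf bind-service): only record the computed VCAP_SERVICES entry;
# nothing touches the running pods yet.
apiVersion: v1
kind: Secret
metadata:
  name: dorifi-vcap-services    # hypothetical name
  namespace: space2
stringData:
  VCAP_SERVICES: |
    {"user-provided": [{"name": "s", "credentials": {"uri": "https://example.com"}}]}
---
# Phase 2 (cf restage): only now create the ServiceBinding, so the pod
# restart happens at the moment the user already expects one.
apiVersion: servicebinding.io/v1beta1
kind: ServiceBinding
metadata:
  name: dorifi-s-binding        # hypothetical name
  namespace: space2
spec:
  service:
    apiVersion: v1
    kind: Secret
    name: s-credentials         # hypothetical name
  workload:
    apiVersion: apps/v1
    kind: StatefulSet
    name: dorifi-statefulset    # hypothetical name
```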

danail-branekov commented 8 months ago

In order to achieve the behaviour you suggest, we would have to change the workload of the servicebinding.io binding. Currently the binding workload is the statefulset, so servicebinding.io updates the statefulset pod template via its webhook, which immediately triggers the statefulset controller to restart the pods.

We had an idea to change the binding workload to another resource (e.g. the AppWorkload), however: a) we need to spike on whether such an approach works at all; b) we have to ensure that existing bindings still work after a Korifi upgrade; c) if possible, we should avoid pod restarts during the upgrade.

Once/if we prove that such an approach is feasible, we could change the AppWorkload (or whatever target resource we choose) controller to update the statefulset pod template only on restart/restage.
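Under that idea, the binding's workload reference would point at the intermediate resource instead of the StatefulSet. A hypothetical sketch, assuming the AppWorkload exposes a pod template that servicebinding.io can mutate and that the AppWorkload controller only propagates to the StatefulSet on restart/restage:

```yaml
# Hypothetical: bind against the AppWorkload so that the webhook's pod
# template changes do not reach the StatefulSet until the next restage.
apiVersion: servicebinding.io/v1beta1
kind: ServiceBinding
metadata:
  name: dorifi-s-binding        # hypothetical name
  namespace: space2
spec:
  service:
    apiVersion: v1
    kind: Secret
    name: s-credentials         # hypothetical name
  workload:
    apiVersion: korifi.cloudfoundry.org/v1alpha1
    kind: AppWorkload
    name: dorifi-workload       # hypothetical name
```

Whether servicebinding.io can project into an arbitrary custom resource like this is exactly what the spike would need to establish.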

A bonus of such an approach would be that Korifi would make no assumptions that workloads are backed by statefulsets. That assumption is currently a problem for anyone who wants to bring their own workload runner (for example, one that uses deployments instead of statefulsets).