SAP / sap-btp-service-operator

SAP BTP service operator enables developers to connect Kubernetes clusters to SAP BTP accounts and to consume SAP BTP services within the clusters by using Kubernetes native tools.
Apache License 2.0
125 stars 52 forks source link

Question: Updated Binding Credentials #66

Closed jkbschmid closed 2 years ago

jkbschmid commented 3 years ago

Hello,

there are some services that require a re-bind after a certain time period to update the service credentials (e.g. for renewing certs). Does the operator support a mechanism (e.g. via an annotation) to trigger such a re-bind?

In theory, it is possible to create a second binding with a different name, but this would require changing the deployment yaml on each update. That's why it would be much more convenient to have an annotation or similar. Thank you!

pavelmaliy commented 3 years ago

Hi, No there is no such mechanism. I am not sure how it should be implemented, you mean that periodically BtpServiceOperator will delete bindings and create new ones with the same name according to annotations? There is no update for binding

jkbschmid commented 3 years ago

Hi there,

BtpServiceOperator will delete bindings and create new ones with the same name according to annotations?

Yes, that's it. Our use case is this: We bind to a service that supplies us with a certificate that is valid up to 1 month. So each month we are forced to "recreate" the binding credentials to obtain a valid certificate. If we delete the binding and create it again, pods will not be able to start after the old secret's deletion until the new secret is created. If we create a new service binding we have to modify the CF manifest to reference the new corresponding secret.

There is no update for binding

You are right, CF does not provide a mechanism to update a binding.

However, if the operator kept a mapping between a ServiceBinding in K8s and the actual binding in the service manage the operator could transparently update the binding, e.g. like so:

pavelmaliy commented 3 years ago

Thanks for the feedback, we'll raise your request to our PO and try to find appropriate solution to cred rotation.

loewenstein commented 3 years ago

Wouldn't it anyway be safer to enforce a Deployment rollout @jkbschmid? If you mount the Secret as directory the container will see the binding credentials updated (which at least was not the case when mounting the Secret as a file), but the application also needs to watch the file system and read the new content.

In contrast, if you simply rollout the Deployment with the new Secret, the newly started application process will pick up the new certs automatically.

jkbschmid commented 3 years ago

@loewenstein I think any kind of Pod restart to make the application re-read binding creds is necessary, no matter if rollout or blue/green. IMO, the question is more: How do I update the actual binding secrets in K8s? I'd like to avoid creating a new binding, because then I'd have to adapt the deployment.yaml where the new binding must be referenced.

loewenstein commented 3 years ago

Ah, ok @jkbschmid. I think I understand the use case better now. I am not sure who could implement this and how. It is something on the edge between service instance/binding management and workload management. I.e. you must not delete the binding as long as any Pod is still using the credentials of the old binding. Unfortunatly, that is knowledge only the owner of the workloads has.

MatthiasSchmalz commented 3 years ago

It is important to only delete old bindings after there is no app instance left which uses its credentials. For certificate based credentials this is not critical, but there can still be services with binding level secrets. Those would become invalid in the moment the binding is dropped and cause errors if it is still in use. Therefore the correct sequence is:

In CF this problem is solved as a side effect of how blue green deployment is done by deploy service:

loewenstein commented 3 years ago

I am wondering if the Kubernetes Service Binding Spec could be part of a solution for this.

The btp-service-operator ServiceBinding would need to conform to a Provisioned Service, providing a reference to the Secret via status.binding. The application workloads and any libraries used would need to find credentials considering the Workload projection definition. I.e. searching for mounted credentials in $SERVICE_BINDING_ROOT either by knowing the binding name (and use $SERVICE_BINDING_ROOT/<binding-name>) or by searching the directory for bindings with specific tags (like some libraries might already do searching the VCAP_SERVICES JSON structure). A service binding controller (there's a reference implementation at https://github.com/servicebinding/service-binding-controller, but we might need to engage/invest into this), watches resources and provides the glue.

The btp-service-operator ServiceBinding could have a configuration for regular and/or on-demand credential rotation. When rotation happens it would create a new OSB binding delivering a new Secret to the cluster and update the status.binding. The service-binding-controller would take the new Secret and mutate the volume and volume mounts in the workload, which would trigger a rollout. The service-binding-controller would watch the rollout and update the Ready condition. The btp-service-operator could watch the Ready condition and delete the old OSB binding, including the Secret.

I am still checking if all my assumptions are correct (see Kubernetes Slack)

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

MatthiasSchmalz commented 3 years ago

I assume this issue is still valid and should not be closed.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

loewenstein commented 2 years ago

I believe this is still relevant.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

loewenstein commented 2 years ago

I believe this is still relevant.

pavelmaliy commented 2 years ago

After consulting with our PO, we'll have BLI for 2A for cred rotation.

loewenstein commented 2 years ago

I wasn't really reviewing that PR, just being curious as to the solution and the constraints involved.

Having placed some thoughts and a little research into it, I am sure there is no easy, straightforward solution anyway.

If I understand your reply correctly, the constraint on workloads (or better on workload owners) is that they must not rotate credentials more than once without marking sure all instances of the workload were restarted and had a chance to pick up the new binding secret.

I suppose this is the best you can get without requiring coordination with the workloads. But it does indeed sound limited in a sense that fully automated rotation with a given schedule are dangerous and can easily cause outages.

pavelmaliy commented 2 years ago

As I see it, you can always delete and create the binding as much as you want, If you prefer automated rotation you need to enable it and provide rotation interval, during that period you need to adjust your workloads (if they're using volumes it will be updated automatically) because in next rotation the old binding will be removed and current binding will become old.

I expect that rotation interval will be months and not minutes or hours so it should be enough.

But it does indeed sound limited in a sense that fully automated rotation with a given schedule are dangerous and can easily cause outages

Isn't it all about? otherwise you can create your own secret, reference workloads to it and copy binding secrets into it using whatever strategy suits you.

jkbschmid commented 2 years ago

Great to hear you are working on this issue. Thanks, @pavelmaliy ! Once you're ready, I'd also love to see a design document or some explanation of how you're planning to implement the change. This way, we can address upcoming questions and potential issues already in the design phase.

pavelmaliy commented 2 years ago

Hi, We'll start working on detailed design on 2A currently I am just playing around, basically there are 2 approaches:

Rotate the binding directly in SM and override the current binding secret

Introduce new CRD RotatableServiceBinding which will maintain 2 or more bindings underneath and maintain his own secret which will be updated with latest binding (Resembles to user-provided-service in CF)

Thoughts?

freegroup commented 2 years ago

we have the very same problem for the binding rotation and build a small operator for the KLACKS Validation App which handles this for us. https://github.tools.sap/kernelservice-validation/core-binding-rotation ( or https://pages.github.tools.sap/kernelservice-validation/documentation/setup/svc.html#./content/090-ias-credential-rotation/README ) Maybe this is an option until the rotation is part of the operator.

BUT - if the operator rotates the binding, you need maybe another operator to do related stuff. e.g. push the cred to the SubscriptionManager...in case you are using the IAS as OSB.

freegroup commented 2 years ago

I see that different approaches are also being discussed here. With the components I mentioned, we follow the approach that we have two valid bindings at a certain time. This is important for us, because at the moment of the invalidation/update of the binding we may establish a connection to the server to which the binding belongs. If we invalidate the binding at this time, the service may reject the connection request. Which happens sometime in our setup (rare).

So there is always a short time when a binding is not valid. In our approach, the obsolete binding is deleted only after all consumers are working with the new binding. Thus here never a connection request runs into a HTTP-401

jkbschmid commented 2 years ago

Thanks for sharing the drafts, @pavelmaliy !

Thoughts?

Regarding option 1): I'm not sure if I understand the solution 100%. Does "direct rotation" mean the "old" binding will be deleted once the new one is there?

For me the question is how the "rotation interval" is defined, i.e. when will the operator delete bindings. I think it would make sense to allow the consumer to control this setting, since different bindings have different expiration dates. For example, some certificates expire in one month, other credentials may expire sooner.

I also think that, in the long run, revocation should be supported. I.e., a consumer should be able to instantly trigger a binding rotation.

Adding these features to the existing CRD would make it too complex, that's why I'd prefer option 2 with a separate CRD.

pavelmaliy commented 2 years ago

Hi, Thanks for your valuable inputs, we had an internal discussion, In general adding new CRD not so nice approach because it pretty much makes ServiceBinding redundant most of the consumers will want cred rotation (at least in SAP),

Also it's not so nice behavior if I create one CR and get in my namespace multiple ServiceBindings and Secrets that consumer not sure what they are.

Basically the idea is to add additional configuration to ServiceBinding something like:

"credRotation":{ "enabled":true, "every":"3M" "keepOld":"2W" }

During the transition there will be 2 bindings: B1 and B1_OLD after "keepOld" time ended B1_OLD is deleted

Technically what allows this implementation be easy is the cluster recovery flow we have, which in case no bindingID on the status tries to lookup binding in SM and if it's not found, BTP Operator will create new binding in SM. So rename the original binding in SM and removing bindingID from status will trigger the rotation without any code required.

I also think that, in the long run, revocation should be supported. I.e., a consumer should be able to instantly trigger a binding rotation.

It should be easy one, we can use some sort of annotation to trigger the rotation right now which will be removed after successful rotation by the operator.

Will it fit your requirements?

patrickhuy commented 2 years ago

That sounds nice to me. There should be some way to learn about when the rotation has happened (maybe a field in the ServiceBinding status?) so that it can be watched/reacted on.

Would it be possible for the operator to also trigger an application restart/rollout (the way kubectl rollout restart does it - which is basically adding an annotation to the deployment/statefulset spec) or would that be completely out of scope? Effectively this is likely what most applications will need to do and just "hoping" that a regular deployment/restart happens in the "keepOld" period sounds not very sustainable.

loewenstein commented 2 years ago

In general adding new CRD not so nice approach because it pretty much makes ServiceBinding redundant most of the consumers will want cred rotation (at least in SAP),

Also it's not so nice behavior if I create one CR and get in my namespace multiple ServiceBindings and Secrets that consumer not sure what they are.

Just a thought on the redundant CR issue. Maybe we could have a ServiceBindingSet that has, among the rotation configuration, a ServiceBinding template - just like ReplicaSet has a Pod template. I haven't really thought this through though and beside that I agree with @patrickhuy - that not integrating this with workload rollout could be risky - the credRotation proposal above with options for when rotation happens and when cleanup happens doen't seem completely off either.

pavelmaliy commented 2 years ago

@loewenstein actually ReplicaSet and Pod this is the exact example we discussed, the problem here is that user can't defines names of bindings in SM, and then in cloud cockpit or CLIs he'll see bunch of guids it's not acceptable by our PO,

Another problem is that Brokers need user info during binding but the binding created by "ServiceBindingSet" so the user needs to be propagated.

@patrickhuy

There should be some way to learn about when the rotation has happened (maybe a field in the ServiceBinding status?) so that it can be watched/reacted on.

of course this is the plan

regarding trigger an application restart/rollout I think not in the first phase, volumeMounts will be updated automatically though and its the best practice.

patrickhuy commented 2 years ago

Just a thought on the redundant CR issue. Maybe we could have a ServiceBindingSet that has, among the rotation configuration, a ServiceBinding template - just like ReplicaSet has a Pod template.

that would mean that we would get one secret per ServiceBinding, right? Consequently it would mean that we need to change the secret referenced from the Deployment, that would not be great.

regarding trigger an application restart/rollout I think not in the first phase, volumeMounts will be updated automatically though and its the best practice. How is volumeMounts best practice? Configuration via the environment is a 12 factor app capability. (https://12factor.net/config) Even when using volumeMounts most applications are likely not going to monitor the file system for changes and be able to live reload it (do you see that as a requirement?)

I do agree that restarts/rollouts are not needed in the first phase. Nevertheless no matter whether the binding is exposed via volumeMounts or via the environments a restart of the workload is oftentimes the right thing after the change in bindings (for comparison: Cloud Foundry changes bindings during re-staging) - I think this is only feasible if at maximum the task is to trigger a rollout of a Deployment/StatefulSet, changing the used secrets is likely too complicated.

pavelmaliy commented 2 years ago

that would mean that we would get one secret per ServiceBinding, right? Consequently it would mean that we need to change the secret referenced from the Deployment, that would not be great.

No ServiceBindingSet will have his own secret which always updated with the latest ServiceBinding and deployment will use this Secret but it was just a thought I don't believe this option will be chosen

@patrickhuy I agree regarding the monitoring the file system besides I can't control how secrets are consumed and where. For now (if at all) its not responsibility of BTP operator to find all consumers of the Secret (not necessarily in same namespace see reflector) and restart/rollout them.

patrickhuy commented 2 years ago

@patrickhuy I agree regarding the monitoring the file system besides I can't control how secrets are consumed and where. For now (if at all) its not responsibility of BTP operator to find all consumers of the Secret (not necessarily in same namespace see reflector) and restart/rollout them.

I agree! I think if the operator would be able to restart Deployments we should list the Deployments for it to restart within the ServiceBinding (probably via a selector?) and it should be limited to the current namespace (because otherwise it makes security considerations harder)

pavelmaliy commented 2 years ago

Hi, I've merged the cred rotation feature, please note it won't be available until SM reaches live (a week), So it not released yet, the documentation was updated.

pavelmaliy commented 2 years ago

please take the latest version cred rotation available there