SAP / sap-btp-service-operator

SAP BTP service operator enables developers to connect Kubernetes clusters to SAP BTP accounts and to consume SAP BTP services within the clusters by using Kubernetes native tools.
Apache License 2.0

SAP BTP Operator credential rotation leaves many "backup" service-bindings in state CreateInProgress #467

Closed: BarzanaPanajotova closed this issue 1 month ago

BarzanaPanajotova commented 1 month ago

We have configured service binding rotation as described here: https://github.com/SAP/sap-btp-service-operator?tab=readme-ov-file#service-binding-rotation

As far as I understand, the service binding keeps the same name, and when rotation occurs, the old binding is backed up under the original name plus a random suffix such as -hjs3d. This backup binding should be created successfully and then deleted once rotatedBindingTTL has passed. In our case, however, the new "backup" binding is not cleaned up within the specified timeframe (even after applying services.cloud.sap.com/forceRotate: "true").
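The timeline the reporter expects can be sketched as follows, using the rotationFrequency: 1h and rotatedBindingTTL: 5m values from the manifest in this issue. This is a minimal Python sketch, not the operator's Go implementation: it assumes both fields use Go-style duration strings and handles only the h, m, and s units, and the helper names parse_duration and expected_timeline are hypothetical. It also assumes the backup is created at the exact moment rotation is due.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: parses a subset of Go-style duration strings ("1h", "5m", "1h30m").
# Only the h, m and s units are handled in this sketch.
_DURATION_PART = re.compile(r"(\d+)([hms])")
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    parts = _DURATION_PART.findall(text)
    # Reject strings that contain anything other than <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def expected_timeline(last_rotation: datetime, frequency: str, ttl: str) -> dict:
    """Expected times, assuming rotation runs exactly when due and the backup
    (the old credentials under a suffixed name) is created at that moment."""
    rotation_at = last_rotation + parse_duration(frequency)
    return {
        "next_rotation": rotation_at,
        "backup_deleted_by": rotation_at + parse_duration(ttl),
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    for name, when in expected_timeline(start, "1h", "5m").items():
        print(name, when.isoformat())
```

With a 1h frequency and a 5m TTL, a backup created at 13:00 would be expected to be gone by 13:05; the bindings in this issue never reach that point.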

For example:

apiVersion: services.cloud.sap.com/v1
kind: ServiceBinding
metadata:
  name: test-binding
  namespace: test-namespace
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  serviceInstanceName: test-binding
  secretName: test-secret
  parameters:
    credential-type: X509_GENERATED
    key-length: 4096
    validity: 14
    validity-type: DAYS
    app-identifier: test-id
  credentialsRotationPolicy:
    enabled: true
    rotatedBindingTTL: 5m
    rotationFrequency: 1h

and the status on the clusters is:

NAMESPACE       NAME                 INSTANCE      STATUS            READY  MESSAGE
test-namespace  test-binding         test-binding  Created           True   ServiceBinding provisioned s
test-namespace  test-binding-3jm0sw  test-binding  CreateInProgress  False  binding with same name exist
test-namespace  test-binding-eyqvea  test-binding  CreateInProgress  False  binding with same name exist

Why is the backup stuck in CreateInProgress, and why are the "errored" bindings not cleaned up after the TTL has passed? How should we proceed with this issue? It has happened on all our clusters.

We use ArgoCD to deploy the Helm charts, if that matters. Image: sap-btp-service-operator/controller:v0.6.8

kerenlahav commented 1 month ago

Hi @BarzanaPanajotova, the TTL is effective only for ready bindings; if a binding is in a failed state, the operator keeps trying to create it. It seems you have an issue with the cluster, since it cannot handle the bindings successfully. Usually this kind of error appears when the cluster ID has changed and the operator cannot recover the rotated bindings. Check that the binding's cluster ID matches the cluster ID configured in the sap-btp-operator config map.
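The rule described in this reply can be sketched as a small decision function. This is a hedged Python illustration, not the operator's actual reconciliation code: the BackupBinding model, its field names, the action strings, and the assumption that the TTL is measured from the backup's creation time are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BackupBinding:
    # Hypothetical model of a rotated ("backup") binding; field names are illustrative.
    name: str
    ready: bool
    created_at: datetime


def next_action(backup: BackupBinding, ttl: timedelta, now: datetime) -> str:
    """Sketch of the rule in the reply: the TTL only applies to ready bindings."""
    if not backup.ready:
        # A binding that never became ready is retried, not expired by the TTL.
        return "retry-create"
    # Assumption: the TTL is counted from when the backup was created.
    if now - backup.created_at >= ttl:
        return "delete"
    return "keep"


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)
    now = created + timedelta(minutes=10)
    ttl = timedelta(minutes=5)
    print(next_action(BackupBinding("test-binding-3jm0sw", False, created), ttl, now))  # retry-create
    print(next_action(BackupBinding("hypothetical-ready-backup", True, created), ttl, now))  # delete
```

Under this rule, a backup stuck in CreateInProgress is never eligible for deletion, which matches the behaviour reported above.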

BarzanaPanajotova commented 1 month ago

Hi @kerenlahav, the binding doesn't have a cluster ID for some reason. We do not configure this at all for our service bindings.

For reference:

We have an IAS instance in BTP, and we create many reference instances to it. We create service bindings to these reference instances with the BTP operator, and as far as I can see, those service bindings (SBs) don't have a cluster ID set (checked through the Service Manager REST API).

We also have an SMS instance in BTP, and we create many reference instances to it. We create service bindings to these reference instances with the BTP operator, and those SBs do have a cluster ID set (also checked through the Service Manager REST API).

Why would the SMS SBs have a cluster ID set and the IAS SBs not, when we create them in the same way? For SMS we don't have rotation configured, but I don't think this matters.
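The comparison described in this comment (which bindings carry a cluster ID, and whether it matches the one configured for the operator) can be scripted once the binding data has been fetched. This is a minimal Python sketch under loud assumptions: it does not call the Service Manager REST API, and it assumes each binding has already been reduced to a hypothetical dict with name and cluster_id keys, since the actual response format is not shown in this thread.

```python
from collections import defaultdict
from typing import Iterable, Optional


def audit_cluster_ids(bindings: Iterable[dict], configured_cluster_id: str) -> dict:
    """Group binding names by how their cluster ID relates to the configured one.

    Each binding is a hypothetical dict such as {"name": "...", "cluster_id": "..."};
    a missing or empty "cluster_id" is reported as "missing".
    """
    report = defaultdict(list)
    for binding in bindings:
        cluster_id: Optional[str] = binding.get("cluster_id")
        if not cluster_id:
            status = "missing"
        elif cluster_id == configured_cluster_id:
            status = "match"
        else:
            status = "mismatch"
        report[status].append(binding["name"])
    return dict(report)


if __name__ == "__main__":
    # Hypothetical sample data mirroring the SMS vs. IAS observation.
    sample = [
        {"name": "sms-ref-binding", "cluster_id": "cluster-a"},
        {"name": "ias-ref-binding"},
    ]
    print(audit_cluster_ids(sample, "cluster-a"))
    # {'match': ['sms-ref-binding'], 'missing': ['ias-ref-binding']}
```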

I065450 commented 1 month ago

When you create a service binding with the operator, it should include the cluster ID. If you create a new binding for the IAS instance now, does it include the cluster ID?

BarzanaPanajotova commented 1 month ago

All our clusters (dev, staging, prod) are in the same state: service bindings to SMS reference instances have the cluster ID set, and service bindings to IAS reference instances don't. We bring up new clusters for the DEV environment and delete them at least once or twice a week. All new clusters create new reference instances and new service bindings to those reference instances, and on all of them the SMS SB has a cluster ID and the IAS SB doesn't. Could we set up a quick meeting with someone so that I can show the setup? It is SAP-internal, so I cannot post it here.

I065450 commented 1 month ago

Please open a JIRA ticket on the Service Manager component with all the details of the environment and the instance IDs.