Azure / open-service-broker-azure

The Open Service Broker API Server for Azure Services
https://osba.sh
MIT License

MSSQL Service Bindings/ Instances are not being deprovisioned #489

Open kdcllc opened 6 years ago

kdcllc commented 6 years ago

I have been trying to figure out why running helm delete --purge does not deprovision the bindings and instances in Azure Cloud.

After much time spent, I was able to get this error message:

svcat get instances -n dev1
                    NAME                      NAMESPACE       CLASS        PLAN                    STATUS
+-------------------------------------------+-----------+----------------+-------+-----------------------------------------+
  nservicebus-sql-instance         dev1        azure-sql-12-0   basic   DeprovisionBlockedByExistingCredentials

It seems that this is a bug.

What can be used to clean up the etcd database and the Azure resources?

I also consume the database connection generated by provisioning in deployment.yaml:

env:      
        - name: ASPNETCORE_ENVIRONMENT
          value: "Release"
        - name: SQL_HOST
          valueFrom:
            secretKeyRef:
              name: rabbitmq-sql-secret
              key: host 
        - name: SQL_DATABASE
          valueFrom:
            secretKeyRef:
              name: rabbitmq-sql-secret
              key: database 
        - name: SQL_USERNAME
          valueFrom:
            secretKeyRef:
              name: rabbitmq-sql-secret
              key: username
        - name: SQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: rabbitmq-sql-secret
              key: password 
        - name: NServiceBus__ConnectionStrings__Persistence

sam-cogan commented 6 years ago

I'm also seeing the exact same issue with MSSQL resources using Helm delete --purge, DeprovisionBlockedByExistingCredentials.

Is this something to do with the way helm is deleting things, or an issue with OSBA?

jeremyrickard commented 6 years ago

Hey @kdcllc and @sam-cogan,

The most likely answer is that this is related to how Helm deletes things. In general, Helm doesn't guarantee any order of deletion. Service Catalog (via the broker), however, does require an order for the resources it creates.

In the case above:

svcat get instances -n dev1
                    NAME                      NAMESPACE       CLASS        PLAN                    STATUS
+-------------------------------------------+-----------+----------------+-------+-----------------------------------------+
  nservicebus-sql-instance         dev1        azure-sql-12-0   basic   DeprovisionBlockedByExistingCredentials

This status message indicates that deprovisioning of the instance is blocked by the existence of a binding. When you do the Helm delete, it is probably issuing the deletes for those Service Catalog resources in parallel or in very short sequence, without regard to the ordering. Things should eventually progress, although those operations are all asynchronous and you can sometimes get into longer backoff waiting periods. For example, when the ServiceInstance is deleted and there is still a binding, Kubernetes/Service Catalog will enter a backoff loop trying to delete it.

It is also possible there is some issue, however, so if you are still seeing this I'd love to look into it more. If you're able to reproduce this, it would be helpful if you can share the Helm charts that you're using (or at least a version of it that creates those resources).

sam-cogan commented 6 years ago

Yeah I am still seeing it, pretty much every time I delete a deployment. I have left the deployment in this state for multiple days and it still does not get cleaned up, and it doesn't seem that I can manually delete them either. Here's an example of the Helm chart used for this:

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: "{{ .Release.Name }}-{{ .Values.App.Name }}-sqlserver"
  namespace: {{ .Values.NameSpace }}
spec:
  clusterServiceClassExternalName: azure-sql-12-0-dbms
  clusterServicePlanExternalName: dbms
  parameters:
    location: {{ .Values.Location }}
    resourceGroup: "{{ .Release.Name }}-RG"
    alias: "{{ .Release.Name }}-{{ .Values.App.Name }}-sqlserver"
    firewallRules:
    - startIPAddress: "0.0.0.0"
      endIPAddress: "255.255.255.255"
      name: "AllowAll"
---

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: "{{ .Release.Name }}-{{ .Values.App.Name }}-appdb"
  namespace: {{ .Values.NameSpace }}
spec:
  clusterServiceClassExternalName: azure-sql-12-0-database
  clusterServicePlanExternalName: {{ .Values.App.Database_SKU }}
  parameters:
    parentAlias: "{{ .Release.Name }}-{{ .Values.App.Name }}-sqlserver"

---

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: "{{ .Release.Name }}-{{ .Values.App.Name }}-logdb"
  namespace: {{ .Values.NameSpace }}
spec:
  clusterServiceClassExternalName: azure-sql-12-0-database
  clusterServicePlanExternalName: {{ .Values.App.Database_SKU }}
  parameters:
    parentAlias: "{{ .Release.Name }}-{{ .Values.App.Name }}-sqlserver"

---

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  name: "{{ .Release.Name }}-{{ .Values.App.Name }}-appdb-binding"
  namespace: {{ .Values.NameSpace }}
spec:
  instanceRef:
    name: "{{ .Release.Name }}-{{ .Values.App.Name }}-appdb"
  secretName: "{{ .Release.Name }}-{{ .Values.App.Name }}-appdb-secret"

---

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  name: "{{ .Release.Name }}-{{ .Values.App.Name }}-logdb-binding"
  namespace: {{ .Values.NameSpace }}
spec:
  instanceRef:
    name: "{{ .Release.Name }}-{{ .Values.App.Name }}-logdb"
  secretName: "{{ .Release.Name }}-{{ .Values.App.Name }}-logdb-secret"

jeremyrickard commented 6 years ago

@sam-cogan thanks for the additional info. Could you share the state of the binding?

svcat get binding <whatever it is called>

My guess is that the binding is failing to delete for some reason. I'll try out your chart and look into it more!

sam-cogan commented 6 years ago

This is what I see:

            NAME               NAMESPACE        INSTANCE              STATUS
+-----------------------------+-----------+---------------------+------------------+
  client13-app-apdb-binding     default     client13-app-appdb     UnbindCallFailed

Further details from describe:

  Name:        client13-smf-rldb-binding
  Namespace:   default
  Status:      UnbindCallFailed - Error unbinding from ServiceInstance "default/client13-app-appdb" of ClusterServiceClass (K8S: "2bbc160c-e279-4757-a6b6-4c0a4822d0aa" ExternalName: "azure-sql-12-0-database") at ClusterServiceBroker "osba": Status: 500; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil> @ 2018-06-26 15:37:43 +0000 UTC
  Instance:    client13-app-appdb

kdcllc commented 6 years ago

@jeremyrickard I am getting the same exceptions.

jeremyrickard commented 6 years ago

Thanks @sam-cogan and @kdcllc. Will take a dive into this and root cause it.

jeremyrickard commented 6 years ago

@sam-cogan I'm attempting to reproduce this today using the chart example you gave above and not having much luck. I don't believe we changed anything in version 1.0.1, but if you could try the latest OSBA and also share the OSBA logs (ideally start osba with log level set to debug) when you see the failure, it would be appreciated:


helm install azure/open-service-broker-azure --name osba --namespace osba \
  --set azure.subscriptionId=$AZURE_SUBSCRIPTION_ID \
  --set azure.tenantId=$AZURE_TENANT_ID \
  --set azure.clientId=$AZURE_CLIENT_ID \
  --set azure.clientSecret=$AZURE_CLIENT_SECRET

cwoolum commented 6 years ago

I was able to catch some logs

time="2018-07-06T04:16:41Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:42Z" level=debug msg="binding complete" bindingID=9d3514fe-80d1-11e8-b821-0e2b5de3ad56 instanceID=9e5d2eb8-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:42Z" level=debug msg="received binding request" bindingID=319b9be1-80d2-11e8-b821-0e2b5de3ad56 instanceID=31b5ed8c-80d2-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:42Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:42Z" level=debug msg="binding complete" bindingID=de64a1fa-80d1-11e8-b821-0e2b5de3ad56 instanceID=de80709f-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:42Z" level=debug msg="received binding request" bindingID=9d3514fe-80d1-11e8-b821-0e2b5de3ad56 instanceID=9e5d2eb8-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:43Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:43Z" level=debug msg="binding complete" bindingID=319b9be1-80d2-11e8-b821-0e2b5de3ad56 instanceID=31b5ed8c-80d2-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:43Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:44Z" level=debug msg="received binding request" bindingID=9d3514fe-80d1-11e8-b821-0e2b5de3ad56 instanceID=9e5d2eb8-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:44Z" level=debug msg="received binding request" bindingID=de64a1fa-80d1-11e8-b821-0e2b5de3ad56 instanceID=de80709f-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:45Z" level=debug msg="received binding request" bindingID=319b9be1-80d2-11e8-b821-0e2b5de3ad56 instanceID=31b5ed8c-80d2-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:45Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:45Z" level=debug msg="received binding request" bindingID=9d3514fe-80d1-11e8-b821-0e2b5de3ad56 instanceID=9e5d2eb8-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:45Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:46Z" level=debug msg="received binding request" bindingID=de64a1fa-80d1-11e8-b821-0e2b5de3ad56 instanceID=de80709f-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:46Z" level=debug msg="received binding request" bindingID=319b9be1-80d2-11e8-b821-0e2b5de3ad56 instanceID=31b5ed8c-80d2-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:47Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:47Z" level=debug msg="received binding request" bindingID=de64a1fa-80d1-11e8-b821-0e2b5de3ad56 instanceID=de80709f-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:47Z" level=debug msg="received binding request" bindingID=319b9be1-80d2-11e8-b821-0e2b5de3ad56 instanceID=31b5ed8c-80d2-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:47Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:48Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:48Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:49Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:49Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:49Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:50Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:50Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:51Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:51Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:52Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:52Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:53Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:16:59Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:16:59Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:17:10Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:17:10Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED
time="2018-07-06T04:17:31Z" level=debug msg="received unbinding request" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56
time="2018-07-06T04:17:32Z" level=error msg="unbinding error: error executing service-specific unbinding logic" bindingID=5b868769-80d1-11e8-b821-0e2b5de3ad56 error="error dropping user \"iinqk5yvv9\": mssql: The database principal owns a schema in the database, and cannot be dropped." instanceID=5b906a27-80d1-11e8-b821-0e2b5de3ad56 status=UNBINDING_FAILED

I create a schema when my app first runs. It appears that because the user owns that schema, the DB cannot be dropped. I'm guessing that it's trying to delete the user before deleting the database.

I deleted my helm deployment to trigger this recurrence and lost my credentials. I reset the master password and was able to delete the schema but now OSBA has the wrong password and can't delete the instance anyway. Deleting the Server manually causes OSBA to throw exceptions. I think if OSBA is in delete mode and the instance is not found, it should assume that it has been successfully deleted and continue on.
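The suggestion above (treat "not found" during deletion as success) can be sketched in Python as follows. NotFoundError, deprovision and the in-memory store are hypothetical names used only for illustration; this is not how OSBA is implemented.

```python
# Sketch only: idempotent deletion, assuming the backing API raises a
# dedicated "not found" error when the resource is already gone.

class NotFoundError(Exception):
    """Hypothetical error raised when the cloud resource does not exist."""


def deprovision(delete_fn, resource_id):
    """Call delete_fn(resource_id) and treat 'not found' as already deleted.

    Returns "deleted" when the call succeeds and "already_gone" when the
    resource was missing. Any other exception propagates to the caller.
    """
    try:
        delete_fn(resource_id)
    except NotFoundError:
        return "already_gone"
    return "deleted"


# Tiny in-memory stand-in for the cloud API.
servers = {"sql-1"}


def delete_server(server_id):
    if server_id not in servers:
        raise NotFoundError(server_id)
    servers.remove(server_id)


print(deprovision(delete_server, "sql-1"))  # first call removes it
print(deprovision(delete_server, "sql-1"))  # second call finds nothing
```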

zhongyi-zhang commented 6 years ago

I googled "The database principal owns a schema in the database, and cannot be dropped." It looks like this is for security reasons: a schema owner can't be deleted unless it transfers the ownership to another database principal (usually dbo). We could make OSBA list and transfer the schema ownerships during unbinding. Should it?

cwoolum commented 6 years ago

I think that's a lot more work than just changing the order of the deprovisioning process. If the databases were dropped before the users were dropped from the server, wouldn't that achieve the same effect?

krancour commented 6 years ago

> I think that's a lot more work than just changing the order of the deprovisioning process. If the databases were dropped before the users were dropped from the server, wouldn't that achieve the same effect?

No can do. The lifecycle of a service is this:

  1. Created (provision)
  2. App needs to use it (bind)
  3. App no longer needs to use it (unbind)
  4. Service no longer needed (deprovision)

Just because one app (possibly out of many) no longer needs to access a given service instance, doesn't mean it's safe to make that service instance go away (e.g. drop the db in this case). Others may still be using it.

In terms of enforcing this lifecycle-- that's not even our decision. The platforms that utilize brokers such as ours (for instance Kubernetes Service Catalog or Cloud Foundry) follow and enforce that model.
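As a toy illustration of that lifecycle in Python: an instance can only be deprovisioned once every binding has been removed. The class and method names are hypothetical; this is a sketch of the rule, not of Service Catalog or OSBA code.

```python
# Sketch only: a toy model of provision / bind / unbind / deprovision.

class ServiceInstance:
    def __init__(self, name):
        self.name = name
        self.bindings = set()    # IDs of apps currently bound
        self.provisioned = True  # creating the object stands in for "provision"

    def bind(self, binding_id):
        if not self.provisioned:
            raise RuntimeError("cannot bind to a deprovisioned instance")
        self.bindings.add(binding_id)

    def unbind(self, binding_id):
        # Removing one binding never touches the instance itself.
        self.bindings.discard(binding_id)

    def deprovision(self):
        if self.bindings:
            raise RuntimeError(
                f"deprovision blocked by existing bindings: {sorted(self.bindings)}"
            )
        self.provisioned = False


db = ServiceInstance("appdb")
db.bind("app-1")
db.bind("app-2")
db.unbind("app-1")       # app-2 is still using the database
try:
    db.deprovision()
except RuntimeError as err:
    print(err)
db.unbind("app-2")
db.deprovision()         # now allowed
```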

> I googled "The database principal owns a schema in the database, and cannot be dropped." It looks like this is for security reasons: a schema owner can't be deleted unless it transfers the ownership to another database principal (usually dbo).

I don't have much experience with SQL Server, but this isn't very surprising to me. I ran into what I assume was the same thing when implementing the PostgreSQL module...

In PostgreSQL, every object in a schema (tables, for instance) is owned by some "role" (which is really a principal that could be either a user or a group). In my first, naive attempt to create that module, I didn't really do anything special with ownership. Here's what happened: if a bound user created objects in the schema (by running a database migration, for instance), all those objects belonged to that role (user). At that point, we couldn't unbind (drop) that role (user) because it's not allowed to orphan all the objects it created.

I suspect the same is going on here.

> We can make OSBA list and transfer the schema ownerships in unbinding. Should it?

That seems complicated. Even if it worked, there would be another issue lurking around the corner (without an unbind ever coming into play):

  1. DB is provisioned
  2. App 1 binds
  3. App 1 runs some migrations; now owns all objects in schema
  4. App 2 binds
  5. App 2 can't read/write any of the objects even though they bound successfully

Here's how I solved for both of these problems in PostgreSQL. When provisioning a database, I created a role (group) with the same name as the database and assigned ownership of that new database to that group. Upon bind, the newly created user is assigned to the role (group) that owns the database and (very important) is altered so that the default role (group) assumed by the user when they connect to the database is the role (group) that owns the database. The result is that any objects that get created by that user will belong to the group; they will not belong directly to the user. At that point, unbinding a user is a simple matter of dropping them from the group and then dropping that user entirely.
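A rough Python sketch of the statements such a scheme might issue, assuming standard PostgreSQL role commands. The helper names and the exact SQL are illustrative assumptions and are not taken from the OSBA PostgreSQL module.

```python
# Sketch only: builds the SQL for the group-role approach described above.
# The statements are an assumption about one way to express it.

def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def provision_sql(db):
    group = quote_ident(db)  # the group role shares the database's name
    return [
        f"CREATE ROLE {group}",
        f"ALTER DATABASE {group} OWNER TO {group}",
    ]


def bind_sql(db, user, password):
    group, login = quote_ident(db), quote_ident(user)
    return [
        f"CREATE ROLE {login} WITH LOGIN PASSWORD {quote_literal(password)}",
        f"GRANT {group} TO {login}",
        # New sessions assume the group role, so created objects belong to it.
        f"ALTER ROLE {login} SET role TO {quote_literal(db)}",
    ]


def unbind_sql(db, user):
    group, login = quote_ident(db), quote_ident(user)
    return [f"REVOKE {group} FROM {login}", f"DROP ROLE {login}"]


for stmt in bind_sql("appdb", "user1", "s3cret"):
    print(stmt)
```

Because objects end up owned by the group rather than the login, the final DROP ROLE has nothing to orphan.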

Again-- I don't count myself as a SQL Server expert, but it seems something similar to the above is what should be happening.

@zhongyi-zhang I think you know SQL Server considerably better than I do. Do you have thoughts on whether it makes sense to make SQL Server behave more like what I am doing for PostgreSQL? If so, do you know how to achieve it?

krancour commented 6 years ago

> I'm attempting to reproduce this today using the chart example you gave above and not having much luck.

afaict, reproducing this will require creating things in the schema after binding. (Something our tests don't do.) Have you?

zhongyi-zhang commented 6 years ago

@krancour AFAIK, how you solved it in PostgreSQL can't be achieved in MSSQL. Azure MSSQL does have a group concept in its user management system, but it is for Azure/Windows Active Directory accounts.

To be honest, I think users have the responsibility to execute something like ALTER AUTHORIZATION ON schema::<schema_name> TO db_owner if they created a schema and want to leave the schema in the database then unbind.
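For illustration, a Python sketch of the T-SQL involved in handing schema ownership over before dropping a user, whether a user runs it by hand or a broker runs it during unbind. It assumes ownership is transferred to dbo (the usual choice mentioned earlier in the thread); the helper names are hypothetical and this is not OSBA code.

```python
# Sketch only: generates the T-SQL to transfer schema ownership and then
# drop the user. Helper names are hypothetical.

def bracket(name):
    """Quote a SQL Server identifier, doubling embedded closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


# Lists the schemas owned by one database user; "?" is a driver placeholder.
OWNED_SCHEMAS_QUERY = (
    "SELECT s.name FROM sys.schemas AS s "
    "JOIN sys.database_principals AS p ON s.principal_id = p.principal_id "
    "WHERE p.name = ?"
)


def unbind_statements(user, owned_schemas, new_owner="dbo"):
    """Return statements that reassign each owned schema, then drop the user."""
    statements = [
        f"ALTER AUTHORIZATION ON SCHEMA::{bracket(schema)} TO {bracket(new_owner)}"
        for schema in owned_schemas
    ]
    statements.append(f"DROP USER {bracket(user)}")
    return statements


# "app_schema" is a made-up schema name; the user name is from the logs above.
for stmt in unbind_statements("iinqk5yvv9", ["app_schema"]):
    print(stmt)
```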

krancour commented 6 years ago

> I think users have the responsibility to execute something like ALTER AUTHORIZATION ON schema::<schema_name> TO db_owner if they created a schema and want to leave the schema in the database then unbind.

If that means that an unbind cannot be accomplished before a bound user takes some manual action, that, frankly, creates a lot of undesirable friction in the provision/bind/unbind/deprovision lifecycle.

I believe @jeremyrickard has uncovered some options for how this can be done, but I'm not sure where he stands at the moment with proving out the concept.

References:

https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/ownership-and-user-schema-separation-in-sql-server

https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/authorization-and-permissions-in-sql-server

pabloromeo commented 5 years ago

I've recently come across this issue as well. In my case it was caused by IdentityServer creating its database (through EF core) in a Schema other than "dbo". This basically created a database that could not be deprovisioned. The broker can't delete the user because it owns the schema.

In my foolish frustration I thought that if I manually deleted the database and server from Azure directly (it was a temporary database anyways) the broker would be smart enough to notice that the database is no longer there, delete the Binding and the Instance in the catalog, and recover from the situation. I was wrong. Neither the Binding nor the Instance can be deleted now; the broker keeps retrying and leaves these errors in the log:

msg="unbinding error: error executing service-specific unbinding logic" bindingID=2ce9adad-4510-11e9-87cf-d2d452015806 error="error connecting to the database: lookup ......database.windows.net on 10.0.0.10:53: no such host" instanceID=..... status=UNBINDING_FAILED

zhongyi-zhang commented 5 years ago

@pabloromeo the broker always assumes that the Azure resources it created are under its management. For your case, I think it makes more sense to leverage https://github.com/kubernetes-incubator/service-catalog/issues/2268.

zhongyi-zhang commented 5 years ago

As it is not implemented in svcat, the workaround for now is to manually delete the binding records from the broker store (the redis). Then the instance deletion should work as you expect -- it can succeed if the resource no longer exists.

pabloromeo commented 5 years ago

That sounded promising, so I manually deleted the redis entry for key "binding:{externalId-guid}", and the binding did indeed disappear from the list of svcat get bindings, but deprovisioning of the instance still fails.

catalog-controller-manager logs show 409 errors returned by the broker:

I0314 13:04:15.828036       1 controller.go:396] Dropping ServiceInstance "pr-412/some-db" out of the queue: Deprovision call failed; received error response from broker: Status: 409; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>
I0314 13:04:15.828161       1 event.go:221] Event(v1.ObjectReference{Kind:"ServiceInstance", Namespace:"pr-412", Name:"some-db", UID:"2efb0ec7-4510-11e9-87cf-d2d452015806", APIVersion:"servicecatalog.k8s.io/v1beta1", ResourceVersion:"1484", FieldPath:""}): type: 'Warning' reason: 'DeprovisionCallFailed' Deprovision call failed; received error response from broker: Status: 409; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>

zhongyi-zhang commented 5 years ago

@pabloromeo TL;DR please also delete the instance entry and retry.

The 409 was expected. OSBA only accepts a deprovision request for instances in the states provisioned, provisioning_failed, and updated_failed. All other states, including deprovisioning_failed, are treated as in progress, because the async engine in OSBA keeps retrying the failed deprovisioning job by fetching it from the async task store. For your case, upgrading OSBA to get the fix for the Azure Go SDK might help. But I am not sure if the async task store of your OSBA deployment is broken too... So I just recommend forcibly deleting the redis entry -- that should always work.
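The state check described above can be sketched in Python like this. The three state names are the ones listed in this comment; the function name and the 202 returned for an accepted request are assumptions made for illustration, not OSBA's actual code.

```python
# Sketch only: which instance states accept a new deprovision request.
# 202 for "accepted" is an assumption; 409 is the status seen in the logs.

DEPROVISIONABLE_STATES = {"provisioned", "provisioning_failed", "updated_failed"}


def deprovision_status(instance_state):
    """Return the HTTP status for a deprovision request in the given state."""
    if instance_state in DEPROVISIONABLE_STATES:
        return 202  # request accepted, work continues asynchronously
    return 409      # treated as an operation already in progress


print(deprovision_status("provisioned"))
print(deprovision_status("deprovisioning_failed"))
```

Under this rule an instance stuck in deprovisioning_failed keeps answering 409 until its stored record is changed or removed, which is consistent with the workaround suggested above.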

kyschouv commented 5 years ago

I'm having the same issue deleting appinsights instances/bindings. I now have a bunch stuck in a namespace I can't get fully cleaned up.

zhongyi-zhang commented 5 years ago

It looks like svcat supports abandoning bindings/instances now: https://github.com/kubernetes-sigs/service-catalog/blob/master/docs/tasks/stuck_instance.md. Could you give it a try?