IBM / operand-deployment-lifecycle-manager

Managing the lifecycle for a group of operands
Apache License 2.0

OperandRequests with Finalizers block namespace deletion #649

Closed carrolp closed 3 years ago

carrolp commented 3 years ago

/kind bug

What steps did you take and what happened:

  1. Create a namespace (named ptest in my test)
  2. Create an OperandRequest in the namespace (for a licensing service in my test)
  3. Wait for the OperandRequest to result in the creation of a configmap and secret in the namespace (provides service url and token for the service requested)
  4. Delete the namespace
  5. The namespace is stuck in 'terminating' state because the OperandRequest cannot be deleted. The OperandRequest cannot be deleted because it still has the finalizer.request.ibm.com finalizer.

What did you expect to happen: The finalizer should be automatically removed from the OperandRequest, and the namespace should then be deleted cleanly.

Anything else you would like to add: The log from the operand-deployment-lifecycle-manager-[id] pod has many messages like this upon deleting the namespace:

I0408 22:45:21.631673       1 operandbindinfo_controller.go:88] Reconciling OperandBindInfo: ibm-common-services/ibm-licensing-bindinfo
E0408 22:45:22.048797       1 operandbindinfo_controller.go:200] failed to reconcile the OperandBindinfo ibm-common-services/ibm-licensing-bindinfo: the following errors occurred:
  - failed to create secret ptest/ibm-licensing-bindinfo-ibm-licensing-token: secrets is forbidden: User "system:serviceaccount:ibm-common-services:operand-deployment-lifecycle-manager" cannot create resource "secrets" in API group "" in the namespace "ptest"
  - failed to create secret ptest/dummyenv-metering-lssecret: secrets is forbidden: User "system:serviceaccount:ibm-common-services:operand-deployment-lifecycle-manager" cannot create resource "secrets" in API group "" in the namespace "ptest"
I0408 22:45:22.682684       1 request.go:621] Throttling request took 1.043243216s, request: GET:https://172.30.0.1:443/apis/config.openshift.io/v1?timeout=32s
I0408 22:45:23.060068       1 operandbindinfo_controller.go:88] Reconciling OperandBindInfo: ibm-common-services/ibm-licensing-bindinfo
E0408 22:45:23.385037       1 operandbindinfo_controller.go:200] failed to reconcile the OperandBindinfo ibm-common-services/ibm-licensing-bindinfo: the following errors occurred:
  - failed to create secret ptest/ibm-licensing-bindinfo-ibm-licensing-token: secrets is forbidden: User "system:serviceaccount:ibm-common-services:operand-deployment-lifecycle-manager" cannot create resource "secrets" in API group "" in the namespace "ptest"
  - failed to create secret ptest/dummyenv-metering-lssecret: secrets is forbidden: User "system:serviceaccount:ibm-common-services:operand-deployment-lifecycle-manager" cannot create resource "secrets" in API group "" in the namespace "ptest"
W0408 22:45:24.140279       1 operandrequest_controller.go:98] No permission to update OperandRequest

To my eyes, it is trying to reconcile the OperandRequest because the Secret was deleted during (attempted) namespace deletion. But it cannot recreate the Secret because the namespace is being deleted. I assume it needs some logic to check the namespace when reconciling. If namespace.metadata.deletionTimestamp is not nil, the namespace is being deleted. In that case, instead of trying to create/update the operands, the reconcile loop should just remove the finalizer from the OperandRequest.

Environment:

horis233 commented 3 years ago

@carrolp

This happens because ODLM doesn't have cluster-wide permissions. It leverages the namespacescope operator to project its namespace-scoped permissions into the target namespace. So when you delete the target namespace, the projected roles and rolebindings are deleted, which causes ODLM to lose its permissions in the terminating namespace, and it can't remove the finalizer from the OperandRequest.

carrolp commented 3 years ago

@horis233 I think that explains why it's having trouble. Thinking aloud... perhaps the "projected roles and rolebindings" should also have finalizers on them, so that they aren't deleted until ODLM has finished finalizing the OperandRequest?
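
The ordering idea can be sketched in Go as a guard that decides when the projected RBAC objects may be released. This is an illustration under assumptions, not how ODLM or the namespacescope operator behaves: the `Object` type is a simplified stand-in for a Kubernetes object, and `canReleaseRBAC` and `hasFinalizer` are hypothetical helpers.

```go
package main

// Object is a simplified stand-in for any namespaced Kubernetes object.
type Object struct {
	Kind       string
	Name       string
	Namespace  string
	Finalizers []string
}

// requestFinalizer is the OperandRequest finalizer named in the issue.
const requestFinalizer = "finalizer.request.ibm.com"

// hasFinalizer reports whether the object carries the given finalizer.
func hasFinalizer(o Object, name string) bool {
	for _, f := range o.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// canReleaseRBAC reports whether a (hypothetical) guard finalizer on the
// projected roles and rolebindings may be removed: only once no
// OperandRequest in the namespace still carries its own finalizer.
func canReleaseRBAC(namespace string, objects []Object) bool {
	for _, o := range objects {
		if o.Kind == "OperandRequest" && o.Namespace == namespace &&
			hasFinalizer(o, requestFinalizer) {
			return false // ODLM still needs permissions to finalize this request
		}
	}
	return true
}
```

With this ordering, the roles and rolebindings would outlive every OperandRequest in the terminating namespace, so the operator would keep the permissions it needs to clear its finalizers.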