GoogleCloudPlatform / k8s-config-connector

GCP Config Connector, a Kubernetes add-on for managing GCP resources
https://cloud.google.com/config-connector/docs/overview
Apache License 2.0

IAMServiceAccount + CloudIdentityMembership race condition #760

Open ppawiggers opened 1 year ago

ppawiggers commented 1 year ago

Bug Description

I use Config Connector to create a GCP Service Account (GSA) and add it as a member to an existing Google Group (see the YAML manifests below).

This works fine: the GSA is created and added to the existing Google Group. However, when I delete a namespace that contains both Config Connector resources, the CloudIdentityMembership sometimes reports this error:

Delete call failed: error fetching live state: error reading underlying
      resource: googleapi: Error 403: Error(2028): Permission denied for resource
      groups/mygroup/memberships/123 (or it may not exist).

I think this is a race condition: when the IAMServiceAccount is deleted, the service account is also removed from the group. If Config Connector then tries to delete the membership, the call fails because the service account is no longer a member of the group.

As far as I can tell, I can't reference the IAMServiceAccount object from the CloudIdentityMembership, which I can usually do between resources. Being able to do that would probably solve it.

Additional Diagnostic Information

None

Kubernetes Cluster Version

v1.23.13-gke.900

Config Connector Version

1.82.0

Config Connector Mode

cluster mode

Log Output

No response

Steps to reproduce the issue

  1. Create the IAMServiceAccount manifest
  2. Create a CloudIdentityMembership manifest to add the SA to a group
  3. Remove the IAMServiceAccount and wait until the GSA is deleted
  4. Remove the CloudIdentityMembership; this fails.

YAML snippets

apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMServiceAccount
metadata:
  name: sa
---
apiVersion: cloudidentity.cnrm.cloud.google.com/v1beta1
kind: CloudIdentityMembership
metadata:
  name: membership
spec:
  groupRef:
    external: groups/mygroup
  preferredMemberKey:
    id: sa@myproject.iam.gserviceaccount.com
  roles:
  - name: MEMBER

maqiuyujoyce commented 1 year ago

Hi @ppawiggers , thank you for reporting the issue.

Is the underlying Cloud Identity Membership resource (the GCP resource) gone after step 3 (in "Steps to reproduce the issue"), or does it still exist? If the underlying GCP resource is already gone, you can delete the corresponding CloudIdentityMembership resource in Config Connector by setting its deletion policy to abandon and then abandoning the resource. Those are steps 1-3 in this troubleshooting section.
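For illustration, here is a minimal sketch of what setting the deletion policy to abandon could look like on the manifest from the issue. It assumes the standard `cnrm.cloud.google.com/deletion-policy` annotation; the rest of the manifest is copied from the report above.

```yaml
apiVersion: cloudidentity.cnrm.cloud.google.com/v1beta1
kind: CloudIdentityMembership
metadata:
  name: membership
  annotations:
    # With "abandon", deleting this Kubernetes object should not trigger
    # a delete call against the underlying GCP membership.
    cnrm.cloud.google.com/deletion-policy: abandon
spec:
  groupRef:
    external: groups/mygroup
  preferredMemberKey:
    id: sa@myproject.iam.gserviceaccount.com
  roles:
  - name: MEMBER
```

Once the annotation is applied, deleting the CloudIdentityMembership object should only remove it from the cluster. That fits the case where the GCP-side membership is already gone.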

Right now, Config Connector cannot handle deletion ordering properly. One alternative is to use the depends-on annotation in kpt/ConfigSync/Anthos Config Management to work around the problem. Here is a tutorial for the annotation.
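As a rough sketch of the depends-on approach, the membership could declare a dependency on the service account so that tools honoring the annotation delete the membership first. This assumes the `config.kubernetes.io/depends-on` annotation with the `<group>/namespaces/<namespace>/<kind>/<name>` reference format; the namespace `my-namespace` is a hypothetical placeholder that does not appear in the original report.

```yaml
apiVersion: cloudidentity.cnrm.cloud.google.com/v1beta1
kind: CloudIdentityMembership
metadata:
  name: membership
  namespace: my-namespace  # hypothetical namespace, for illustration only
  annotations:
    # Declares that this object depends on the IAMServiceAccount "sa".
    # A dependent object is expected to be deleted before the object
    # it depends on.
    config.kubernetes.io/depends-on: iam.cnrm.cloud.google.com/namespaces/my-namespace/IAMServiceAccount/sa
spec:
  groupRef:
    external: groups/mygroup
  preferredMemberKey:
    id: sa@myproject.iam.gserviceaccount.com
  roles:
  - name: MEMBER
```

The annotation only takes effect when the resources are applied and pruned by a tool that understands it, such as kpt or Config Sync. Plain `kubectl` and Config Connector itself do not act on it.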

jb-metzger commented 1 year ago

Hi @maqiuyujoyce, following up on your comment and responding on behalf of @ppawiggers.

The underlying Cloud Identity Membership isn't gone, so abandoning the resource isn't an option. ConfigSync is also not an option, as we use fluxcd.

For now, as a workaround, our pipeline first deletes the CloudIdentityMembership, waits for reconciliation, and then deletes the IAMServiceAccount.

jb-metzger commented 1 year ago

@maqiuyujoyce Do you know if a long-term fix is already planned that would let one natively set "depends-on"-like values? I would assume that CloudIdentityMembership and IAMServiceAccount are not the only resources with this kind of dependency.