Azure / fleet

Multi-cluster core
MIT License

[BUG] Membership doesn't get removed properly when deleting member Cluster #881

Closed FunLow closed 3 months ago

FunLow commented 3 months ago

Describe the bug

We are currently evaluating Fleet Manager and want to use it to automatically deploy some mandatory agents and services as well as various configurations. To achieve this, we would like to join all clusters to the Fleet Manager via Policy (we are still investigating how this could work). Because we also have some short-lived development clusters, we need to be able to delete clusters that are still members of the Fleet Manager without removing the membership before deletion. We already tested this: the membership itself was removed and the change propagated to the hub cluster, but the namespace and the membership are stuck in the "Terminating" state inside the cluster.

I already found that the cause is that Fleet still wants to execute some work on the cluster that no longer exists (a works.placement.kubernetes-fleet.io resource is still waiting to be executed, I guess?).

Name:         fleet-member-development-feature-cluster
Labels:       kubernetes-fleet.io/is-fleet-resource=true
              kubernetes.io/metadata.name=fleet-member-development-feature-cluster
Annotations:  <none>
Status:       Terminating
Conditions:
  Type                                         Status  LastTransitionTime               Reason                Message
  ----                                         ------  ------------------               ------                -------
  NamespaceDeletionDiscoveryFailure            False   Sun, 14 Jul 2024 11:35:18 +0200  ResourcesDiscovered   All resources successfully discovered
  NamespaceDeletionGroupVersionParsingFailure  False   Fri, 12 Jul 2024 10:20:59 +0200  ParsedGroupVersions   All legacy kube types successfully parsed
  NamespaceDeletionContentFailure              False   Mon, 15 Jul 2024 05:56:10 +0200  ContentDeleted        All content successfully deleted, may be waiting on finalization
  NamespaceContentRemaining                    True    Fri, 12 Jul 2024 10:20:59 +0200  SomeResourcesRemain   Some resources are remaining: works.placement.kubernetes-fleet.io has 1 resource instances
  NamespaceFinalizersRemaining                 True    Fri, 12 Jul 2024 10:20:59 +0200  SomeFinalizersRemain  Some content in the namespace has finalizers remaining: kubernetes-fleet.io/work-cleanup in 1 resource instances
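
To make the two blocking conditions above easier to read at a glance, here is a minimal Python sketch that extracts the remaining resource types and finalizers from the condition messages. The message strings are copied from the output above; the function name `blockers` and the regular expressions are illustrative assumptions, not part of Fleet or kubectl.

```python
import re

# Condition messages copied verbatim from the namespace status above.
MESSAGES = [
    "Some resources are remaining: "
    "works.placement.kubernetes-fleet.io has 1 resource instances",
    "Some content in the namespace has finalizers remaining: "
    "kubernetes-fleet.io/work-cleanup in 1 resource instances",
]

# Assumed message shapes, based only on the two messages shown in this issue.
REMAINING = re.compile(r"(\S+) has (\d+) resource instances")
FINALIZER = re.compile(r"(\S+) in (\d+) resource instances")


def blockers(messages):
    """Return (resources, finalizers) dicts mapping name -> instance count."""
    resources, finalizers = {}, {}
    for msg in messages:
        # Only inspect the detail part after the first ": ".
        _, _, detail = msg.partition(": ")
        for name, count in REMAINING.findall(detail):
            resources[name] = int(count)
        for name, count in FINALIZER.findall(detail):
            finalizers[name] = int(count)
    return resources, finalizers


if __name__ == "__main__":
    resources, finalizers = blockers(MESSAGES)
    print("remaining resources:", resources)
    print("remaining finalizers:", finalizers)
```

For the output above, this reports one remaining `works.placement.kubernetes-fleet.io` instance and one `kubernetes-fleet.io/work-cleanup` finalizer, which matches the diagnosis that a single Work object is holding the namespace in "Terminating".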

This is also a problem when we recreate a feature cluster with the same name: it conflicts with the existing namespace and membership and fails for that reason.

I am also unable to resolve this myself, because I can't remove the finalizer manually: the admission webhook blocks the change.

To Reproduce

  1. Create a basic hub cluster.
  2. Create an AKS cluster without any specific configuration.
  3. Add the AKS cluster as a member of the hub cluster.
  4. Delete the AKS cluster.
  5. (Optional) Recreate the AKS cluster identically to the one created before.
  6. (Optional) Try to join another AKS cluster.

Expected behavior

The member cluster should be removed once it no longer exists, or AKS should handle the membership removal during deletion so that the Fleet Manager recognizes that the cluster has been deleted.

LeSlothMisanthropist commented 3 months ago

I noticed the same thing a few weeks ago. It would be nice if we could get an answer on this.

zhiying-lin commented 3 months ago

Hi @FunLow and @LeSlothMisanthropist, thank you for reporting the issue. It has already been identified and fixed in #865. Could you please try the latest version? I am going to close this; please feel free to reopen it if the issue is still there.