gardener / gardener-extension-provider-vsphere

Gardener extension controller for the vSphere cloud provider (https://www.vmware.com).
https://gardener.cloud

NSXT Failing to delete network segment #210

Open tuxgoose opened 2 years ago

tuxgoose commented 2 years ago

What happened: A shoot cluster could not be deleted because NSX-T reported that a VM was still connected to the node segment.

task "Waiting until shoot infrastructure has been destroyed" failed: Failed to delete Infrastructure shoot--fglstaging-- fgdvcsusdm1/fgdvcsusdm1: Error deleting infrastructure: deleting segment failed: InvalidRequest: Segment path= [/infra/segments/segment-a16d536d-8caf-4cc6-a8fa-6afc2b612309] has 1 VMs or VIFs attached. Disconnect all VMs and VIFs before deleting a segment. (code 503040)

What you expected to happen: The shoot is created, deployed, and deleted without errors.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know: We believe we have seen this occur when NSX-T considers a VM to still be attached to a node segment although vSphere reports none. Could this be an orphaned network interface?
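For anyone triaging similar failures, the error text quoted above carries three useful fields: the segment path, the number of attachments, and the NSX-T error code. The following is a minimal Go sketch that extracts them from the message. The type `SegmentInUse` and the function `ParseSegmentInUse` are hypothetical names chosen for illustration and are not part of the extension; the pattern is derived only from the single message in this report, so other NSX-T versions may word it differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// SegmentInUse holds the fields extracted from the NSX-T error text.
// Hypothetical type, introduced only for this sketch.
type SegmentInUse struct {
	Path     string // e.g. /infra/segments/segment-<uuid>
	Attached int    // number of VMs or VIFs NSX-T still counts
	Code     int    // NSX-T error code, 503040 in the report above
}

// Pattern based solely on the message quoted in this issue (assumption).
var segmentInUseRe = regexp.MustCompile(
	`Segment path=\s*\[([^\]]+)\] has (\d+) VMs or VIFs attached\..*\(code (\d+)\)`)

// ParseSegmentInUse reports whether msg is a "segment still in use" error
// and, if so, returns the parsed fields.
func ParseSegmentInUse(msg string) (SegmentInUse, bool) {
	m := segmentInUseRe.FindStringSubmatch(msg)
	if m == nil {
		return SegmentInUse{}, false
	}
	attached, err := strconv.Atoi(m[2])
	if err != nil {
		return SegmentInUse{}, false
	}
	code, err := strconv.Atoi(m[3])
	if err != nil {
		return SegmentInUse{}, false
	}
	return SegmentInUse{Path: m[1], Attached: attached, Code: code}, true
}

func main() {
	msg := "deleting segment failed: InvalidRequest: Segment path= " +
		"[/infra/segments/segment-a16d536d-8caf-4cc6-a8fa-6afc2b612309] " +
		"has 1 VMs or VIFs attached. Disconnect all VMs and VIFs before " +
		"deleting a segment. (code 503040)"
	if info, ok := ParseSegmentInUse(msg); ok {
		fmt.Printf("segment %s still has %d attachment(s), code %d\n",
			info.Path, info.Attached, info.Code)
	}
}
```

Matching on the message text is brittle; if the client library exposes the error code as a structured field, checking that field would be preferable.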

tuxgoose commented 2 years ago

VM 576.1.0 was moved to another network segment and the shoot successfully reconciled.

MartinWeindel commented 2 years ago

I have seen this issue sporadically. To me it looks like a synchronization problem between vSphere and NSX-T. It was not consistently reproducible and probably needs more investigation together with VMware. Alternatively, it may be resolved with the help of a remedy controller. Note that resolving it requires access to the old "advanced" NSX-T API to delete the VIFs.

It may also happen if some rules are added on the vSphere side by some administrative logic.