cloudfoundry / bosh-vsphere-cpi-release

BOSH vSphere CPI
Apache License 2.0

CPI leaves orphan VMs after delete encounters "Unable to write VMX file" error #277

Closed pivotal-todd-robbins closed 3 years ago

pivotal-todd-robbins commented 4 years ago

Describe the bug

Orphan VMs are intermittently left behind by the CPI. This leads to duplicate IP addresses, which cause production impact.

BOSH task CPI logs show:

Error running task 'VirtualMachine.destroy'. Failed with message 'Unable to write VMX file
INFO -- [req_id cpi-258561]: Deleted vm: vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce

However, the VM vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce remains powered on in vCenter.

To Reproduce

We cannot reproduce this consistently; however, it has happened a dozen times in one customer environment.

CPI Error Log

I, [2020-08-25T14:42:44.386120 #13220]  INFO -- [req_id cpi-258561]: Deleting vm: vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce
I, [2020-08-25T14:42:46.362700 #13220]  INFO -- [req_id cpi-258561]: VM 'vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce' is already powered off, skipping power off task.
I, [2020-08-25T14:42:46.418485 #13220]  INFO -- [req_id cpi-258561]: Cleaning current agent env from vm 'vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce'
I, [2020-08-25T14:42:46.442511 #13220]  INFO -- [req_id cpi-258561]: NSX-T networks found for vm 'vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce': []
W, [2020-08-25T14:42:47.528382 #13220]  WARN -- [req_id cpi-258561]: Error running task 'VirtualMachine.destroy'. Failed with message 'Unable to write VMX file: /vmfs/volumes/vsan:52bc9fdbb8f32e1d-02f9ad05d2c6a297/3b28445f-626f-4e56-5dd1-e4434be65420/vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce.vmx.' and fault message 'Unable to write VMX file: /vmfs/volumes/vsan:52bc9fdbb8f32e1d-02f9ad05d2c6a297/3b28445f-626f-4e56-5dd1-e4434be65420/vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce.vmx.,Could not find the file'.
W, [2020-08-25T14:42:51.606987 #13220]  WARN -- [req_id cpi-258561]: Error running task 'VirtualMachine.destroy'. Failed with message 'The object 'vim.VirtualMachine:vm-3' has already been deleted or has not been completely created' and fault message ''.
W, [2020-08-25T14:42:55.648873 #13220]  WARN -- [req_id cpi-258561]: Error running task 'VirtualMachine.destroy'. Failed with message 'The object 'vim.VirtualMachine:vm-3' has already been deleted or has not been completely created' and fault message ''.
I, [2020-08-25T14:43:03.688389 #13220]  INFO -- [req_id cpi-258561]: Deleted vm: vm-823f41d1-71dc-4f5b-bdd0-6d634c7ef6ce
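The pattern in the log above (one or more failed 'VirtualMachine.destroy' tasks followed by a "Deleted vm" line for the same request) can be searched for in other CPI logs. The following is a minimal sketch, assuming the log lines follow the exact format shown above; the function name `find_suspect_deletes` is hypothetical and not part of the CPI.

```python
import re

# Matches log lines in the format shown in the CPI log above, e.g.
# "W, [2020-08-25T14:42:47.528382 #13220]  WARN -- [req_id cpi-258561]: ..."
LINE_RE = re.compile(
    r"^(?P<sev>[A-Z]), \[(?P<ts>[^ ]+) #\d+\]\s+(?P<level>[A-Z]+) -- "
    r"\[req_id (?P<req>[^\]]+)\]: (?P<msg>.*)$"
)


def find_suspect_deletes(lines):
    """Return {req_id: [failure messages]} for requests that logged
    'Deleted vm' after at least one failed VirtualMachine.destroy task.

    Hypothetical helper for log triage; it only inspects log text and
    does not talk to vCenter.
    """
    failures = {}
    suspects = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the expected format
        req, msg = m["req"], m["msg"]
        if m["level"] == "WARN" and "Error running task 'VirtualMachine.destroy'" in msg:
            failures.setdefault(req, []).append(msg)
        elif msg.startswith("Deleted vm:") and req in failures:
            suspects[req] = failures[req]
    return suspects
```

Running this over a CPI log file (for example `find_suspect_deletes(open(path))`, where `path` points to a log in the same format) returns the request IDs worth checking against vCenter for leftover VMs.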

Expected behavior

CPI should return an error if deletion fails.
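To illustrate the expected behavior, here is a minimal sketch of a delete that verifies the outcome before reporting success. It is not the CPI's actual code (the CPI is a separate codebase): `client`, its `destroy` and `exists` methods, `VmDeletionError`, and `delete_vm_verified` are all hypothetical names chosen for this example.

```python
class VmDeletionError(Exception):
    """Raised when the VM is still present after destroy was attempted."""


def delete_vm_verified(client, vm_cid, retries=3):
    """Attempt to destroy a VM, then confirm it is actually gone.

    `client` is a hypothetical wrapper exposing destroy(vm_cid) and
    exists(vm_cid); neither is a real CPI or vSphere API call.
    """
    last_error = None
    for _ in range(retries):
        try:
            client.destroy(vm_cid)
            break  # destroy reported success, stop retrying
        except Exception as exc:  # keep the last failure for the error message
            last_error = exc

    # Do not trust the task result alone: an "already deleted" error on a
    # retry does not prove that the VM was removed.
    if client.exists(vm_cid):
        raise VmDeletionError(
            f"VM {vm_cid} still present after destroy; last error: {last_error}"
        )
```

The design choice here is that success is decided by the final existence check rather than by how the destroy task failed, so an error on a retry cannot be mistaken for a completed deletion.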

Screenshots

Screen Shot 2020-09-17 at 1 57 16 PM

Release Version & Related Info (please complete the following information):

CPI Version: 53.0.11
BOSH Director version: 270.11.1
Stemcell Name & Version: 621.76
vCenter Version:
NSX(T/V) Version (If using): NSX-T

Additional context

pivotal-todd-robbins commented 4 years ago

epr_261255_cpi.log epr_261255_debug.log

Attached is the full bosh task CPI / debug logging from the event.

julian-hj commented 3 years ago

@pivotal-todd-robbins is this issue still reproducible? If so, a couple of questions...

vmware-todd-robbins commented 3 years ago

I've not heard of this happening since the first report, so I cannot currently reproduce it. I checked the thread from when this originally occurred, and there were some rumblings that it could be fixed with vCenter 7.0. So if we want to just close out this issue, that's fine.

julian-hj commented 3 years ago

Very good, thanks, and apologies for the delayed response!