cloudfoundry-attic / bosh-vcloud-cpi-release

BOSH vCloud CPI
Apache License 2.0

CPI leaks partially powered vapps: delete_vm should first try to power off vapps #7

Open gberche-orange opened 8 years ago

gberche-orange commented 8 years ago

When a bosh deployment fails because a vm could not be instantiated, bosh asks the cpi to clean up the failed vms by invoking its delete_vm method.

In the following scenario, the vcloud cpi fails to delete the vm and leaks a vapp:

1- a vm is requested for creation, but its IP conflicts with an existing running vm in another vapp. The vapp and vm creation succeed, but the start fails. The vapp remains in the partially powered state.

                    <Error minorErrorCode="BAD_REQUEST" message="The following IP/MAC addresses have already been used by running virtual machines:
MAC addresses:
IP addresses: 192.168.26.215
Use the Fence vApp option to use same MAC/IP. Fencing allows identical virtual machines in different vApps to be powered on without conflict, by isolating the MAC and IP addresses of the virtual machines." majorErrorCode="400"/>

2- the CPI asks vcloud to delete the vapp, but vcloud refuses the delete request because the vapp is still in the partially powered state.

D, [2015-07-15 15:08:10 #9434] [create_vm(9cbce686-08ac-498a-97de-79d99a4a97ce, urn:vcloud:catalogitem:740dbf1b-f5bb-41ee-8272-71bcc90ffc24, {"cpu"=>1, "disk"=>4096, "ram"=>1024}, ...)] DEBUG -- DirectorJobRunner: REST REQ DELETE https://.../api/vApp/vapp-e1d96da6-b607-4634-be46-9996067f1bfa {:Accept=>"application/*+xml;version=5.1", :content_type=>"*/*", :x_vcloud_authorization=>"...="} {"vcloud-token"=>"...", "Path"=>"%2F"}
D, [2015-07-15 15:08:11 #9434] [create_vm(9cbce686-08ac-498a-97de-79d99a4a97ce, urn:vcloud:catalogitem:740dbf1b-f5bb-41ee-8272-71bcc90ffc24, {"cpu"=>1, "disk"=>4096, "ram"=>1024}, ...)] DEBUG -- DirectorJobRunner: REST RES 400 {:date=>"Wed, 15 Jul 2015 15:08:10 GMT", :vary=>"Accept-Encoding", :content_type=>"application/vnd.vmware.vcloud.error+xml;version=5.1", :content_length=>"464"} 
<?xml version="1.0" encoding="UTF-8"?>
<Error xmlns="http://www.vmware.com/vcloud/v1.5" minorErrorCode="BAD_REQUEST" message="The requested operation could not be executed on vApp &quot;9cbce686-08ac-498a-97de-79d99a4a97ce&quot;. Stop the vApp and try again." majorErrorCode="400" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/v1.5 http://[...]/api/v1.5/schema/master.xsd"></Error>

Currently, in 0.7.10, delete_vm translates to a straight call to the vcloud api DELETE vapp. It should check the entity state and power off the vms first, before sending the delete request.

https://github.com/cloudfoundry/bosh_vcloud_cpi/blob/47b2b0aae14436c511c1935bfc33fd85f620c1a5/lib/cloud/vcloud/steps/delete.rb#L8
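To illustrate the ordering I am suggesting, here is a minimal sketch. It does not use the CPI's real classes: `FakeVcloudClient`, `power_state`, `undeploy`, `delete` and `delete_vapp` are all hypothetical names that stand in for whatever the delete step actually calls.

```ruby
# Hypothetical sketch: none of these names come from the CPI code base.
class FakeVcloudClient
  attr_reader :calls

  def initialize(state)
    @state = state
    @calls = []
  end

  # Returns a symbol such as :powered_on, :partially_powered or :powered_off.
  def power_state(_vapp_id)
    @state
  end

  # Stands in for the undeploy action with a power-off.
  def undeploy(vapp_id)
    @calls << [:undeploy, vapp_id]
    @state = :powered_off
  end

  # Stands in for DELETE on the vapp. It refuses unless the vapp is
  # powered off, mirroring the 400 response shown in the log above.
  def delete(vapp_id)
    raise "Stop the vApp and try again." unless @state == :powered_off
    @calls << [:delete, vapp_id]
  end
end

# Suggested ordering: power off first when needed, then delete.
def delete_vapp(client, vapp_id)
  client.undeploy(vapp_id) unless client.power_state(vapp_id) == :powered_off
  client.delete(vapp_id)
end
```

With a fake client created in the `:partially_powered` state, `delete_vapp` records an undeploy followed by a delete, instead of failing on the delete.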

See the vcloud REST API manual at http://pubs.vmware.com/vcd-55/topic/com.vmware.ICbase/PDF/vcd_55_api_guide.pdf (p. 38):

Undeploy, Power Off, and Delete the vApp

After you undeploy a vApp and power it off, you can use an HTTP DELETE request to delete the vApp object.

A deployed vApp has a link that you can use with a POST request to undeploy the vApp and take a power action such as powering it off or suspending it. 
A powered-off vApp has a link that you can use with a DELETE request to remove the vApp.
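As far as I understand the manual, the undeploy step is a POST to the vapp's `action/undeploy` link with an `UndeployVAppParams` body. Below is a rough sketch that only builds such a request with Ruby's standard `net/http` and does not send it. The helper name `build_undeploy_request`, the host and the token are placeholders of mine, and the headers simply reuse those seen in the log above.

```ruby
require "net/http"
require "uri"

# Body asking vcloud to power the vapp off as part of the undeploy.
UNDEPLOY_BODY = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <UndeployVAppParams xmlns="http://www.vmware.com/vcloud/v1.5">
    <UndeployPowerAction>powerOff</UndeployPowerAction>
  </UndeployVAppParams>
XML

# Hypothetical helper: builds (but does not send) the undeploy request.
# vapp_href is the vapp URL, auth_token the x-vcloud-authorization value.
def build_undeploy_request(vapp_href, auth_token)
  uri = URI("#{vapp_href}/action/undeploy")
  request = Net::HTTP::Post.new(uri)
  request["Accept"] = "application/*+xml;version=5.1"
  request["x-vcloud-authorization"] = auth_token
  request["Content-Type"] = "application/vnd.vmware.vcloud.undeployVAppParams+xml"
  request.body = UNDEPLOY_BODY
  request
end

# Placeholder host and token, for illustration only.
req = build_undeploy_request("https://vcloud.example.com/api/vApp/vapp-1", "token")
puts "#{req.method} #{req.path}"
```

The undeploy call is asynchronous on the vcloud side, so a real fix would presumably have to wait for the returned task to complete before sending the DELETE.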

NB: this duplicates https://github.com/cloudfoundry-attic/bosh_vcloud_cpi/issues/15, as that repo was moved to cloudfoundry-attic and does not seem to get attention anymore.

fbonelle commented 8 years ago

Hi,

We have updated to version 18 and the issue is still there. My Ruby knowledge is very limited, so I don't know how to fix this issue and can't submit a pull request.

Are there any plans to fix this issue?

Regards