CloudBindle / youxia

A VM provisioner and fleet management tool.
GNU General Public License v3.0

Clean up OpenStack VMs that fail to start properly #54

Open SolomonShorser-OICR opened 8 years ago

SolomonShorser-OICR commented 8 years ago

Sometimes OpenStack VMs start and immediately enter an "ERROR" state. I've seen this happen when the resources for the VM flavour are available to the OpenStack tenant, so the VM is created, but those resources are not all available on a single physical compute node, so OpenStack fails immediately and leaves the new VM in an ERROR state. For example, the tenant might have plenty of free memory, but spread in small portions across many nodes. If the flavour requires more memory than is free on any single node, but less than the total free memory across all nodes, this can happen.
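
To make the fragmentation case concrete, here is a small illustrative sketch. The node sizes, the flavour size, and the `can_schedule` helper are all made up for the example, and memory is the only resource considered. The real OpenStack scheduler weighs many more factors, so treat this as a simplified model rather than how placement is actually decided.

```python
def can_schedule(flavour_mb, free_mb_per_node):
    """Return True if at least one single node has enough free memory."""
    return any(free >= flavour_mb for free in free_mb_per_node)


# Hypothetical free memory (in MB) on four compute nodes.
free_mb_per_node = [6144, 4096, 6144, 8192]

# Hypothetical flavour that needs 16 GB of memory.
flavour_mb = 16384

# Tenant-wide check: the sum of free memory exceeds the flavour size.
tenant_has_capacity = sum(free_mb_per_node) >= flavour_mb

# Per-node check: no single node can actually host the VM.
fits_on_one_node = can_schedule(flavour_mb, free_mb_per_node)

print(tenant_has_capacity, fits_on_one_node)
```

When the tenant-wide check passes but the per-node check fails, the request is accepted and the VM then ends up in ERROR.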

The problem is that the failed VM still holds some of the resources it requested, so they remain allocated to it. Youxia is not able to remove these VMs properly, so they accumulate, doing no work while still consuming resources. In theory, Youxia could keep launching VMs that cannot run until all of the tenant's resources are locked up by idle/ERROR VMs.

Youxia needs a way to detect VMs that are in this state and remove them before attempting to create more VMs.
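
A minimal sketch of what such a check could look like, written in Python for brevity. It is not Youxia's code: `Server`, `ComputeClient`, `FakeClient`, and `reap_error_servers` are hypothetical names invented for illustration, and the client interface is an assumption standing in for whatever OpenStack binding is actually used. The only thing taken from the issue is the idea itself: list the VMs, pick out those whose status is ERROR, and delete them before launching anything new.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass
class Server:
    """Hypothetical minimal view of a VM: an identifier and a status string."""
    id: str
    status: str


class ComputeClient(Protocol):
    """Assumed interface; a real client would wrap an OpenStack binding."""
    def list_servers(self) -> Iterable[Server]: ...
    def delete_server(self, server_id: str) -> None: ...


def reap_error_servers(client: ComputeClient) -> List[str]:
    """Delete every VM whose status is ERROR and return the deleted ids."""
    # Collect ids first so the listing is not modified while iterating.
    doomed = [s.id for s in client.list_servers() if s.status.upper() == "ERROR"]
    for server_id in doomed:
        client.delete_server(server_id)
    return doomed


class FakeClient:
    """In-memory stand-in used only to exercise the sketch."""
    def __init__(self, servers: Iterable[Server]) -> None:
        self._servers = {s.id: s for s in servers}

    def list_servers(self) -> List[Server]:
        return list(self._servers.values())

    def delete_server(self, server_id: str) -> None:
        del self._servers[server_id]


if __name__ == "__main__":
    client = FakeClient([Server("a", "ACTIVE"), Server("b", "ERROR")])
    removed = reap_error_servers(client)
    print("removed:", removed)
```

Running this step before each provisioning round would keep ERROR VMs from accumulating. A real implementation would probably also need to handle failed deletions and restrict the sweep to VMs that Youxia itself launched.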

denis-yuen commented 8 years ago

Looks like