In most cases that should not be called. That restart occurs as a last-ditch effort to ensure that the designated IP is correctly set on the VM. If at that point the IP in VMware still does not reflect the static IP assigned to the machine, it will reboot in an attempt to ensure the IP is correctly set.
When I see this happen, it usually means there is some other network- or domain-related issue happening with the box, but I can't recall the specifics.
I assume you are trying to give your VMs a static IP? Do you see that IP registered with the VM in your vSphere client with that reboot removed?
I actually see this issue primarily with machines that use DHCP. The machine gets an address from DHCP, vmware-tools in the guest communicates it back to vSphere, this driver sees the address, then reboots the machine anyway. Relevant snippet from a run:
- IP addresses found: []
- IP addresses found: []
- IP addresses found: ["10.102.101.198", "fe80::250:56ff:fe91:215"]
- rebooting...
Looking at the code, it appears that wait_for_ip succeeds, but the subsequent call to has_ip fails, resulting in a reboot. Any suggestions on how to debug this further?
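For reference, here's a minimal sketch of what I believe is happening, pieced together from the method names above. Treat it as a reconstruction, not the actual source in driver.rb:

```ruby
# Hypothetical reconstruction of the IP-wait flow; the real methods
# in driver.rb may differ in detail. `vm` is an rbvmomi VirtualMachine.

# wait_for_ip polls until vmware-tools reports *any* address back
# through vSphere. With DHCP, that will be the leased address.
def wait_for_ip(vm, timeout: 300)
  deadline = Time.now + timeout
  while Time.now < deadline
    ips = Array(vm.guest.net).flat_map { |nic| Array(nic.ipAddress) }
    puts "- IP addresses found: #{ips.inspect}"
    return ips unless ips.empty?
    sleep 5
  end
  []
end

# ...but has_ip checks for one *specific* designated address. A DHCP
# machine has no static IP to match, so this returns false and the
# driver falls through to "rebooting...".
def has_ip?(vm, designated_ip)
  ips = Array(vm.guest.net).flat_map { |nic| Array(nic.ipAddress) }
  ips.include?(designated_ip)
end
```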
Ahh, OK. This makes sense, but I don't think removing the reboot is the right fix for this. The current IP-waiting logic is rather flawed for DHCP. It may be best to have attempt_ip just return if using DHCP. However, that might cause the wrong IP to be recorded with the chef machine_spec: the original IP of the VM template could end up in the machine spec, which would be bad.
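Something along these lines is what I have in mind. It's a rough sketch only: use_dhcp?, restart_vm, and the machine_spec key are illustrative names, not the driver's actual API:

```ruby
# Rough sketch of the early-return idea; names here are illustrative.
def attempt_ip(machine_spec, vm)
  if use_dhcp?(machine_spec)
    # No static IP will ever show up, so don't wait for one and don't
    # reboot. Do wait for the DHCP lease and record *that* address,
    # so the template's original IP never lands in the machine_spec.
    leased_ip = wait_for_ip(vm).first
    machine_spec.location['ipaddress'] = leased_ip if leased_ip
    return
  end

  # Static-IP path: keep the existing verify-then-reboot behavior.
  designated_ip = machine_spec.location['ipaddress']
  return if has_ip?(vm, designated_ip)
  restart_vm(vm)
end
```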
Fixing this would make things faster not only by eliminating the restart but also by eliminating the long wait (minutes) for a static IP that the machine will never get.
I just left CenturyLink and no longer have access to vSphere infrastructure to test or futz with this. That may change soon, and I will be happy to help if I can.
I'm actively working on a vSphere based project right now so if I can find the time to diagnose this a bit more, I'll see about a patch to make it a bit more robust.
I'd really be interested to see if #41 fixes this. I don't have access to an environment where I can test it, but if anyone wants to give it a try, 0.8.3.dev has that PR. Please report back results if you do try it.
Each new machine I bring up is rebooted before it's converged, which significantly increases my runtime.
I've commented out this section of code as a test, and provisioning runs fine without a reboot:
https://github.com/CenturyLinkCloud/chef-provisioning-vsphere/blob/19272e9225984a8e3abc8a0a51c978044468430a/lib/chef/provisioning/vsphere_driver/driver.rb#L320
Before I submit a PR to change this behavior, can someone educate me on why this driver reboots each machine after it comes up for the first time? Is it working around some issue in vSphere that I haven't run into?
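Rather than deleting the reboot outright, one option I'd consider is gating it behind a bootstrap option so it stays available for the static-IP case. A sketch only, with a made-up option name:

```ruby
# Sketch: gate the reboot behind a hypothetical bootstrap option
# (:skip_ip_check_reboot is made up) instead of removing it entirely.
unless bootstrap_options[:skip_ip_check_reboot]
  unless has_ip?(vm, designated_ip)
    action_handler.report_progress 'rebooting...'
    restart_vm(vm)
  end
end
```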