cloudfoundry / bosh-vsphere-cpi-release

BOSH vSphere CPI
Apache License 2.0

Bug fix: Do not prematurely timeout when placing VMs in NSX server pools #375

Closed cunnie closed 9 months ago

cunnie commented 9 months ago

When the CPI creates a VM that is placed in an NSX server pool using the NSX Policy API (e.g. a TAS Router VM), a heavily loaded vSphere environment may exceed the timeout for discovering the VM's IP address. The CPI then returns a "Did not find primary IP" error and the deploy aborts.

This commit increases the timeout from 100 to 300 seconds. The longest wait we saw in the wild was 118 seconds, so we doubled that (236 seconds) and added padding.
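To illustrate why the larger timeout matters, here is a minimal sketch of a poll-until-deadline loop. It is written in Python for illustration only and is not the CPI's actual code. The names `wait_for_primary_ip`, `PrimaryIPNotFound`, and `get_ip`, and the 1-second poll interval, are assumptions; only the 100 and 300 second timeouts and the 118-second observation come from this PR.

```python
import time


class PrimaryIPNotFound(Exception):
    """Raised when no IP address appears before the deadline (hypothetical name)."""


def wait_for_primary_ip(get_ip, timeout=300.0, interval=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll get_ip() until it returns a truthy address or `timeout` seconds pass.

    get_ip   -- hypothetical callable that returns the VM's IP, or None if unknown
    timeout  -- seconds to wait in total (300 mirrors the new value in this PR)
    interval -- seconds between polls (assumed; not taken from the CPI)
    clock, sleep -- injectable so the loop can be exercised without real waiting
    """
    deadline = clock() + timeout
    while True:
        ip = get_ip()
        if ip:
            return ip
        if clock() >= deadline:
            raise PrimaryIPNotFound(
                f"Did not find primary IP within {timeout} seconds"
            )
        sleep(interval)
```

In this sketch, a VM whose address only appears after 118 seconds raises `PrimaryIPNotFound` with `timeout=100` and returns its address with `timeout=300`.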

Note: We don't need to worry about an overarching BOSH Director timeout. According to Joseph Palermo, the BOSH Director has infinite patience during create_vm and relies on the CPI to manage timeouts.

Fixes the following error, seen during bosh deploy:

Task 11148 | 07:49:11 | Creating missing vms: router/xxx (9) (00:02:41)
                 L Error: Unknown CPI error 'Unknown' with message 'Did not find primary IP for VM (VSphereCloud::Resources::VM (cid="vm-xxx"))' in 'create_vm' CPI method (CPI request ID: 'cpi-897383')

Special thanks to Suman Chakraborty for reporting the bug and diagnosing the cause.