Stemcell: ubuntu-xenial/621.261
Bosh: 2.10.46-build.541
TAS 2.13.8
Issue description:
While scaling the Diego cell count in our TAS environment with network/org churn running in parallel, the deployment fails with the error below. The same failure can also be observed during an NSX-T tile upgrade from Ops Manager while network/org churn is running in parallel.
Task 96 | 09:05:47 | Creating missing vms: diego_cell/a8a9036f-bc39-4412-b73a-0b920b3050de (14) (00:53:37)
Task 96 | 09:05:56 | Creating missing vms: diego_cell/cf1f94cd-790b-407b-bedb-eea685604974 (15) (00:53:46)
L Error: Unknown CPI error 'Unknown' with message 'The object 'vim.dvs.DistributedVirtualPortgroup:dvportgroup-22218' has already been deleted or has not been completely created' in 'set_vm_metadata' CPI method (CPI request ID: 'cpi-432182')
Task 96 | 09:05:56 | Creating missing vms: diego_cell/57211cee-682c-4023-b84e-77331e12ac5c (17) (00:53:46)
L Error: Unknown CPI error 'Unknown' with message 'The object 'vim.dvs.DistributedVirtualPortgroup:dvportgroup-22218' has already been deleted or has not been completely created' in 'set_vm_metadata' CPI method (CPI request ID: 'cpi-668394')
Task 96 | 11:25:18 | Creating missing vms: diego_cell/06d316fe-1269-4b41-bd2c-d48f180ed3fe (18) (03:13:08)
Task 96 | 11:28:38 | Creating missing vms: diego_cell/205ec15c-0873-4079-8d06-68df49bd8c00 (13) (03:16:28)
Task 96 | 11:30:39 | Creating missing vms: diego_cell/3e9a5f94-21be-470a-a8b8-de54b35486f8 (10) (03:18:29)
Task 96 | 11:31:57 | Creating missing vms: diego_cell/37d64fad-c101-4fc3-a2d1-bb378e7e85d4 (19) (03:19:47)
Task 96 | 11:31:57 | Error: Unknown CPI error 'Unknown' with message 'The object 'vim.dvs.DistributedVirtualPortgroup:dvportgroup-22218' has already been deleted or has not been completely created' in 'set_vm_metadata' CPI method (CPI request ID: 'cpi-432182')
Task 96 Started Thu Sep 22 08:09:19 UTC 2022
Task 96 Finished Thu Sep 22 11:31:57 UTC 2022
Task 96 Duration 03:22:38
Task 96 error
Updating deployment:
Expected task '96' to succeed but state is 'error'
Exit code 1
===== 2022-09-22 11:31:57 UTC Finished "/usr/local/bin/bosh --no-color --non-interactive --tty --environment=192.168.2.21 --deployment=cf-e46963be09f30ce93dca deploy --no-redact /var/tempest/workspaces/default/deployments/cf-e46963be09f30ce93dca.yml"; Duration: 12201s; Exit Status: 1
Exited with 1.
Exited with 1.
We have more than 600 orgs/logical segments (LS) created in vCenter, and deleting those logical segments during the deployment causes the issue above.
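The failing lines in the task output above can be tallied with a small script. This is only a minimal sketch for triage: the function name `summarize_cpi_errors`, the regular expression, and the `SAMPLE` constant are illustrative assumptions and not part of BOSH or the vSphere CPI. The sample text is copied from the task output in this report.

```python
import re
from collections import Counter

# Assumption: the pattern mirrors the CPI error lines exactly as they appear
# in the task output of this report; other CPI errors will not match.
CPI_ERROR = re.compile(
    r"Unknown CPI error '[^']*' with message "
    r"'The object '(?P<obj>[^']+)' has already been deleted "
    r"or has not been completely created' "
    r"in '(?P<method>[^']+)' CPI method "
    r"\(CPI request ID: '(?P<request_id>[^']+)'\)"
)

# Three error lines taken from Task 96 above.
SAMPLE = """\
L Error: Unknown CPI error 'Unknown' with message 'The object 'vim.dvs.DistributedVirtualPortgroup:dvportgroup-22218' has already been deleted or has not been completely created' in 'set_vm_metadata' CPI method (CPI request ID: 'cpi-432182')
Task 96 | 09:05:56 | Creating missing vms: diego_cell/57211cee-682c-4023-b84e-77331e12ac5c (17) (00:53:46)
L Error: Unknown CPI error 'Unknown' with message 'The object 'vim.dvs.DistributedVirtualPortgroup:dvportgroup-22218' has already been deleted or has not been completely created' in 'set_vm_metadata' CPI method (CPI request ID: 'cpi-668394')
Task 96 | 11:31:57 | Error: Unknown CPI error 'Unknown' with message 'The object 'vim.dvs.DistributedVirtualPortgroup:dvportgroup-22218' has already been deleted or has not been completely created' in 'set_vm_metadata' CPI method (CPI request ID: 'cpi-432182')
"""


def summarize_cpi_errors(task_output: str) -> Counter:
    """Count matching CPI errors by (CPI method, managed object reference)."""
    counts = Counter()
    for line in task_output.splitlines():
        match = CPI_ERROR.search(line)
        if match:
            counts[(match.group("method"), match.group("obj"))] += 1
    return counts


if __name__ == "__main__":
    for (method, obj), count in summarize_cpi_errors(SAMPLE).items():
        print(f"{count}x {method}: {obj}")
```

Run against the full task output, this makes it easy to see whether every failure points at the same deleted port group and the same CPI method.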
This issue looks related to https://github.com/cloudfoundry/bosh-vsphere-cpi-release/pull/332, whose fix was supposed to be included in the CPI shipped with Bosh 2.10.46-build.541.
To Reproduce:
Run org/network churn while a bosh VM update/creation is in progress.
CPI Error Log:
Attached CPI error logs: task_96_cpi.txt
Attached bosh director logs: bosh_logs.tgz
Attached debug logs: task_96_debug.txt
Expected behavior:
Any deployment (TAS upgrade, NCP upgrade, or Diego cell scaling) should succeed while org/network churn is running.
Screenshots:
Attached screenshots: OPSMAN.png, ERROR_LOG.png
Additional context:
Because of the issue above, the deployment ran much longer than usual (03:22:38) and eventually failed with the error shown.