cloudfoundry / bosh-azure-cpi-release

BOSH Azure CPI
Apache License 2.0

Error copy stemcell from standard to premium storage: cannot find stemcell in table #213

Closed dsboulder closed 8 years ago

dsboulder commented 8 years ago

I recently tried to deploy with premium storage and got the first error shown below. I'm 100% sure that all the tables and containers had been pre-created in the storage accounts. When I retried immediately afterwards, it worked, so this is probably a race condition.

  Started creating missing vms
  Started creating missing vms > consul_server/0 (ae2a5984-cfa1-453e-921f-37c2c61ab3c2)
  Started creating missing vms > nats/0 (6025181e-fd62-48c9-a470-bcfccebe19bf)
  Started creating missing vms > etcd_server/0 (df111ba7-6163-4652-8f11-f721c6ea0bc0)
  Started creating missing vms > nfs_server/0 (6333e37a-da76-48c1-a849-8df87fc93886)
  Started creating missing vms > mysql_proxy/0 (76feab28-6fa3-4602-8e81-1991b142cd4c)
  Started creating missing vms > mysql/0 (a8e0e157-8a5d-4329-bf5f-edd1859d1a93)
  Started creating missing vms > uaa/0 (760577e5-d943-478d-8b2c-a60effefe208)
  Started creating missing vms > cloud_controller/0 (7ea8177f-5d76-41ef-a53b-52871ef869fc)
  Started creating missing vms > ha_proxy/0 (b276280e-8b56-484d-8c19-76b782b4ad19)
  Started creating missing vms > router/0 (22220dc8-6ea2-47e9-a555-18ea7841e363)
   Failed creating missing vms > mysql/0 (a8e0e157-8a5d-4329-bf5f-edd1859d1a93): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:02:52)
   Failed creating missing vms > uaa/0 (760577e5-d943-478d-8b2c-a60effefe208): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:02:52)
   Failed creating missing vms > nats/0 (6025181e-fd62-48c9-a470-bcfccebe19bf): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:02:52)
   Failed creating missing vms > mysql_proxy/0 (76feab28-6fa3-4602-8e81-1991b142cd4c): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:02:52)
   Failed creating missing vms > cloud_controller/0 (7ea8177f-5d76-41ef-a53b-52871ef869fc): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:02:52)
   Failed creating missing vms > ha_proxy/0 (b276280e-8b56-484d-8c19-76b782b4ad19): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:02:52)
   Failed creating missing vms > nfs_server/0 (6333e37a-da76-48c1-a849-8df87fc93886): Unknown CPI error 'Unknown' with message 'execution expired' (00:05:09)
   Failed creating missing vms > etcd_server/0 (df111ba7-6163-4652-8f11-f721c6ea0bc0): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:07:53)
   Failed creating missing vms > consul_server/0 (ae2a5984-cfa1-453e-921f-37c2c61ab3c2): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:07:53)
   Failed creating missing vms > router/0 (22220dc8-6ea2-47e9-a555-18ea7841e363): Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21 (00:07:53)

Error 100: Cannot find the stemcell bosh-stemcell-c288fbf0-3fc5-459c-996f-5910f71bdb5a in the table stemcells in the default storage account dsomsept21

Here's when it worked next time:

  Started creating missing vms
  Started creating missing vms > consul_server/0 (ae2a5984-cfa1-453e-921f-37c2c61ab3c2)
  Started creating missing vms > nats/0 (6025181e-fd62-48c9-a470-bcfccebe19bf)
  Started creating missing vms > etcd_server/0 (df111ba7-6163-4652-8f11-f721c6ea0bc0)
  Started creating missing vms > nfs_server/0 (6333e37a-da76-48c1-a849-8df87fc93886)
  Started creating missing vms > mysql_proxy/0 (76feab28-6fa3-4602-8e81-1991b142cd4c)
  Started creating missing vms > mysql/0 (a8e0e157-8a5d-4329-bf5f-edd1859d1a93)
  Started creating missing vms > uaa/0 (760577e5-d943-478d-8b2c-a60effefe208)
  Started creating missing vms > cloud_controller/0 (7ea8177f-5d76-41ef-a53b-52871ef869fc)
  Started creating missing vms > router/0 (22220dc8-6ea2-47e9-a555-18ea7841e363)
  Started creating missing vms > ha_proxy/0 (b276280e-8b56-484d-8c19-76b782b4ad19)
     Done creating missing vms > mysql/0 (a8e0e157-8a5d-4329-bf5f-edd1859d1a93) (00:04:59)
  Started creating missing vms > mysql_monitor/0 (6e089a92-f478-4337-a337-c4dfb558525d)
     Done creating missing vms > nats/0 (6025181e-fd62-48c9-a470-bcfccebe19bf) (00:05:19)
  Started creating missing vms > clock_global/0 (f6fb6f10-6eab-4487-9be0-9d219a410a58)
     Done creating missing vms > consul_server/0 (ae2a5984-cfa1-453e-921f-37c2c61ab3c2) (00:05:27)
  Started creating missing vms > cloud_controller_worker/0 (f8f9d263-0f4e-4878-808d-378f76410f7a)
     Done creating missing vms > uaa/0 (760577e5-d943-478d-8b2c-a60effefe208) (00:05:33)
  Started creating missing vms > diego_database/0 (251c0a1a-0008-45bd-a6c0-97b3b059e2e0)
     Done creating missing vms > nfs_server/0 (6333e37a-da76-48c1-a849-8df87fc93886) (00:05:33)
  Started creating missing vms > diego_brain/0 (0aceb56c-ad2c-48f1-bdb1-9c1ea48f4bbc)
     Done creating missing vms > etcd_server/0 (df111ba7-6163-4652-8f11-f721c6ea0bc0) (00:05:34)
  Started creating missing vms > diego_cell/0 (2d1ba614-cc64-4893-926c-5d6e3ccfcca3)
     Done creating missing vms > router/0 (22220dc8-6ea2-47e9-a555-18ea7841e363) (00:05:42)
  Started creating missing vms > diego_cell/1 (3c83ac46-351b-4872-9377-6971d1363425)
     Done creating missing vms > cloud_controller/0 (7ea8177f-5d76-41ef-a53b-52871ef869fc) (00:05:44)
  Started creating missing vms > doppler/0 (8973151f-27df-4a5c-b31d-c667639dc3d3)
     Done creating missing vms > mysql_proxy/0 (76feab28-6fa3-4602-8e81-1991b142cd4c) (00:05:49)
  Started creating missing vms > loggregator_trafficcontroller/0 (477b993c-30b1-4cbd-9eda-3668a6543261)
     Done creating missing vms > ha_proxy/0 (b276280e-8b56-484d-8c19-76b782b4ad19) (00:06:04)
     Done creating missing vms > mysql_monitor/0 (6e089a92-f478-4337-a337-c4dfb558525d) (00:02:05)
     Done creating missing vms > diego_brain/0 (0aceb56c-ad2c-48f1-bdb1-9c1ea48f4bbc) (00:01:48)
...
jastev commented 8 years ago

Are those VMs being spun up essentially simultaneously, all from an image in the same premium storage account container? Is the table in the same account as the disk image?

dsboulder commented 8 years ago

Yes, BOSH brings them up in parallel, 10 at a time. The CPI copies the stemcell from the standard storage account to the premium one the first time it is needed, and it's supposed to make the other 9 threads wait until the copy is done.
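
The intended "first thread copies, the rest wait" behaviour can be sketched roughly as follows. This is a minimal Python illustration and not the CPI's actual code: `StemcellCopier`, `ensure_copied`, `fake_copy` and the stemcell and account names are all hypothetical, and the real copy step is replaced by a short sleep. A race like the one reported here would appear if a waiter proceeded to look up the stemcell before the owner had finished.

```python
import threading
import time


class StemcellCopier:
    """Runs the copy at most once per (stemcell, account); other callers wait."""

    def __init__(self, copy_fn):
        self._copy_fn = copy_fn         # hypothetical: does the standard -> premium copy
        self._guard = threading.Lock()  # protects the dictionaries and the counter
        self._done = {}                 # key -> Event set once the copy has finished
        self._errors = {}               # key -> exception raised by the copy, if any
        self.copy_count = 0             # number of copies that completed

    def ensure_copied(self, stemcell, account, timeout=None):
        key = (stemcell, account)
        with self._guard:
            event = self._done.get(key)
            is_owner = event is None
            if is_owner:
                # First caller for this key becomes responsible for the copy.
                event = threading.Event()
                self._done[key] = event

        if is_owner:
            try:
                self._copy_fn(stemcell, account)
                with self._guard:
                    self.copy_count += 1
            except Exception as exc:
                with self._guard:
                    self._errors[key] = exc
                raise
            finally:
                # Wake the waiters whether the copy succeeded or failed.
                event.set()
            return

        # Every other caller blocks until the owner is done.
        if not event.wait(timeout):
            raise TimeoutError(f"timed out waiting for copy of {stemcell}")
        with self._guard:
            error = self._errors.get(key)
        if error is not None:
            raise RuntimeError(f"copy of {stemcell} failed") from error


def fake_copy(stemcell, account):
    """Stand-in for the real copy; only simulates a slow operation."""
    time.sleep(0.1)


copier = StemcellCopier(fake_copy)
threads = [
    threading.Thread(
        target=copier.ensure_copied,
        args=("example-stemcell", "example-premium-account"),
    )
    for _ in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(copier.copy_count)  # prints 1: ten callers, one copy
```

In this sketch a failed copy is remembered and never retried; a real implementation would probably need to clear the failed entry so that a later attempt can copy again.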

AbelHu commented 8 years ago

@dsboulder Note this error in your output: Failed creating missing vms > nfs_server/0 (6333e37a-da76-48c1-a849-8df87fc93886): Unknown CPI error 'Unknown' with message 'execution expired' (00:05:09). Could you send me the debug log with the error details?

dsboulder commented 8 years ago

I sent the debug logs in an email to you. Thanks!

AbelHu commented 8 years ago

@dsboulder With the help of the support team, we found that the storage account psomsept21a failed to handle requests because of 'client errors'. The support team wants the name of the BOSH director VM so they can investigate further. Could you provide it?

AbelHu commented 8 years ago

@dsboulder Could you provide the name of the BOSH director VM? I need to reply to the support team.

AbelHu commented 8 years ago

@dsboulder The Azure support team found that this was caused by client errors, but without the information from the BOSH director VM they cannot investigate further. I'm closing this issue for now. Please reopen it if you hit the problem again.