cloudfoundry / bosh-openstack-cpi-release

BOSH OpenStack CPI
Apache License 2.0

Re-deploy failed in recreating VMs #18

Closed guoger closed 8 years ago

guoger commented 8 years ago

To reproduce:

  1. Prepare an OpenStack (nova-network) environment
  2. Do a normal deploy
  3. Change something that should trigger VM recreation (changes on network, stemcell, env keys, etc.)
  4. Deploy again

We get an error like this:

D, [2016-01-11 11:47:43 #8012] [task:94] DEBUG -- DirectorJobRunner: SENT: hm.director.alert {"id":"bbb6cf28-f729-484d-b36a-97c221caad0c","severity":3,"source":"director","title":"director - error during update deployment","summary":"Error during update deployment for 'bat' against Director '2d861c69-3606-4922-b180-765d90f061c1': #<Bosh::Clouds::CloudError: OpenStack API Bad Request (Fixed IP address 10.11.13.21 is already in use on instance 479e01ef-bd41-454d-a63f-d310bf3c6e0c.). Check task debug log for details.>","created_at":1452512863}
E, [2016-01-11 11:47:43 #8012] [task:94] ERROR -- DirectorJobRunner: OpenStack API Bad Request (Fixed IP address 10.11.13.21 is already in use on instance 479e01ef-bd41-454d-a63f-d310bf3c6e0c.). Check task debug log for details.
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.0000.0/lib/cloud/external_cpi.rb:118:in `handle_error'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.0000.0/lib/cloud/external_cpi.rb:88:in `invoke_cpi_method'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.0000.0/lib/cloud/external_cpi.rb:51:in `create_vm'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/vm_creator.rb:110:in `create'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/vm_creator.rb:51:in `create_for_instance_plan'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/vm_recreator.rb:13:in `recreate_vm'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/instance_updater.rb:78:in `update'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/job_updater.rb:105:in `block (2 levels) in update_canary_instance'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.0000.0/lib/common/thread_formatter.rb:49:in `with_thread_name'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/job_updater.rb:103:in `block in update_canary_instance'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/event_log.rb:97:in `call'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/event_log.rb:97:in `advance_and_track'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/job_updater.rb:102:in `update_canary_instance'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.0000.0/lib/bosh/director/job_updater.rb:97:in `block (2 levels) in update_canaries'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.0000.0/lib/common/thread_pool.rb:77:in `call'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.0000.0/lib/common/thread_pool.rb:77:in `block (2 levels) in create_thread'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.0000.0/lib/common/thread_pool.rb:63:in `loop'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_common-1.0000.0/lib/common/thread_pool.rb:63:in `block in create_thread'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `call'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'

We are using devstack to provision an OpenStack environment, and this is our local.conf:

[[local|localrc]]
SERVICE_TOKEN=token
ADMIN_PASSWORD=password
MYSQL_PASSWORD=password
RABBIT_PASSWORD=password
SERVICE_PASSWORD=$ADMIN_PASSWORD
HOST_IP=W.X.Y.Z
LOGFILE=$DEST/logs/stack.sh.log
LOGDAYS=2
SWIFT_HASH=66a3d6b56c1f479c8b4e70ab5c2000f5
SWIFT_REPLICAS=1
SWIFT_DATA_DIR=$DEST/data
FLOATING_RANGE=W.X.Y.Z/27
FIXED_RANGE=10.11.13.0/24
FIXED_NETWORK_SIZE=256
FLAT_INTERFACE=eth0
VOLUME_BACKING_FILE_SIZE=409600M

cf-gitbot commented 8 years ago

We have created an issue in Pivotal Tracker to manage this. You can view the current status of your issue at: https://www.pivotaltracker.com/story/show/111477500.

guoger commented 8 years ago

@voelzmo we keep running into this issue and wonder whether you have seen it before. @cppforlife thought it might be caused by a misconfigured OpenStack environment. We are using the old manifest format (non-cloud-config) and nova-network. We will try with neutron networking some time later.

Cheers, @xingzhou and Jay

voelzmo commented 8 years ago

@guoger that looks interesting. I haven't used nova-network at all, so probably it is related to that. I'll try to reproduce it using DevStack. On configurations using neutron networking I haven't seen this before so far.

Do you have some more information on the versions of bosh and openstack-cpi that you use?

guoger commented 8 years ago

@voelzmo we are using openstack-cpi-release-11, and the BOSH director is based on a recent develop branch. I don't know the exact commit hash, but it is nothing earlier than last Thursday. Thanks!

voelzmo commented 8 years ago

@guoger Sorry it took me so long to get back to you. I couldn't reproduce your issue; in my devstack setup, recreating VMs works nicely when changing properties.

Not sure if this is still relevant for you, but which version of devstack are you using? I suggest not using master but switching to a stable branch, e.g. stable/mitaka or stable/liberty. Did you retry with neutron enabled, as you mentioned above?

guoger commented 8 years ago

@voelzmo Thanks for your reply! We are running Juno and we encountered IP conflicts in other circumstances as well, so I guess it's not an OpenStack CPI issue. Unfortunately, we are not actively working on BOSH anymore. For now, I suggest closing this issue unless the problem emerges again.

Sorry I didn't get back to you sooner.

alfredcs commented 8 years ago

We ran into the same issue with OpenStack Liberty. The bosh deploy always tries to use the second IP in the allocated subnet, which by default is already used by the Neutron DHCP server. Shouldn't bosh be able to pick an available one from the pool? More importantly, how can we mitigate this?

Deploying

Are you sure you want to deploy? (type 'yes' to continue): yes

Director task 70
Started preparing deployment > Preparing deployment. Done (00:00:01)

Started preparing package compilation > Finding packages to compile. Done (00:00:00)

Started compiling packages
Started compiling packages > cli/0f19a1df7ef777208cc3bfcdc875b2b56613ab40
Started compiling packages > rootfs_cflinuxfs2/021d92c7dced31395ab2417e9ddfc403b2f5d499
Started compiling packages > buildpack_java_offline/63c5bce1007c7d62586c29348a9312d549b7532b
Started compiling packages > buildpack_java/6e04d5bc2fbe53275a400af768d1504a2d85a50f
Started compiling packages > nginx/a2d65c6ae8076abde17bc53bb3e5fb0577078ab5
Started compiling packages > ruby-2.2.4/dd1b827e6ea0ca7e9fcb95d08ae81fb82f035261
Done compiling packages > cli/0f19a1df7ef777208cc3bfcdc875b2b56613ab40 (00:00:40)
Started compiling packages > libpq/36290c252abdd5750583eca92a8c4fd134594fa8
Done compiling packages > buildpack_java/6e04d5bc2fbe53275a400af768d1504a2d85a50f (00:00:41)
Started compiling packages > libmariadb/90d7836132e02bb570d91f08c9e116c394888d75. Done (00:00:10)
Started compiling packages > staticfile-buildpack/d4e97e849ddf228ee520a2babaafb852738d56b9. Done (00:00:04)
Started compiling packages > python-buildpack/4a16c3a69a632fd7f44ab92fd3e34fc97208ad8e
Done compiling packages > rootfs_cflinuxfs2/021d92c7dced31395ab2417e9ddfc403b2f5d499 (00:01:07)
Started compiling packages > php-buildpack/42ef7896ed16a96505149665ae78c1adfe96090c
Done compiling packages > buildpack_java_offline/63c5bce1007c7d62586c29348a9312d549b7532b (00:01:10)
Started compiling packages > ruby-buildpack/66cf71501e4a88fbb13698f678788569f37d4272
Done compiling packages > libpq/36290c252abdd5750583eca92a8c4fd134594fa8 (00:00:33)
Started compiling packages > nodejs-buildpack/07452ac5f8d32d521954869042d0115fe06e6ef7
Done compiling packages > nginx/a2d65c6ae8076abde17bc53bb3e5fb0577078ab5 (00:01:23)
Started compiling packages > binary-buildpack/86f697c3f0ccfae1141c57390988f67f6825dcaf
Done compiling packages > python-buildpack/4a16c3a69a632fd7f44ab92fd3e34fc97208ad8e (00:00:28)
Started compiling packages > go-buildpack/1e92a0887216366ff6950f533e4db027527d11e4
Done compiling packages > nodejs-buildpack/07452ac5f8d32d521954869042d0115fe06e6ef7 (00:00:16)
Started compiling packages > uaa/b3de45c0cde3f1e1e25e1d6cb93c431c9fdf9d9c
Done compiling packages > binary-buildpack/86f697c3f0ccfae1141c57390988f67f6825dcaf (00:00:06)
Started compiling packages > uaa_utils/fc2e416c5af94d6293ac70bb594c984e31097fb7. Done (00:00:05)
Started compiling packages > postgres-9.4.6/4ca292f2443cfc682e87632ebd15fb3f3fa53123
Done compiling packages > php-buildpack/42ef7896ed16a96505149665ae78c1adfe96090c (00:00:34)
Started compiling packages > golang1.5/73d2dfd26bf1136e2cd3d460a1043668dca5725a
Done compiling packages > ruby-buildpack/66cf71501e4a88fbb13698f678788569f37d4272 (00:00:33)
Started compiling packages > nginx_webdav/96158b0a628398efe9cc28ee5bc5e496da5e09b7
Done compiling packages > uaa/b3de45c0cde3f1e1e25e1d6cb93c431c9fdf9d9c (00:00:24)
Started compiling packages > debian_nfs_server/aac05f22582b2f9faa6840da056084ed15772594. Done (00:00:05)
Started compiling packages > capi_utils/320882c73c7381f4e7890aff88589fa3a9b6c4b2. Done (00:00:04)
Started compiling packages > etcd-common/a5492fb0ad41a80d2fa083172c0430073213a296
Done compiling packages > golang1.5/73d2dfd26bf1136e2cd3d460a1043668dca5725a (00:00:27)
Started compiling packages > ruby-2.1.8/b5bf6af82bae947ad255e426001308acfc2244ee
Done compiling packages > etcd-common/a5492fb0ad41a80d2fa083172c0430073213a296 (00:00:06)
Started compiling packages > haproxy/f5d89b125a66892628a8cd61d23be7f9b0d31171
Done compiling packages > go-buildpack/1e92a0887216366ff6950f533e4db027527d11e4 (00:00:50)
Started compiling packages > common/953ccbc1b39fc972d2b4903beb16af7daf78cc85. Done (00:00:04)
Started compiling packages > loggregator_common/6e89e9e4c155576dfc47b5959887807831ddf2df. Done (00:00:04)
Started compiling packages > golang1.6/85a489b7c0c2584aa9e0a6dd83666db31c6fc8e8
Done compiling packages > nginx_webdav/96158b0a628398efe9cc28ee5bc5e496da5e09b7 (00:00:50)
Started compiling packages > consul/14b83378b30a2b55a25e641e835af3e5c87a0d41. Done (00:00:06)
Started compiling packages > consul-common/ffab9ae7bea8a053aacca8816681e241b0fab30b. Done (00:00:06)
Started compiling packages > smoke-tests/eee5b25b700cf12d6905d2c94a266220c83f9df6
Done compiling packages > golang1.6/85a489b7c0c2584aa9e0a6dd83666db31c6fc8e8 (00:00:28)
Started compiling packages > gorouter/2b2606161fb5c73d259f074e4142c63852830412
Done compiling packages > smoke-tests/eee5b25b700cf12d6905d2c94a266220c83f9df6 (00:00:04)
Started compiling packages > route_registrar/a471cd246893763768b0ed9b9088d27572f3916f. Done (00:00:12)
Started compiling packages > acceptance-tests/bf02cb2196218e7f91e0d03101347882abf74fa7
Done compiling packages > haproxy/f5d89b125a66892628a8cd61d23be7f9b0d31171 (00:00:56)
Started compiling packages > loggregator_trafficcontroller/038175d95748933723834ea037f2f2b9fa8e2f8a
Done compiling packages > gorouter/2b2606161fb5c73d259f074e4142c63852830412 (00:00:21)
Started compiling packages > syslog_drain_binder/c16f0ed931d3b4932891ee79094c42bc443db38e
Done compiling packages > acceptance-tests/bf02cb2196218e7f91e0d03101347882abf74fa7 (00:00:14)
Started compiling packages > doppler/e1baac7abbea4267620e84a721761bad87a5cd2d
Done compiling packages > loggregator_trafficcontroller/038175d95748933723834ea037f2f2b9fa8e2f8a (00:00:23)
Started compiling packages > dea_logging_agent/9c045f99b19e8d87ea0d6ad6b0cc4fb3ec6eb8d1
Done compiling packages > syslog_drain_binder/c16f0ed931d3b4932891ee79094c42bc443db38e (00:00:25)
Started compiling packages > hm9000/1b7309278cc5b3ee3124411e799859443e69da11
Done compiling packages > doppler/e1baac7abbea4267620e84a721761bad87a5cd2d (00:00:23)
Started compiling packages > statsd-injector/3f082441e4ce31be7c0a162ffd44306f07d4a625
Done compiling packages > dea_logging_agent/9c045f99b19e8d87ea0d6ad6b0cc4fb3ec6eb8d1 (00:00:16)
Started compiling packages > blobstore_url_signer/09ff302425c2edda6bc8c6275585cc86acffa473
Done compiling packages > statsd-injector/3f082441e4ce31be7c0a162ffd44306f07d4a625 (00:00:14)
Started compiling packages > etcd_metrics_server/5795fac9a4e5c60b10d4c83aa40fb524fa64b9ed
Done compiling packages > blobstore_url_signer/09ff302425c2edda6bc8c6275585cc86acffa473 (00:00:13)
Started compiling packages > etcd/015a370c4578c19dd3578329477b1f1a549f0c60
Done compiling packages > hm9000/1b7309278cc5b3ee3124411e799859443e69da11 (00:00:26)
Started compiling packages > gnatsd/f1af48ea824ef64c751da0c8fa4c9660c4cb9cdd
Done compiling packages > etcd_metrics_server/5795fac9a4e5c60b10d4c83aa40fb524fa64b9ed (00:00:17)
Started compiling packages > metron_agent/0bb15d6f347fa91ca9840ac69e5298da5c4e1af3
Done compiling packages > gnatsd/f1af48ea824ef64c751da0c8fa4c9660c4cb9cdd (00:00:14)
Started compiling packages > confab/2256cb5bc15bea04aa36c1bd9cf6d4698038262d
Done compiling packages > metron_agent/0bb15d6f347fa91ca9840ac69e5298da5c4e1af3 (00:00:24)
Done compiling packages > confab/2256cb5bc15bea04aa36c1bd9cf6d4698038262d (00:00:29)
Done compiling packages > etcd/015a370c4578c19dd3578329477b1f1a549f0c60 (00:00:58)
Done compiling packages > postgres-9.4.6/4ca292f2443cfc682e87632ebd15fb3f3fa53123 (00:03:37)
Done compiling packages > ruby-2.2.4/dd1b827e6ea0ca7e9fcb95d08ae81fb82f035261 (00:05:19)
Started compiling packages > warden/45feaaafb1ed6e77fd1ddd067b2577b8d13c03c7
Started compiling packages > dea_next/8206d887c9852387e0d57ca4cfb790d5abe87178
Started compiling packages > cloud_controller_ng/a4420e3a7cd1d3c9704709258eae0a6e23dd101b
Started compiling packages > nginx_newrelic_plugin/f8e9f7fc5988394c793036ea7c1d9ca8ce3de78e. Done (00:00:12)
Done compiling packages > warden/45feaaafb1ed6e77fd1ddd067b2577b8d13c03c7 (00:00:27)
Done compiling packages > dea_next/8206d887c9852387e0d57ca4cfb790d5abe87178 (00:01:07)
Done compiling packages > ruby-2.1.8/b5bf6af82bae947ad255e426001308acfc2244ee (00:04:26)
Started compiling packages > collector/9f8dfbcbcfffb124820327ad2ad4fee35e51d236
Started compiling packages > nats/2230720d1021af6c2c90cd7f3983264ab351043b. Done (00:00:17)
Done compiling packages > collector/9f8dfbcbcfffb124820327ad2ad4fee35e51d236 (00:00:30)
Done compiling packages > cloud_controller_ng/a4420e3a7cd1d3c9704709258eae0a6e23dd101b (00:02:00)
Done compiling packages (00:07:19)

Started creating missing vms
Started creating missing vms > ha_proxy_z1/0 (f6a428cf-32fa-44ca-a64e-35444a847e55)
Started creating missing vms > consul_z1/0 (72619db5-5b51-4824-90b6-98be8866d34c)
Started creating missing vms > nats_z1/0 (29cc09e8-b9bb-4bbb-80f4-01a49479cda7)
Done creating missing vms > consul_z1/0 (72619db5-5b51-4824-90b6-98be8866d34c) (00:00:39)
Started creating missing vms > etcd_z1/0 (4c4f8766-7f65-4e02-bd2f-da7ad734fc00)
Done creating missing vms > nats_z1/0 (29cc09e8-b9bb-4bbb-80f4-01a49479cda7) (00:00:39)
Started creating missing vms > stats_z1/0 (61578796-572e-42c2-a0c7-426073261db9)
Done creating missing vms > ha_proxy_z1/0 (f6a428cf-32fa-44ca-a64e-35444a847e55) (00:00:39)
Started creating missing vms > blobstore_z1/0 (d4de74b3-8679-4575-9b4c-50b4931cf728)
Failed creating missing vms > stats_z1/0 (61578796-572e-42c2-a0c7-426073261db9): OpenStack API Bad Request (Fixed IP address 10.10.2.2 is already in use on instance dhcp487d6daa-08a3-5ee8-87b1-5bf158f48408-fd9bc95d-fe08-488a-ad8e-2e2f3c965c11.). Check task debug log for details. (00:00:03)
Failed creating missing vms > blobstore_z1/0 (d4de74b3-8679-4575-9b4c-50b4931cf728): OpenStack API Bad Request (Fixed IP address 10.10.2.3 is already in use on instance c19f9cd4-0ec6-4007-bea4-ae41417f6db5.). Check task debug log for details. (00:00:03)
Done creating missing vms > etcd_z1/0 (4c4f8766-7f65-4e02-bd2f-da7ad734fc00) (00:00:34)

Error 100: OpenStack API Bad Request (Fixed IP address 10.10.2.2 is already in use on instance dhcp487d6daa-08a3-5ee8-87b1-5bf158f48408-fd9bc95d-fe08-488a-ad8e-2e2f3c965c11.). Check task debug log for details.

Task 70 error

For a more detailed error report, run: bosh task 70 --debug

voelzmo commented 8 years ago

@alfredcs: the error OpenStack API Bad Request (Fixed IP address 10.10.2.2 is already in use on instance dhcp487d6daa-08a3-5ee8-87b1-5bf158f48408-fd9bc95d-fe08-488a-ad8e-2e2f3c965c11.) does indeed sound like that IP address is in use by the DHCP server in that network.

When you are using a network of type: manual, BOSH picks the IP addresses for your deployment, not the OpenStack DHCP server. Therefore, to avoid IP address clashes, you have to mark any IP addresses that are assigned outside of BOSH as reserved.

One way to avoid this is to always put the DHCP server's IP into the network's reserved section in your deployment manifest, like so (assuming your network is 10.10.2.0/24):

networks:
- name: my-network
  type: manual

  subnets:
  - range:    10.10.2.0/24
    gateway:  10.10.2.1
    reserved: [10.10.2.2]

The best way would be to divide your network into a smaller range used for DHCP assignment (e.g. if you want to spin up a VM by hand in that network for testing) and a bigger range where BOSH puts deployments. When creating a subnet with neutron, you can specify a smaller allocation range – by default it is the ... Then put that whole DHCP allocation range into the network's reserved section.
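As a rough sketch of that split, the manifest fragment below reserves an entire DHCP allocation range instead of a single address. The pool boundaries (10.10.2.2 to 10.10.2.50) are purely illustrative assumptions and do not come from this thread; they stand for whatever allocation pool the neutron subnet was created with (for instance via the `--allocation-pool` option of `neutron subnet-create`). Other subnet settings a real manifest would need are left out.

```yaml
# Illustrative fragment only: the range boundaries are assumed values.
# Assumes the neutron subnet's DHCP allocation pool is 10.10.2.2-10.10.2.50.
networks:
- name: my-network
  type: manual
  subnets:
  - range:    10.10.2.0/24
    gateway:  10.10.2.1
    # Reserve the whole DHCP allocation pool so BOSH never assigns from it;
    # BOSH then places deployments in the remaining addresses of the range.
    reserved: [10.10.2.2-10.10.2.50]
```

With a layout like this, addresses handed out by DHCP (the DHCP port itself, or VMs started by hand) should no longer collide with the addresses BOSH picks for a manual network.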

For additional details, have a look at the bosh.io network documentation.