cloudfoundry-attic / bosh-lite

A lite development env for BOSH
Apache License 2.0
319 stars 244 forks source link

BOSH cck fails when attempting to recreate containers #353

Open ryantang opened 8 years ago

ryantang commented 8 years ago

https://github.com/cloudfoundry/bosh-lite/blob/master/docs/bosh-cck.md

The above instructions no longer works to recover bosh-lite containers upon reboot of the host (AWS EC2 instance in this case). The strange thing for my scenario is that bosh cck fails but one of the containers is fixed upon completion of that command. A subsequent bosh deploy adds the two additional missing containers.

ubuntu@agent-id-bosh-0:~$ bosh -d etcd-manifest.yml cck

<skipped some output> 

Scan is complete, checking if any problems found...

Found 3 problems

Problem 1 of 3: etcd_z1/2 (ed696055-75b8-4449-8807-fb1d0d091538) (c6396aee-ce0b-4dc0-44b6-d96f67506dff) is not responding.
  1. Skip for now
  2. Reboot VM
  3. Recreate VM for 'etcd_z1/2 (ed696055-75b8-4449-8807-fb1d0d091538)'
  4. Delete VM reference (forceful; may need to manually delete VM from the Cloud to avoid IP conflicts)
Please choose a resolution [1 - 4]: 3

<skipped more output; chose "3" or "recreate" for all VMs>

Director task 24
  Started applying problem resolutions
  Started applying problem resolutions > unresponsive_agent 3: Recreate VM for 'etcd_z1/2 (ed696055-75b8-4449-8807-fb1d0d091538)'. Failed: Creating VM with agent ID '5eda1ec1-3f7c-4e9b-875e-73a9d75209d0': Creating container: container: start: exit status 2 (00:00:11)
  Started applying problem resolutions > unresponsive_agent 1: Recreate VM for 'etcd_z1/0 (783a020c-f5d5-449e-b0f4-e48f8c74dfa9)'. Failed: Creating VM with agent ID '4a51a656-07e3-4c42-89e3-69faad4b205c': Creating container: container: start: exit status 2 (00:00:11)
  Started applying problem resolutions > unresponsive_agent 2: Recreate VM for 'etcd_z1/1 (e4ecbcb5-4ecd-488a-98a7-ad51c1ed6bf8)'. Done (00:00:52)
   Failed applying problem resolutions (00:01:14)

Error 100: Error resolving problem `7': Creating VM with agent ID '5eda1ec1-3f7c-4e9b-875e-73a9d75209d0': Creating container: container: start: exit status 2
Error resolving problem `8': Creating VM with agent ID '4a51a656-07e3-4c42-89e3-69faad4b205c': Creating container: container: start: exit status 2

Task 24 error

A snippet from the debug log:

I, [2016-03-23 16:29:57 #2526] [task:24]  INFO -- DirectorJobRunner: Creating VM
D, [2016-03-23 16:29:57 #2526] [task:24] DEBUG -- DirectorJobRunner: External CPI sending request: {"method":"create_vm","arguments":["4a51a656-07e3-4c42-89e3-69faad4b205c","9ff9ea15-73e5-467b-7fef-67d079850c1e",{},{"etcd1":{"ip":"10.244.4.4","netmask":"255.255.255.0","cloud_properties":{"name":"random"},"default":["dns","gateway"],"gateway":"10.244.4.1"}},["dac4223b-97c2-4031-6bb5-8718c908485f"],{}],"context":{"director_uuid":"60e5fa30-1059-4a03-8d85-351fde555719"}} with command: /var/vcap/jobs/warden_cpi/bin/cpi
D, [2016-03-23 16:29:58 #2526] [task:24] DEBUG -- DirectorJobRunner: External CPI got response: {"result":null,"error":{"type":"Bosh::Clouds::CloudError","message":"Creating VM with agent ID '4a51a656-07e3-4c42-89e3-69faad4b205c': Creating container: container: start: exit status 2","ok_to_retry":false},"log":""}, err: , exit_status: pid 3077 exit 0
E, [2016-03-23 16:29:58 #2526] [task:24] ERROR -- DirectorJobRunner: error creating vm: Creating VM with agent ID '4a51a656-07e3-4c42-89e3-69faad4b205c': Creating container: container: start: exit status 2
E, [2016-03-23 16:29:58 #2526] [task:24] ERROR -- DirectorJobRunner: Error resolving problem `8': Creating VM with agent ID '4a51a656-07e3-4c42-89e3-69faad4b205c': Creating container: container: start: exit status 2
E, [2016-03-23 16:29:58 #2526] [task:24] ERROR -- DirectorJobRunner: /var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.3215.0/lib/cloud/external_cpi.rb:119:in `handle_error'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.3215.0/lib/cloud/external_cpi.rb:88:in `invoke_cpi_method'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh_cpi-1.3215.0/lib/cloud/external_cpi.rb:51:in `create_vm'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3215.0/lib/bosh/director/vm_creator.rb:118:in `create'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3215.0/lib/bosh/director/vm_creator.rb:53:in `create_for_instance_plan'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3215.0/lib/bosh/director/cloudcheck_helper.rb:62:in `recreate_vm'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3215.0/lib/bosh/director/problem_handlers/unresponsive_agent.rb:37:in `block (2 levels) in <class:UnresponsiveAgent>'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3215.0/lib/bosh/director/problem_handlers/base.rb:83:in `instance_eval'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3215.0/lib/bosh/director/problem_handlers/base.rb:83:in `apply_resolution'
/var/vcap/packages/director/gem_home/ruby/2.1.0/gems/bosh-director-1.3215.0/lib/bosh/director/problem_resolver.rb:65:in `block in apply_resolution'
ryantang commented 8 years ago

I am using bosh-stemcell-3147-warden-boshlite-ubuntu-trusty-go_agent.

bosh-cpi-robot commented 8 years ago

Can you add garden logs from this failure if you still have it? I suspect garden-linux release is doing something different now.

On Wed, Mar 23, 2016 at 10:12 AM, Ryan Tang notifications@github.com wrote:

I am using bosh-stemcell-3147-warden-boshlite-ubuntu-trusty-go_agent.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/cloudfoundry/bosh-lite/issues/353#issuecomment-200446674

ryantang commented 8 years ago

I re-ran the bosh cck command. Here's the info form garden.stdout.log. GUIDs will likely be a bit different.

This the part of the logs from the first recreate failure. Please let me know if you need more of the log file.

# tail -f ./vcap/data/sys/log/garden/garden.std*.log ./vcap/data/sys/log/monit/garden.log

==> ./vcap/data/sys/log/garden/garden.stdout.log <==
{"timestamp":"1458754867.144699812","source":"garden-linux","message":"garden-linux.container.cleanup.done","log_level":1,"data":{"handle":"0ddb9a4b-c740-4acf-4b68-bf643174e5e4","session":"18.2"}}
{"timestamp":"1458754867.144761324","source":"garden-linux","message":"garden-linux.pool.release.releasing","log_level":1,"data":{"id":"fk5rhq2q9r5","session":"9.4"}}
{"timestamp":"1458754867.277828693","source":"garden-linux","message":"garden-linux.pool.release.subnet-pool.release.changing-allocated-ips","log_level":1,"data":{"allocated-ips":["10.244.4.5","10.244.4.6"],"id":"fk5rhq2q9r5","session":"9.4.2.1"}}
{"timestamp":"1458754867.277896404","source":"garden-linux","message":"garden-linux.pool.release.released","log_level":1,"data":{"id":"fk5rhq2q9r5","session":"9.4"}}
{"timestamp":"1458754867.277940035","source":"garden-linux","message":"garden-linux.garden-server.destroy.destroyed","log_level":1,"data":{"handle":"0ddb9a4b-c740-4acf-4b68-bf643174e5e4","session":"11.5"}}
{"timestamp":"1458754867.351434708","source":"garden-linux","message":"garden-linux.pool.acquire.creating","log_level":1,"data":{"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5"}}
{"timestamp":"1458754867.351520538","source":"garden-linux","message":"garden-linux.pool.acquire.subnet-pool.acquire.subnet-selecting","log_level":1,"data":{"allocated-subnets":["10.244.4.0/24"],"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5.1.1"}}
{"timestamp":"1458754867.351566553","source":"garden-linux","message":"garden-linux.pool.acquire.subnet-pool.acquire.subnet-selected","log_level":1,"data":{"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5.1.1","subnet":"10.244.4.0/24"}}
{"timestamp":"1458754867.351622343","source":"garden-linux","message":"garden-linux.pool.acquire.subnet-pool.acquire.ip-selecting","log_level":1,"data":{"allocated-ips":["10.244.4.5","10.244.4.6"],"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5.1.1"}}
{"timestamp":"1458754867.351675749","source":"garden-linux","message":"garden-linux.pool.acquire.subnet-pool.acquire.ip-selected","log_level":1,"data":{"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","ip":"10.244.4.4","session":"9.5.1.1"}}
{"timestamp":"1458754867.351738691","source":"garden-linux","message":"garden-linux.pool.acquire.subnet-pool.acquire.new-allocated","log_level":1,"data":{"allocated-ips":["10.244.4.5","10.244.4.6","10.244.4.4"],"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5.1.1"}}
{"timestamp":"1458754867.351773262","source":"garden-linux","message":"garden-linux.pool.acquire.acquired-pool-resources","log_level":1,"data":{"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5"}}
{"timestamp":"1458754867.352807522","source":"garden-linux","message":"garden-linux.pool.acquire.running-graph-cleanup","log_level":1,"data":{"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5"}}
{"timestamp":"1458754867.353132248","source":"garden-linux","message":"garden-linux.pool.acquire.gc.start","log_level":1,"data":{"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5.2"}}
{"timestamp":"1458754867.643604040","source":"garden-linux","message":"garden-linux.pool.acquire.created","log_level":1,"data":{"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"9.5"}}
{"timestamp":"1458754867.827059984","source":"garden-linux","message":"garden-linux.container.start.command.failed","log_level":2,"data":{"argv":["/var/vcap/data/garden/depot/fk9jojisg5i/start.sh"],"error":"exit status 2","exit-status":2,"handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"23.1.1","stderr":"Failed to create container: containerizer: run `parent-after-clone`: exit status 2. stderr: panic: network: failed to find bridge with name 'wb-fk5rhq2pe5m0', IP '10.244.4.1', subnet '10.244.4.0/24': look up existing bridge\n\ngoroutine 1 [running]:\ngithub.com/cloudfoundry-incubator/garden-linux/linux_backend.must(0x7fbdfea00368, 0xc2080325c0)\n\t/var/vcap/packages/garden-linux/src/github.com/cloudfoundry-incubator/garden-linux/linux_backend/hooks.go:95 +0x50\ngithub.com/cloudfoundry-incubator/garden-linux/linux_backend.func·002()\n\t/var/vcap/packages/garden-linux/src/github.com/cloudfoundry-incubator/garden-linux/linux_backend/hooks.go:28 +0xf5\ngithub.com/cloudfoundry-incubator/garden-linux/hook.HookSet.Main(0xc20803abd0, 0x7ffc5e42bd18, 0x12)\n\t/var/vcap/packages/garden-linux/src/github.com/cloudfoundry-incubator/garden-linux/hook/hooks.go:28 +0x81\ngithub.com/cloudfoundry-incubator/garden-linux/hook.Main(0xc20800a010, 0x1, 0x1)\n\t/var/vcap/packages/garden-linux/src/github.com/cloudfoundry-incubator/garden-linux/hook/hooks.go:19 +0x46\nmain.main()\n\t/var/vcap/packages/garden-linux/src/github.com/cloudfoundry-incubator/garden-linux/hook/hook/main.go:39 +0x4c2\n","stdout":"","took":"136.461385ms"}}
{"timestamp":"1458754867.827219963","source":"garden-linux","message":"garden-linux.container.start.wshd-start-failed","log_level":2,"data":{"error":"exit status 2","handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","session":"23.1"}}
{"timestamp":"1458754867.827310324","source":"garden-linux","message":"garden-linux.pool.release.releasing","log_level":1,"data":{"id":"fk9jojisg5i","session":"9.6"}}
{"timestamp":"1458754867.954038620","source":"garden-linux","message":"garden-linux.pool.release.subnet-pool.release.changing-allocated-ips","log_level":1,"data":{"allocated-ips":["10.244.4.5","10.244.4.6"],"id":"fk9jojisg5i","session":"9.6.2.1"}}
{"timestamp":"1458754867.954099178","source":"garden-linux","message":"garden-linux.pool.release.released","log_level":1,"data":{"id":"fk9jojisg5i","session":"9.6"}}
{"timestamp":"1458754867.954126835","source":"garden-linux","message":"garden-linux.garden-server.create.failed","log_level":2,"data":{"error":"container: start: exit status 2","request":{"Handle":"8c25a9d3-4a5c-480a-4539-24ce4d379235","GraceTime":0,"RootFSPath":"/var/vcap/store/warden_cpi/stemcells/9ff9ea15-73e5-467b-7fef-67d079850c1e","BindMounts":[{"src_path":"/var/vcap/store/warden_cpi/ephemeral_bind_mounts_dir/8c25a9d3-4a5c-480a-4539-24ce4d379235","dst_path":"/var/vcap/data","mode":1},{"src_path":"/var/vcap/store/warden_cpi/persistent_bind_mounts_dir/8c25a9d3-4a5c-480a-4539-24ce4d379235","dst_path":"/warden-cpi-dev","mode":1}],"Network":"10.244.4.4/24","Privileged":true,"Limits":{"bandwidth_limits":{},"cpu_limits":{},"disk_limits":{},"memory_limits":{}}},"session":"11.6"}}