cloudfoundry-attic / bosh-lite

A lite development env for BOSH
Apache License 2.0

bosh cck fails #411

Open Omnipresent opened 7 years ago

Omnipresent commented 7 years ago

I am running bosh-lite on AWS using the Vagrant AWS plugin. Sometimes I need to stop the instance, so I run vagrant halt, and to bring it back up I run vagrant up.

After halting an instance and bringing it back up, certain VMs in various deployments need to be recreated or restarted.

For cf-release I run bosh cck and choose option 2 for the following 12 VMs. This gets CF running again.

$ bosh cck
# I press option 2 for the following 12 VMs
# Recreate VM for 'postgres_z1/0 (c3a2c93a-76e6-4000-b850-ae66edb633c7)' without waiting for processes to start
# Recreate VM for 'router_z1/0 (0b9528b6-0e75-4d4e-871d-ebf3052824a7)' without waiting for processes to start
# Recreate VM for 'runner_z1/0 (8d556d91-15e1-4fbe-9d8f-bcd14178d371)' without waiting for processes to start
# Recreate VM for 'nats_z1/0 (59bb1c0d-810c-483b-8352-f39ca0e20353)' without waiting for processes to start
# Recreate VM for 'ha_proxy_z1/0 (52dea328-621a-4e52-a7e5-7113e5cc13fc)' without waiting for processes to start
# Recreate VM for 'doppler_z1/0 (3c537dfe-475b-48f6-a9d7-b118e0b61b58)' without waiting for processes to start
# Recreate VM for 'uaa_z1/0 (4d04496e-1287-4460-bd53-8ad88b7ec1ee)' without waiting for processes to start
# Recreate VM for 'etcd_z1/0 (69f46039-c522-4d58-9d87-be39e9399530)' without waiting for processes to start
# Recreate VM for 'blobstore_z1/0 (382285d4-b1b8-4e53-8fdb-a8ba43a6f8e6)' without waiting for processes to start
# Recreate VM for 'api_z1/0 (09c07831-d6e4-4cf1-8cca-76d05c499471)' without waiting for processes to start
# Recreate VM for 'loggregator_trafficcontroller_z1/0 (be357991-a8ca-439a-9bf8-d4aa63f1861f)' without waiting for processes to start
# Recreate VM for 'hm9000_z1/0 (7d340e16-c4b5-4f95-8ace-fc0dbe78077b)' without waiting for processes to start

I also have cf-mysql-release on this bosh-lite, but I can't run bosh cck on it:

$ cd ~/workspace/cf-mysql-release
$ ./scripts/generate-bosh-lite-manifest
Deployment set to '/home/omni/workspace/cf-mysql-release/cf-mysql.yml'
CF-MySQL Manifest was generated at /home/omni/workspace/cf-mysql-release/cf-mysql.yml
$ bosh status
Config
             /home/omni/.bosh_config

Director
  Name       Bosh Lite Director
  URL        https://54.144.35.228:25555
  Version    1.3262.3.0 (00000000)
  User       admin
  UUID       c9ff6012-3899-4540-bca1-77e05f5a32d0
  CPI        warden_cpi
  dns        disabled
  compiled_package_cache disabled
  snapshots  disabled

Deployment
  Manifest   /home/omni/workspace/cf-mysql-release/cf-mysql.yml

$ bosh cck
Acting as user 'admin' on deployment 'cf-warden-mysql' on 'Bosh Lite Director'
Performing cloud check...

Director task 149
Error 100: Unable to get deployment lock, maybe a deployment is in progress. Try again later.

Task 149 error

For a more detailed error report, run: bosh task 149 --debug

bosh vms shows the following VMs in an unresponsive state. How can I get the VMs recreated/restarted after doing vagrant halt followed by vagrant up (followed by bosh target <public ip>)?

$ bosh vms
Deployment 'cf-warden-mysql'

Director task 152

Task 152 done

+-------------------------------------------------------------+--------------------+-----+--------------------+------------+
| VM                                                          | State              | AZ  | VM Type            | IPs        |
+-------------------------------------------------------------+--------------------+-----+--------------------+------------+
| arbitrator_z3/0 (39066f55-3515-4098-adc5-7bfc7ca6b9d2)      | unresponsive agent | n/a | arbitrator_z3      |            |
| cf-mysql-broker_z1/0 (5632d437-d0b2-4688-93a1-a3a2b4126390) | unresponsive agent | n/a | cf-mysql-broker_z1 |            |
| cf-mysql-broker_z2/0 (0241e416-3bde-48ed-b51e-4d032b14640a) | unresponsive agent | n/a | cf-mysql-broker_z2 |            |
| mysql_z1/0 (a8e4916e-0258-446e-9404-330684e72fab)           | unresponsive agent | n/a | mysql_z1           |            |
| mysql_z2/0 (5bcd3148-0144-4cfe-b7e8-b0ee90abab2e)           | running            | n/a | mysql_z2           | 10.244.8.2 |
| proxy_z1/0 (ea646869-b8e4-4a44-92c2-1cfb25a18b4b)           | unresponsive agent | n/a | proxy_z1           |            |
| proxy_z2/0 (51ebc12c-2b6d-44c3-9f3d-09c049a4e032)           | unresponsive agent | n/a | proxy_z2           |            |
+-------------------------------------------------------------+--------------------+-----+--------------------+------------+
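
To list just the affected instances, the table above can be filtered on its State column. This is a minimal sketch, assuming the column layout shown in this output; `unresponsive_vms` is a hypothetical helper name, not part of the bosh CLI.

```shell
# Hypothetical helper: read `bosh vms` table output on stdin and print the
# VM column of every row whose State column is "unresponsive agent".
# Assumes the pipe-delimited layout shown above (VM is field 2, State is field 3).
unresponsive_vms() {
  awk -F'|' '
    $3 ~ /unresponsive agent/ {
      vm = $2
      gsub(/^ +| +$/, "", vm)   # trim the padding around the cell
      print vm
    }'
}
```

It would be used as `bosh vms | unresponsive_vms`, which should print one instance per line.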

dpb587-pivotal commented 7 years ago

Sorry for the lack of response. In this case, the "Unable to get deployment lock" typically means something else is trying to work on the deployment. I suspect health monitor was jumping in to help bring things back since I notice mysql_z2/0 is already running.

The bosh cck approach is the correct way to get things back up and running. If you run into that lock error, you can do bosh task -a to see what other tasks are running (like health monitor) and reattach to that resurrection task to watch the progress (e.g. bosh task 12345).
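
If you would rather wait the lock out than retry by hand, something like the following could work. It is only a sketch: `retry_on_lock` is a hypothetical wrapper, not a bosh command, and it assumes the lock failure can be recognized by the "Unable to get deployment lock" text from the error above.

```shell
# Hypothetical helper: rerun a command while its output reports the
# deployment lock error quoted above; any other result is returned as-is.
# Usage: retry_on_lock MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_on_lock() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Capture stdout and stderr so the lock message can be detected.
    if output=$("$@" 2>&1); then status=0; else status=$?; fi
    printf '%s\n' "$output"
    case $output in
      *"Unable to get deployment lock"*) ;;  # lock still held: retry below
      *) return "$status" ;;                 # anything else is final
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "deployment lock still held after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry_on_lock 10 30 bosh cck` would try up to 10 times, 30 seconds apart. Because the wrapper captures the command's output, the interactive prompts of bosh cck would not be shown until the command exits, so this fits a non-interactive invocation best; check the CLI help for what your version offers.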