cloudfoundry-community / cf-boshworkspace

Deploy Cloud Foundry using bosh-workspace

cf-aws-tiny seems broken (api_z1/0 error) #89

Closed djvdorp closed 8 years ago

djvdorp commented 8 years ago

This is a fairly short issue, for which I apologize: I needed a working dev install of CF right away, so I switched to cf-aws-large for the moment and do not have much more information yet.

However, I tried a clean install via terraform-aws-cf-install about three separate times, and api_z1/0 keeps failing, so I wanted to give a heads-up about this.

Do you happen to have a quick way to determine what broke, or which logs are relevant to dig up, so I can either help solve this issue or solve it on my own?

djvdorp commented 8 years ago

Update: I just set up the broken cf-aws-tiny again on my account, in a different AWS region, so I can debug this further.

monit summary:

The Monit daemon 5.2.4 uptime: 23m 

Process 'routing-api'               not monitored
Process 'gorouter'                  running
Process 'cloud_controller_ng'       running
Process 'cloud_controller_worker_local_1' running
Process 'cloud_controller_worker_local_2' running
Process 'nginx_cc'                  running
Process 'cloud_controller_migration' running
Process 'metron_agent'              running
File 'nfs_mounter'                  accessible
Process 'consul_agent'              running
System 'system_1d92b5de-3558-40be-8c89-c4078fcb0777' running

monit status:

The Monit daemon 5.2.4 uptime: 25m

Process 'routing-api'
  status                            not monitored
  monitoring status                 not monitored
  data collected                    Tue Nov 24 12:01:03 2015

routing-api seems to have trouble starting. I gathered logs from the node by running this on the bastion:

bosh logs api_z1 0
tar xvfz <file.tar.gz>
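
If you would rather not unpack the whole archive, the log can also be read straight out of the tarball. This is only a minimal sketch: the function name is made up for illustration, and it assumes the archive is a gzip-compressed tar that contains a member whose path ends in routing-api/routing-api.log.

```python
import tarfile


def read_log_from_archive(archive_path, member_suffix="routing-api/routing-api.log"):
    """Return the text of the first regular file whose path ends with member_suffix.

    Returns None when no such member exists in the archive.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                extracted = archive.extractfile(member)
                return extracted.read().decode("utf-8", errors="replace")
    return None
```

Calling `read_log_from_archive("<file.tar.gz>")` with the downloaded file name would return the log text, or `None` if the archive layout differs from the assumption above.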

The underlying error appears in the extracted logs, in the file routing-api/routing-api.log:

{"timestamp":"1448371854.290952682","source":"routing-api","message":"routing-api.database","log_level":1,"data":{"etcd-addresses":["http://10.10.3.8:4001"]}}
{"timestamp":"1448371854.291732073","source":"routing-api","message":"routing-api.failed to connect to etcd","log_level":2,"data":{"error":"sync cluster failed"}}
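
For longer log files, a small filter makes the failing entries easier to spot. The sketch below parses the two lines quoted above; the function names are hypothetical, and it assumes that each line is one JSON object and that a higher log_level means a more severe entry (the sample shows 1 for the informational line and 2 for the failure).

```python
import json
from datetime import datetime, timezone

# The two log lines quoted above, verbatim.
SAMPLE_LOG = """\
{"timestamp":"1448371854.290952682","source":"routing-api","message":"routing-api.database","log_level":1,"data":{"etcd-addresses":["http://10.10.3.8:4001"]}}
{"timestamp":"1448371854.291732073","source":"routing-api","message":"routing-api.failed to connect to etcd","log_level":2,"data":{"error":"sync cluster failed"}}
"""


def parse_entries(text):
    """Parse one JSON object per line, skipping blank or non-JSON lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return entries


def summarize(entries, min_level=2):
    """Yield (UTC time, message, data) for entries at or above min_level."""
    for entry in entries:
        if entry.get("log_level", 0) >= min_level:
            when = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
            yield when, entry["message"], entry.get("data", {})


if __name__ == "__main__":
    for when, message, data in summarize(parse_entries(SAMPLE_LOG)):
        print(when.isoformat(), message, data)
```

Lowering `min_level` to 1 also shows the informational entry, which carries the etcd-addresses value that routing-api was configured with.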

That error makes sense: cf-aws-tiny should look for etcd on backbone_z1/0 (10.10.3.11), not on services_z1/0 (10.10.3.8), where etcd is not running. I suspect that a minor change to the cf-aws-tiny templates will resolve this; I will post back once I have that working.
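
The mismatch can be checked mechanically by comparing the address routing-api logged against the host that actually runs etcd. This is just an illustrative sketch: the function name is made up, and the two lists are copied by hand from the log and from the deployment, not read from any BOSH API.

```python
from urllib.parse import urlparse


def misdirected_endpoints(configured_urls, expected_hosts):
    """Return the configured URLs whose host is not one of expected_hosts."""
    expected = set(expected_hosts)
    return [url for url in configured_urls if urlparse(url).hostname not in expected]


# Address taken from the "etcd-addresses" field in routing-api.log.
configured = ["http://10.10.3.8:4001"]
# Host where etcd actually runs in cf-aws-tiny (backbone_z1/0).
etcd_hosts = ["10.10.3.11"]

print(misdirected_endpoints(configured, etcd_hosts))
# prints ['http://10.10.3.8:4001']
```

An empty list would mean every configured endpoint points at a host that runs etcd.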

djvdorp commented 8 years ago

It turns out this was broken because I was running an older version of cf-aws-tiny from this repo, which held the wrong reference.

It was already fixed in this commit, which is part of PR #79, so I will close this issue and leave the information here for reference in case it helps somebody else debug similar problems later on.