EngineerBetter / concourse-up

Deprecated: use Control Tower instead
https://github.com/EngineerBetter/control-tower
Apache License 2.0

Failing to launch the bosh agent #13

Closed: novas0x2a closed this issue 6 years ago

novas0x2a commented 6 years ago

I'm having a problem launching 0.4.6; I'm a bit of a bosh newbie, so I'm not quite sure where to begin. Here's the end of the concourse-up deploy output:

...
Started installing CPI
  Compiling package 'ruby_aws_cpi/dc02a5fa6999e95281b7234d4098640b0b90f1e6'... Finished (00:01:42)
  Compiling package 'bosh_aws_cpi/04ca340b3d64ea01aa84bd764cc574805785e97c'... Finished (00:00:01)
  Installing packages... Finished (00:00:00)
  Rendering job templates... Finished (00:00:00)
  Installing job 'aws_cpi'... Finished (00:00:00)
Finished installing CPI (00:01:44)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-aws-xen-hvm-ubuntu-trusty-go_agent/3445.11'... Finished (00:00:04)

Started deploying
  Creating VM for instance 'bosh/0' from stemcell 'ami-c03ec3b8 light'... Finished (00:00:40)
  Waiting for the agent on VM 'i-0512d97c6b57d36aa' to be ready... Failed (00:10:17)
Failed deploying (00:10:58)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Deploying:
  Creating instance 'bosh/0':
    Waiting until instance is ready:
      Post https://mbus:<redacted>@<redacted>:6868/agent: dial tcp <redacted>:6868: i/o timeout

A re-run produces the same error. A curl -k -vv to the public URL times out the same way. An SSH attempt does reach a host (I get the SSH banner, so the security group rules seem correct), but I can't figure out how to log in: I tried both the private_key and the director_key from the S3 bucket with the usernames root and admin, and all four combinations are rejected. I'm not sure how to continue debugging; could you help? Thanks :D

peterellisjones commented 6 years ago

Hi Mike,

Usually, when you get a timeout connecting to the BOSH agent on the director, it's because network access to port 6868 is blocked.

If you run curl -Ik https://<BOSH DIRECTOR PUBLIC IP>:6868/agent and you have access, you should get a 401 Unauthorized response rather than a timeout.
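The curl check above comes down to telling a silent timeout (packets dropped somewhere on the path) apart from an actual HTTP answer. Here is a minimal Python sketch of the same probe using only the standard library. probe_agent is a hypothetical helper, not part of concourse-up, and it skips certificate verification the way curl -k does, on the assumption that the agent's certificate isn't in your local trust store.

```python
import http.client
import socket
import ssl


def probe_agent(host: str, port: int = 6868, timeout: float = 10.0) -> str:
    """Classify reachability of the BOSH agent endpoint (hypothetical helper).

    Returns "timeout" when the connection or TLS handshake doesn't complete in
    time (typical of a firewall or security group silently dropping traffic),
    "refused" when the host actively rejects the connection, "http <status>"
    when the agent answers, or "error: <ExceptionName>" for other socket/TLS errors.
    """
    # Like curl -k: don't verify the server certificate or hostname.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=ctx)
    try:
        # Like curl -I: send a HEAD request and look only at the status line.
        conn.request("HEAD", "/agent")
        resp = conn.getresponse()
        return f"http {resp.status}"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # includes ssl.SSLError, unreachable hosts, etc.
        return f"error: {type(exc).__name__}"
    finally:
        conn.close()
```

Run it against the director's public IP: "timeout" points at something on the path dropping traffic, whereas any "http ..." result (such as the 401 mentioned above) means port 6868 is reachable.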

When it deploys, concourse-up automatically whitelists your IP address for access to port 6868 in the director security group. Is it possible your IP address changed during the deployment? You can check by finding the security group called concourse-up-<DEPLOYMENT NAME>-director and confirming that it allows access from your IP on ports 6868 and 25555.
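If you'd rather script that security group check, here is a rough sketch. It assumes you have already fetched the group's ingress rules, for example with aws ec2 describe-security-groups --filters Name=group-name,Values=concourse-up-<DEPLOYMENT NAME>-director or boto3's describe_security_groups, and that you pass in the IpPermissions list from that response. allowed_ports is a hypothetical helper, not part of concourse-up.

```python
import ipaddress
from typing import Any


def allowed_ports(
    ip_permissions: list[dict[str, Any]],
    my_ip: str,
    ports: tuple[int, ...] = (6868, 25555),
) -> dict[int, bool]:
    """For each port, report whether some ingress rule admits my_ip over TCP.

    ip_permissions is assumed to follow the EC2 API shape: a list of dicts with
    IpProtocol, FromPort, ToPort, IpRanges[].CidrIp and Ipv6Ranges[].CidrIpv6.
    """
    addr = ipaddress.ip_address(my_ip)
    result: dict[int, bool] = {}
    for port in ports:
        ok = False
        for perm in ip_permissions:
            proto = perm.get("IpProtocol")
            if proto == "-1":
                # "-1" means all protocols and all ports.
                port_match = True
            elif proto in ("tcp", "6"):
                port_match = perm.get("FromPort", 0) <= port <= perm.get("ToPort", -1)
            else:
                continue  # UDP/ICMP rules don't help a TCP connection
            if not port_match:
                continue
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            # Membership is simply False when address and network versions differ.
            if any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs):
                ok = True
                break
        result[port] = ok
    return result
```

It only looks at CIDR ranges: rules that grant access via another security group (UserIdGroupPairs) are ignored, and network ACLs or firewalls on your own network are outside its view.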

We run system tests with most combinations of flags, but maybe we missed something. Can you tell me whether you are passing any flags to the deploy command?

For SSH, AFAIK the user should be vcap (default password c1oudc0w), and you should use the private key from the private_key field of the config.json file in S3.
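A small sketch of getting from config.json to a working ssh command follows. It assumes config.json has already been downloaded from the S3 bucket and that private_key is a top-level field holding the PEM text; ssh_command is a hypothetical helper. The key is written with 0600 permissions because OpenSSH refuses private key files that other users can read.

```python
import json
import os
import stat
import tempfile


def ssh_command(config_path: str, host: str, user: str = "vcap") -> list[str]:
    """Write private_key from config.json to a temp file and return an ssh argv.

    Assumption: private_key is a top-level string field in config.json.
    """
    with open(config_path) as f:
        config = json.load(f)
    key = config["private_key"]

    fd, key_path = tempfile.mkstemp(prefix="director-", suffix=".pem")
    with os.fdopen(fd, "w") as f:
        # Many tools expect the PEM text to end with a newline.
        f.write(key if key.endswith("\n") else key + "\n")
    # mkstemp already creates the file as 0600; set it explicitly anyway,
    # since ssh rejects keys that are readable by group or others.
    os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)

    return ["ssh", "-i", key_path, f"{user}@{host}"]
```

For example, subprocess.run(ssh_command("config.json", "<BOSH DIRECTOR PUBLIC IP>")) would open the session; delete the temporary key file when you're done.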

novas0x2a commented 6 years ago

Seems like it might be a blocked port on the connection I was using originally (it worked fine when I ran it from home, so shrug). Thanks!