EngineerBetter / concourse-up

Deprecated - use Control Tower instead
https://github.com/EngineerBetter/control-tower
Apache License 2.0

GCP deployment fails on credhub job #100

Closed glend closed 5 years ago

glend commented 5 years ago

Everything goes smoothly until the credhub BOSH job, which fails with this log:

Task 10 | 17:12:36 | Updating instance web: web/01bb4191-125b-40fb-836f-5fe7a41e57f6 (0) (canary) (00:13:55)
                   L Error: 'web/01bb4191-125b-40fb-836f-5fe7a41e57f6 (0)' is not running after update. Review logs for failed jobs: credhub

I checked process logs and can't find anything suspicious, except:

web/01bb4191-125b-40fb-836f-5fe7a41e57f6:/var/vcap/sys/log/credhub$ tail -f credhub.stdout.log
[2019-02-28 17:16:25+0000] Could not reach the UAA server
[2019-02-28 17:17:05+0000] Could not reach the UAA server
[2019-02-28 17:17:45+0000] Could not reach the UAA server
[2019-02-28 17:18:26+0000] Could not reach the UAA server
[2019-02-28 17:19:05+0000] Could not reach the UAA server
[2019-02-28 17:19:45+0000] Could not reach the UAA server
[2019-02-28 17:20:25+0000] Could not reach the UAA server
[2019-02-28 17:21:05+0000] Could not reach the UAA server
[2019-02-28 17:21:45+0000] Could not reach the UAA server
[2019-02-28 17:22:26+0000] Could not reach the UAA server
[2019-02-28 17:23:05+0000] Could not reach the UAA server

Concourse is reachable via IP.

Also BOSH reports:

bosh instances                                                                                                                                                      
Using environment '34.76.xxx.xxx' as client 'admin'

Task 135. Done

Deployment 'concourse'

Instance                                     Process State  AZ  IPs
web/5c123261-0791-4c64-8a48-3e23926394ba     failing        z1  10.0.0.8 34.76.xxx.xxx
worker/839e11f9-0646-4e1d-a6fb-adbd34fe1de7  running        z1  10.0.1.7
worker/ca3a8b4e-6e27-48f6-b75b-376358b385eb  -              z1  -
worker/efdc6aa6-6762-4e67-9030-c4e970c7057c  running        z1  10.0.1.8

4 instances

Succeeded

bosh tasks
Using environment '34.76.xxx.xxx' as client 'admin'

ID   State       Started At                    Last Activity At              User  Deployment  Description   Result
122  processing  Thu Jan  1 00:00:00 UTC 1970  Fri Mar  1 12:32:22 UTC 2019  hm    concourse   scan and fix  -

1 tasks

Succeeded

evadinckel commented 5 years ago

Hello @glend, thank you for reaching out to us and for raising the issue. I have successfully deployed a Concourse on GCP using version 0.20.2, so I am currently unable to reproduce this. Have you tried again since this ticket was opened? Thanks, Eva

evadinckel commented 5 years ago

I believe my colleague @crsimmons also mentioned a few debugging suggestions on Slack last week. Let us know how you get on with those! Best

crsimmons commented 5 years ago

This error generally arises when DNS isn't set up correctly. In concourse-up, UAA and CredHub are colocated on the same VM, but CredHub uses the DNS name of the ATC to contact UAA (i.e. https://<yourdomain>:8443). If that name does not resolve to the web VM, CredHub cannot reach UAA and its job never reports as running.

Based on a thread in the concourse-up Slack, it seems this issue was caused by some DNS configuration in GCP.
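
A quick way to check for this kind of misconfiguration is to confirm that the deployment's domain resolves to the web VM's external IP. The following is only a rough sketch in Python, not part of concourse-up. The function names `resolve_ipv4` and `check_uaa_dns` are hypothetical, and the port 8443 is taken from the URL mentioned above.

```python
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the IPv4 addresses hostname resolves to (empty list on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (ip, port)); keep unique IPs.
    return sorted({info[4][0] for info in infos})


def check_uaa_dns(domain: str, expected_ip: str,
                  resolver: Callable[[str], List[str]] = resolve_ipv4) -> str:
    """Report whether domain points at expected_ip (hypothetical helper).

    expected_ip is assumed to be the external IP of the web VM, as shown
    by `bosh instances`. The resolver argument exists so the check can be
    exercised without network access.
    """
    uaa_url = f"https://{domain}:8443"  # URL form CredHub uses to reach UAA
    addresses = resolver(domain)
    if not addresses:
        return f"{uaa_url}: {domain} does not resolve"
    if expected_ip not in addresses:
        return f"{uaa_url}: {domain} resolves to {addresses}, expected {expected_ip}"
    return f"{uaa_url}: DNS OK"
```

Calling `check_uaa_dns` with the domain passed to concourse-up and the web VM's external IP returns a one-line summary. A "does not resolve" or "expected" message would suggest that the A record is missing or points somewhere else.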

@glend Does correcting the DNS allow you to deploy successfully?

glend commented 5 years ago

@crsimmons Yes, after I corrected it the CredHub job succeeded.

Try adding a DNS zone for concourse.yourdomain.com (not yourdomain.com) and run concourse-up with said domain. It will add a wrong A record in that zone.

crsimmons commented 5 years ago

Great! Closing this issue now. 😄

krish118 commented 4 years ago

Hi @glend / @crsimmons, could you please let me know where we need to update the DNS name? I ran into the same issue.

Thanks, Krishna