concourse / concourse-bosh-deployment

A toolchain for deploying Concourse with BOSH.
Apache License 2.0

ATC is not running because it cannot connect to PostgreSQL #108

Closed by itot555 6 years ago

itot555 commented 6 years ago

The atc process on the web instance is not running after deploying Concourse, apparently because it cannot connect to PostgreSQL. Is something misconfigured in my environment?

Bosh director info

$ bosh -e bosh_gcp env
Using environment '192.168.101.8' as client 'admin'

Name      bosh_gcp
UUID      40d45801-2ec3-4abf-a0ae-2d5dfd691f3a
Version   268.0.1 (00000000)
CPI       google_cpi
Features  compiled_package_cache: disabled
          config_server: disabled
          dns: disabled
          snapshots: disabled
User      admin

Succeeded

Concourse manifest

---
name: concourse

director_uuid: 0b298745-6427-43b7-bae2-f9d40ef45027

releases:
- name: concourse
  version: ((concourse_version))
  sha1: ((concourse_sha1))
  url: https://bosh.io/d/github.com/concourse/concourse?v=((concourse_version))
- name: garden-runc
  version: ((garden_runc_version))
  sha1: ((garden_runc_sha1))
  url: https://bosh.io/d/github.com/cloudfoundry/garden-runc-release?v=((garden_runc_version))
- name: postgres
  version: ((postgres_version))
  sha1: ((postgres_sha1))
  url: https://bosh.io/d/github.com/cloudfoundry/postgres-release?v=((postgres_version))

instance_groups:
- name: web
  instances: 1
  azs: [z1]
  networks:
  - name: public
    default: [dns, gateway]
  - name: web
    static_ips: [xxx.xxx.xxx.xxx]
  stemcell: xenial
  vm_type: default
  jobs:
  - release: concourse
    name: atc
    properties:
      log_level: debug
      token_signing_key: ((token_signing_key))
      external_url: http://xxx.xxx.xxx.xxx:8080
      postgresql:
        database: &db_name atc
        role: &db_role
          name: concourse
          password: ((postgres_password))

  - release: concourse
    name: tsa
    properties:
      log_level: debug
      host_key: ((tsa_host_key))
      token_signing_key: ((token_signing_key))
      authorized_keys: [((worker_key.public_key))]

- name: db
  instances: 1
  azs: [z1]
  networks: [{name: private}]
  stemcell: xenial
  vm_type: default
  persistent_disk_type: db
  jobs:
  - release: postgres
    name: postgres
    properties:
      databases:
        port: 5432
        databases:
        - name: *db_name
        roles:
        - *db_role

- name: worker
  instances: 1
  azs: [z1]
  networks: [{name: private}]
  stemcell: xenial
  vm_type: default
  jobs:
  - release: concourse
    name: worker
    consumes: {baggageclaim: {from: worker-baggageclaim}}
    properties:
      drain_timeout: 10m
      tsa: {worker_key: ((worker_key))}

  - release: concourse
    name: baggageclaim
    properties: {log_level: debug}
    provides: {baggageclaim: {as: worker-baggageclaim}}

  - release: garden-runc
    name: garden
    properties:
      garden:
        listen_network: tcp
        listen_address: 0.0.0.0:7777

variables:
- name: postgres_password
  type: password
- name: token_signing_key
  type: rsa
- name: tsa_host_key
  type: ssh
- name: worker_key
  type: ssh

stemcells:
- alias: xenial
  os: ubuntu-xenial
  version: latest

update:
  canaries: 1
  max_in_flight: 3
  serial: false
  canary_watch_time: 1000-60000
  update_watch_time: 1000-60000
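
The manifest above leaves several values as `((name))` placeholders, which the BOSH CLI fills in at deploy time from the `-l` vars files and the `--vars-store` file; names declared under `variables:` can be generated into the vars store. As a rough aid for checking that nothing is missing, the sketch below lists the placeholders in a manifest and separates those declared under `variables:` from those that have to come from a vars file. The helper names (`placeholder_names`, `externally_supplied`) are hypothetical, and the excerpt is cut down from the manifest above.

```python
import re

# Matches BOSH-style ((name)) placeholders, including dotted forms
# such as ((worker_key.public_key)).
PLACEHOLDER = re.compile(r"\(\(([^()]+)\)\)")


def placeholder_names(manifest_text):
    """Return the sorted root names of all ((...)) placeholders."""
    names = set()
    for raw in PLACEHOLDER.findall(manifest_text):
        # ((worker_key.public_key)) refers to the variable worker_key.
        names.add(raw.strip().split(".")[0])
    return sorted(names)


def externally_supplied(manifest_text, declared):
    """Names that are not declared under `variables:` and so must come
    from a vars file (-l) or another source."""
    return [n for n in placeholder_names(manifest_text) if n not in declared]


# Excerpt of the manifest in this issue.
EXCERPT = """\
  version: ((concourse_version))
  sha1: ((concourse_sha1))
          password: ((postgres_password))
      authorized_keys: [((worker_key.public_key))]
"""

# Names listed in the manifest's `variables:` block.
DECLARED = {"postgres_password", "token_signing_key", "tsa_host_key", "worker_key"}

if __name__ == "__main__":
    print("placeholders:", placeholder_names(EXCERPT))
    print("needed from vars files:", externally_supplied(EXCERPT, DECLARED))
```

For this excerpt the names needed from vars files are `concourse_sha1` and `concourse_version`, which matches the `-l ../versions.yml` argument in the deploy command.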

Concourse deploy

~/bosh_gcp/concourse-bosh-deployment/cluster$ bosh -e bosh_gcp deploy -d concourse concourse.yml \
-l ../versions.yml \
-l ../../concourse-key/key-creds.yml \
--var-file gcp_credentials_json=../../gcp.json \
--vars-store ../../concourse-cluster-creds.yml

The web instance's process state is failing

$ bosh -e bosh_gcp ds
Using environment '192.168.101.8' as client 'admin'

Name       Release(s)          Stemcell(s)                                   Team(s)
concourse  concourse/4.2.1     bosh-google-kvm-ubuntu-xenial-go_agent/97.22  -
           garden-runc/1.16.3
           postgres/30

1 deployments

Succeeded
$ bosh -e bosh_gcp -d concourse instances
Using environment '192.168.101.8' as client 'admin'

Task 90. Done

Deployment 'concourse'

Instance                                     Process State  AZ  IPs
db/1d237a0d-5969-4983-9719-3638a2eb1cc7      running        z1  192.168.20.3
web/bbc5bf9f-6321-4133-a81c-0e0a0c64ac90     failing        z1  192.168.20.2
                                                                xxx.xxx.xxx.xxx
worker/aee2cfb4-dbca-4d71-a454-89d58dc75139  running        z1  192.168.20.4

3 instances

Succeeded

Check web server

$ bosh -e bosh_gcp -d concourse ssh web/bbc5bf9f-6321-4133-a81c-0e0a0c64ac90
$ sudo su -
# monit status
The Monit daemon 5.2.5 uptime: 3m

Process 'atc'
  status                            not monitored
  monitoring status                 not monitored
  data collected                    Tue Oct  9 15:44:13 2018

Process 'tsa'
  status                            running
  monitoring status                 monitored
  pid                               6102
  parent pid                        1
  uptime                            3m
  children                          0
  memory kilobytes                  13412
  memory kilobytes total            13412
  memory percent                    0.1%
  memory percent total              0.1%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Tue Oct  9 15:44:13 2018

System 'system_vm-52ae735f-6d95-4155-75ad-b7f8004f2a73.c.tito-emc-work.internal'
  status                            running
  monitoring status                 monitored
  load average                      [0.04] [0.04] [0.00]
  cpu                               0.4%us 0.4%sy 0.0%wa
  memory usage                      170336 kB [2.2%]
  swap usage                        0 kB [0.0%]
  data collected                    Tue Oct  9 15:44:13 2018

I couldn't start the atc process manually.

# monit start atc
# monit status
The Monit daemon 5.2.5 uptime: 4m

Process 'atc'
  status                            not monitored - start pending
  monitoring status                 not monitored
  data collected                    Tue Oct  9 15:44:53 2018

Process 'tsa'
  status                            running
  monitoring status                 monitored
  pid                               6102
  parent pid                        1
  uptime                            3m
  children                          0
  memory kilobytes                  13412
  memory kilobytes total            13412
  memory percent                    0.1%
  memory percent total              0.1%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Tue Oct  9 15:44:53 2018

System 'system_vm-52ae735f-6d95-4155-75ad-b7f8004f2a73.c.tito-emc-work.internal'
  status                            running
  monitoring status                 monitored
  load average                      [0.02] [0.03] [0.00]
  cpu                               0.4%us 0.4%sy 0.0%wa
  memory usage                      171120 kB [2.2%]
  swap usage                        0 kB [0.0%]
  data collected                    Tue Oct  9 15:44:53 2018
# monit status
The Monit daemon 5.2.5 uptime: 4m

Process 'atc'
  status                            Execution failed - start pending
  monitoring status                 monitored
  data collected                    Tue Oct  9 15:45:33 2018

Process 'tsa'
  status                            running
  monitoring status                 monitored
  pid                               6102
  parent pid                        1
  uptime                            4m
  children                          0
  memory kilobytes                  13412
  memory kilobytes total            13412
  memory percent                    0.1%
  memory percent total              0.1%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Tue Oct  9 15:45:33 2018

System 'system_vm-52ae735f-6d95-4155-75ad-b7f8004f2a73.c.tito-emc-work.internal'
  status                            running
  monitoring status                 monitored
  load average                      [0.02] [0.03] [0.00]
  cpu                               100.0%us 0.0%sy 0.0%wa
  memory usage                      171240 kB [2.2%]
  swap usage                        0 kB [0.0%]
  data collected                    Tue Oct  9 15:45:33 2018
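
The `monit status` dumps above are long, and the only line that matters in each is the `status` of the `atc` process. A minimal sketch of pulling the per-process status out of that output, assuming the layout shown here (a `Process '<name>'` header followed by indented fields); `parse_monit_status` and `not_running` are hypothetical helper names, not part of monit or BOSH.

```python
import re


def parse_monit_status(text):
    """Return {process_name: status} from `monit status` output."""
    statuses = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^Process '([^']+)'", line)
        if header:
            current = header.group(1)
            continue
        if line.startswith("System "):
            # The System section also has a `status` field; skip it.
            current = None
            continue
        # Match the indented `status` field, not `monitoring status`.
        field = re.match(r"^\s+status\s{2,}(.+?)\s*$", line)
        if field and current is not None:
            statuses[current] = field.group(1)
    return statuses


def not_running(statuses):
    """Names of processes whose status is anything other than 'running'."""
    return sorted(name for name, status in statuses.items() if status != "running")


# Trimmed from the third `monit status` output in this issue.
SAMPLE = """\
Process 'atc'
  status                            Execution failed - start pending
  monitoring status                 monitored

Process 'tsa'
  status                            running
  monitoring status                 monitored
  pid                               6102

System 'system_vm-52ae735f-6d95-4155-75ad-b7f8004f2a73.c.tito-emc-work.internal'
  status                            running
"""

if __name__ == "__main__":
    print(not_running(parse_monit_status(SAMPLE)))
```

On the VM itself the same function could be fed the text captured from `monit status`, for example via `subprocess.run(["monit", "status"], capture_output=True, text=True).stdout`.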

I checked the atc logs, and it seems that atc cannot connect to PostgreSQL:

# cat /var/vcap/sys/log/atc/atc.stderr.log
default team auth not configured: No auth methods have been configured.
default team auth not configured: No auth methods have been configured.
default team auth not configured: No auth methods have been configured.
default team auth not configured: No auth methods have been configured.
default team auth not configured: No auth methods have been configured.
default team auth not configured: No auth methods have been configured.
# cat /var/vcap/sys/log/atc/atc.stdout.log
{"timestamp":"1539099672.294151783","source":"atc","message":"atc.db.failed-to-open-db-retrying","log_level":2,"data":{"error":"dial tcp 192.168.20.3:5432: connect: connection refused","session":"3"}}
{"timestamp":"1539099677.297416925","source":"atc","message":"atc.db.failed-to-open-db-retrying","log_level":2,"data":{"error":"dial tcp 192.168.20.3:5432: connect: connection refused","session":"3"}}
{"timestamp":"1539099682.303802967","source":"atc","message":"atc.db.failed-to-open-db-retrying","log_level":2,"data":{"error":"dial tcp 192.168.20.3:5432: connect: connection refused","session":"3"}}
{"timestamp":"1539099687.307279348","source":"atc","message":"atc.db.failed-to-open-db-retrying","log_level":2,"data":{"error":"dial tcp 192.168.20.3:5432: connect: connection refused","session":"3"}}
{"timestamp":"1539099692.310779572","source":"atc","message":"atc.db.failed-to-open-db-retrying","log_level":2,"data":{"error":"dial tcp 192.168.20.3:5432: connect: connection refused","session":"3"}}

itot555 commented 6 years ago

I modified concourse.yml and was then able to deploy successfully.