Vonng / pigsty

Battery-Included PostgreSQL Distro as a Free RDS Alternative
https://pigsty.io
GNU Affero General Public License v3.0

Install fails on CentOS 7: Timeout when waiting for 192.168.1.x:8008 #341

Closed · oukanghua closed 8 months ago

oukanghua commented 8 months ago

1. The Installation Log

TASK [pgsql : copy pg server key] ******************************************************************************************************
changed: [192.168.1.12]
changed: [192.168.1.13]
changed: [192.168.1.11]

TASK [pgsql : grant postgres dbsu watchdog owner] **************************************************************************************
ok: [192.168.1.12]
ok: [192.168.1.13]
ok: [192.168.1.11]

TASK [pgsql : launch patroni primary] **************************************************************************************************
changed: [192.168.1.12]
changed: [192.168.1.13]
changed: [192.168.1.11]

TASK [pgsql : wait for patroni primary] ************************************************************************************************
fatal: [192.168.1.13]: FAILED! => {"changed": false, "elapsed": 60, "msg": "Timeout when waiting for 192.168.1.13:8008"}
fatal: [192.168.1.12]: FAILED! => {"changed": false, "elapsed": 60, "msg": "Timeout when waiting for 192.168.1.12:8008"}
fatal: [192.168.1.11]: FAILED! => {"changed": false, "elapsed": 60, "msg": "Timeout when waiting for 192.168.1.11:8008"}

NO MORE HOSTS LEFT *********************************************************************************************************************

PLAY RECAP *****************************************************************************************************************************
192.168.1.11               : ok=243  changed=188  unreachable=0    failed=1    skipped=50   rescued=0    ignored=0   
192.168.1.12               : ok=116  changed=95   unreachable=0    failed=1    skipped=31   rescued=0    ignored=0   
192.168.1.13               : ok=116  changed=95   unreachable=0    failed=1    skipped=31   rescued=0    ignored=0   
localhost                  : ok=6    changed=3    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
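One way to narrow this down is to probe the Patroni REST API that the playbook is polling. A minimal sketch, using the IP and port from the log above (use https:// instead if the REST API is TLS-enabled):

# ask Patroni for its own state; "connection refused" means the daemon is down
curl -s http://192.168.1.11:8008/patroni

# check whether the service is running at all on the failing node
systemctl status patroni --no-pager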

2. pigsty.yml


all:
  children:

    # infra cluster for proxy, monitor, alert, etc..
    infra: { hosts: { 192.168.1.11: { infra_seq: 1 } } }

    # etcd cluster for ha postgres
    etcd: { hosts: { 192.168.1.11: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }

    # minio cluster, optional backup repo for pgbackrest
    #minio: { hosts: { 192.168.1.11: { minio_seq: 1 } }, vars: { minio_cluster: minio } }

    # postgres cluster 'pg-citus' with single primary instance
    pg-citus1: # citus coordinator, pg_group = 1
      hosts: { 192.168.1.11: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus1 , pg_group: 1 }
    pg-citus2: # citus coordinator, pg_group = 2
      hosts: { 192.168.1.12: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus2 , pg_group: 2 }
    pg-citus3: # citus coordinator, pg_group = 3
      hosts: { 192.168.1.13: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus3 , pg_group: 3 }

  vars:                               # global parameters
    pg_mode: citus                    # pgsql : citus
    pg_shard: pg-citus                # citus : pg-citus
    patroni_citus_db: meta        # citus:result
    pg_dbsu_password:  ... # dbsu 
    pg_users: [ { name: meta ,password: .. ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases:
      - { name: meta  ,owner: meta   ,extensions: [ { name: citus },{ name: pg_cron },{ name: pg_partman },{ name: postgis } ] }
      - { name: citus ,owner: result ,extensions: [ { name: citus },{ name: pg_cron },{ name: pg_partman },{ name: postgis },{ name: hll },{ name: topn },{ name: tdigest } ] }
    pg_hba_rules:
      - { user: 'all' ,db: all  ,addr: 192.168.1.10/24 ,auth: trust ,title: 'trust local db members' }  
      - { user: 'all' ,db: all  ,addr: 192.168.1.1/16 ,auth: ssl ,title: 'all user ssl access from intranet' }
      - { user: 'all' ,db: all  ,addr: 172.16.1.1/16 ,auth: ssl ,title: 'all user ssl access from intranet' }
      - { user: 'all' ,db: all  ,addr: 10.200.1.1/16 ,auth: ssl ,title: 'all user ssl access from intranet' }
      - { user: 'all' ,db: all  ,addr: intra        ,auth: ssl ,title: 'all user ssl access from intranet'  }   
    pg_libs: 'citus, pg_cron, pg_partman_bgw, timescaledb, pg_stat_statements, auto_explain'
    pg_extensions:     
      - pg_repack_${pg_version} pg_qualstats_${pg_version} pg_stat_kcache_${pg_version} pg_stat_monitor_${pg_version} wal2json_${pg_version}
      - pg_cron_${pg_version} pg_partman_${pg_version} pg_jobmon_${pg_version} hll_${pg_version} topn_${pg_version} tdigest_${pg_version}
    node_crontab:                  
      - '00 06 * * * postgres /pg/bin/pg-backup 2>>/pg/log/backup.log'
    pg_conf: olap.yml
...
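Note that pg_libs here preloads citus, pg_cron, pg_partman_bgw, and timescaledb, so bootstrap will fail if any of those shared libraries is missing on a node. A quick sanity check, assuming pg_config is on the PATH of the postgres user:

# every library named in pg_libs must have a matching .so here,
# otherwise postgres refuses to start during bootstrap
ls "$(pg_config --pkglibdir)" | grep -E 'citus|pg_cron|pg_partman_bgw|timescaledb'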

3. patroni status

$ systemctl status patroni
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/usr/lib/systemd/system/patroni.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2023-11-09 11:09:11 CST; 3s ago
  Process: 55020 ExecStart=/usr/bin/patroni /etc/patroni/patroni.yml (code=exited, status=1/FAILURE)
 Main PID: 55020 (code=exited, status=1/FAILURE)

Nov 09 11:09:11 pg-citus2-1 patroni[55020]: File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1692, in _run_cycle
Nov 09 11:09:11 pg-citus2-1 patroni[55020]: return self.post_bootstrap()
Nov 09 11:09:11 pg-citus2-1 patroni[55020]: File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1576, in post_bootstrap
Nov 09 11:09:11 pg-citus2-1 patroni[55020]: self.cancel_initialization()
Nov 09 11:09:11 pg-citus2-1 patroni[55020]: File "/usr/lib/python3.6/site-packages/patroni/ha.py", line 1569, in cancel_initialization
Nov 09 11:09:11 pg-citus2-1 patroni[55020]: raise PatroniFatalException('Failed to bootstrap cluster')
Nov 09 11:09:11 pg-citus2-1 patroni[55020]: patroni.exceptions.PatroniFatalException: Failed to bootstrap cluster
Nov 09 11:09:11 pg-citus2-1 systemd[1]: patroni.service: main process exited, code=exited, status=1/FAILURE
Nov 09 11:09:11 pg-citus2-1 systemd[1]: Unit patroni.service entered failed state.
Nov 09 11:09:11 pg-citus2-1 systemd[1]: patroni.service failed.
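The unit status truncates the Python traceback; the full exception chain is kept in the journal. For example:

# show the complete Patroni traceback, not just the last few frames
journalctl -u patroni --no-pager -n 200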

4. patroni log

2023-11-08 23:10:53 +0800 INFO: Selected new etcd server https://192.168.1.11:2379
2023-11-08 23:10:53 +0800 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-11-08 23:10:53 +0800 INFO: Lock owner: None; I am pg-citus3-1
2023-11-08 23:10:53 +0800 INFO: trying to bootstrap a new cluster
2023-11-08 23:10:59 +0800 INFO: postmaster pid=34229
2023-11-08 23:10:59 +0800 INFO: removing initialize key after failed attempt to bootstrap the cluster
2023-11-08 23:10:59 +0800 INFO: renaming data directory to /pg/data.failed
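Since Patroni keeps the failed bootstrap as /pg/data.failed, the underlying postgres error should still be on disk. A sketch for digging it out (the log locations are assumptions; pigsty may also write postgres logs elsewhere):

# look for the FATAL message postgres emitted during the failed first start
grep -riE 'FATAL' /pg/data.failed/log/ 2>/dev/null | tail -n 20
grep -riE 'FATAL' /pg/log/ 2>/dev/null | tail -n 20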
Vonng commented 8 months ago

Not sure what the exact problem is, but I guess it's something to do with pg_cron.

There's a required postgres parameter, cron.database_name, that pg_cron needs. Check that.

Maybe you can remove those extensions and shared libraries, and install them after the cluster is bootstrapped.
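For reference, once the cluster is up, the parameter can be checked and set with plain SQL. A minimal sketch, assuming pg_cron is already in shared_preload_libraries and 'meta' is the target database from the config above:

# inspect and set cron.database_name on the primary (as the dbsu)
psql -c 'SHOW cron.database_name;'
psql -c "ALTER SYSTEM SET cron.database_name = 'meta';"
# pg_cron is a preloaded background worker, so restart postgres afterwards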

oukanghua commented 8 months ago

> Not sure what the exact problem is, but I guess it's something to do with pg_cron.
>
> There's a required postgres parameter, cron.database_name, that pg_cron needs. Check that.
>
> Maybe you can remove those extensions and shared libraries, and install them after the cluster is bootstrapped.

Thank you for your answer. I found that the pg_extensions parameter was mismatched. The final configuration is:

    pg_extensions:
      - pg_repack_${pg_version}* wal2json_${pg_version}* passwordcheck_cracklib_${pg_version}*
      - postgis3*_${pg_version}* timescaledb-2-postgresql-${pg_version}* pgvector_${pg_version}* citus_${pg_version}*
      - pg_jobmon_${pg_version}* hll_${pg_version}* topn_${pg_version}* tdigest_${pg_version}*
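The trailing * globs are what make these names match the real RPMs on CentOS 7. A quick way to confirm each pattern resolves before re-running the playbook, using 15 as a stand-in for ${pg_version}:

# every glob should list at least one available package; adjust the version to yours
yum list available 'pg_repack_15*' 'citus_15*' 'wal2json_15*' 'postgis3*_15*'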