@Sreeragsrg77, what versions of PGO, Postgres, and pgBackRest are you using? Can you provide the PGO logs? Did you follow the directions in the doc below when setting up the standby cluster?
@dsessler7 my PGO version is 4.7.4, psql (PostgreSQL) 13.5, and pgBackRest 2.33.
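For anyone reproducing this, the versions can be confirmed roughly like so (the namespace and pod name below are placeholders for your install):

```bash
# PGO client and API server versions
pgo version

# Postgres and pgBackRest versions from inside a database pod
kubectl -n pgo exec -it <primary-db-pod> -- psql --version
kubectl -n pgo exec -it <primary-db-pod> -- pgbackrest version
```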
Below are the current PGO logs
Defaulted container "apiserver" out of: apiserver, operator, scheduler, event time="2023-12-14T06:05:35Z" level=info msg="debug flag set to false" time="2023-12-14T06:05:40Z" level=info msg="postgres-operator apiserver starts" func="main.main()" file="cmd/apiserver/main.go:111" version=4.7.4 time="2023-12-14T06:05:40Z" level=info msg="Pgo Namespace is [pgo]" func="internal/apiserver.Initialize()" file="internal/apiserver/root.go:100" version=4.7.4 time="2023-12-14T06:05:40Z" level=info msg="InstallationName is [devtest]" func="internal/apiserver.Initialize()" file="internal/apiserver/root.go:107" version=4.7.4 time="2023-12-14T06:05:40Z" level=info msg="apiserver starts" func="internal/apiserver.Initialize()" file="internal/apiserver/root.go:119" version=4.7.4 time="2023-12-14T06:05:40Z" level=info msg="loading PermMap with 56 Permissions\n" func="internal/apiserver.initializePerms()" file="internal/apiserver/perms.go:179" version=4.7.4 time="2023-12-14T06:05:40Z" level=info msg="Config: \"pgo-config\" ConfigMap found, using config files from the configmap" func="internal/config.initialize()" file="internal/config/pgoconfig.go:751" version=4.7.4 I1214 06:05:41.823141 1 request.go:668] Waited for 1.012642997s due to client-side throttling, not priority and fairness, request: GET:https://10.237.0.1:443/apis/monitoring.coreos.com/v1?timeout=32s time="2023-12-14T06:05:43Z" level=info msg="default instance memory set to [128Mi]" func="internal/config.(PgoConfig).Validate()" file="internal/config/pgoconfig.go:393" version=4.7.4 time="2023-12-14T06:05:43Z" level=info msg="default pgbackrest repository memory set to [48Mi]" func="internal/config.(PgoConfig).Validate()" file="internal/config/pgoconfig.go:399" version=4.7.4 time="2023-12-14T06:05:43Z" level=info msg="default pgbouncer memory set to [24Mi]" func="internal/config.(*PgoConfig).Validate()" file="internal/config/pgoconfig.go:405" version=4.7.4 time="2023-12-14T06:05:43Z" level=info msg="BasicAuth is true" func="internal/apiserver.initConfig()" file="internal/apiserver/root.go:190" version=4.7.4 time="2023-12-14T06:05:43Z" level=info msg="Namespace operating mode is 'dynamic'" func="internal/apiserver.Initialize()" file="internal/apiserver/root.go:151" version=4.7.4 time="2023-12-14T06:05:43Z" level=info msg="pgo.tls Secret NOT found in namespace pgo" func="internal/apiserver.WriteTLSCert()" file="internal/apiserver/root.go:407" version=4.7.4 time="2023-12-14T06:05:43Z" level=info msg="listening on port 8443" func="main.main()" file="cmd/apiserver/main.go:182" version=4.7.4 2024/01/16 08:42:12 http: TLS handshake error from 127.0.0.1:37826: tls: failed to verify client certificate: x509: certificate has expired or is not yet valid: current time 2024-01-16T08:42:12Z is after 2023-12-10T08:13:22Z 2024/01/16 08:46:52 http: TLS handshake error from 127.0.0.1:37830: tls: failed to verify client certificate: x509: certificate has expired or is not yet valid: current time 2024-01-16T08:46:52Z is after 2023-12-10T08:13:22Z 2024/01/16 09:14:45 http: TLS handshake error from 127.0.0.1:37842: tls: failed to verify client certificate: x509: certificate has expired or is not yet valid: current time 2024-01-16T09:14:45Z is after 2023-12-10T08:13:22Z 2024/01/16 09:18:07 http: TLS handshake error from 127.0.0.1:37846: tls: failed to verify client certificate: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "serial:7978071590624879016")
I was able to resolve this by restarting my PGO pods and redeploying the standby cluster.
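A minimal sketch of that recovery sequence, assuming a default PGO 4.7 install (operator deployment `postgres-operator` in namespace `pgo`) and a standby cluster named `hippo-standby`; the cluster name, repo path, and storage type below are placeholders that must match your primary's pgBackRest repository, and the standby docs list the full set of required flags:

```bash
# Restart the operator pods (deployment name assumed from a default PGO 4.x install)
kubectl -n pgo rollout restart deployment postgres-operator

# Remove and recreate the standby cluster
pgo delete cluster hippo-standby -n pgo --no-prompt
pgo create cluster hippo-standby -n pgo --standby \
  --pgbackrest-storage-type=s3 \
  --pgbackrest-repo-path=/backrestrepo/hippo-backrest-shared-repo
```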
ERROR: [078]: unable to remap invalid link 'pg_wal'
Fri Jan 19 04:26:54 UTC 2024 ERROR: pgBackRest standby Creation: pgBackRest restore failed when creating standby
2024-01-19 04:26:54,853 ERROR: Error creating replica using method pgbackrest_standby: /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh standby exited with code=78
2024-01-19 04:26:54,854 ERROR: failed to bootstrap clone from remote master None
2024-01-19 04:26:54,855 INFO: Removing data directory: /pgdata/
2024-01-19 04:27:04,547 INFO: removing initialize key after failed attempt to bootstrap the cluster
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/patroni/__init__.py", line 139, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/local/lib/python3.6/site-packages/patroni/daemon.py", line 100, in abstract_main
    controller.run()
  File "/usr/local/lib/python3.6/site-packages/patroni/__init__.py", line 109, in run
    super(Patroni, self).run()
  File "/usr/local/lib/python3.6/site-packages/patroni/daemon.py", line 59, in run
    self._run_cycle()
  File "/usr/local/lib/python3.6/site-packages/patroni/__init__.py", line 112, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1469, in run_cycle
    info = self._run_cycle()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1343, in _run_cycle
    return self.post_bootstrap()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1236, in post_bootstrap
    self.cancel_initialization()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1229, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
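The `ERROR: [078]: unable to remap invalid link 'pg_wal'` at the top of that output usually points at a mismatch between the pg_wal layout recorded in the backup and the WAL storage configured on the standby (for example, the restore is asked to remap `pg_wal` to a dedicated WAL volume but the primary's backup does not record `pg_wal` as a symlink, or vice versa). A rough way to start checking, with pod and cluster names as placeholders and `db` as the usual PGO 4.x stanza name:

```bash
# On the primary, see whether pg_wal is a real directory or a symlink to a WAL volume.
kubectl -n pgo exec -it <primary-db-pod> -- ls -ld /pgdata/<clusterName>/pg_wal

# Confirm which stanza/backup the standby restore is using.
kubectl -n pgo exec -it <backrest-repo-pod> -- pgbackrest info --stanza=db

# Create the standby with (or without) dedicated WAL storage so it matches the primary;
# "walstorage" is a placeholder storage config name, and the same pgbackrest repo
# flags as before still apply.
pgo create cluster hippo-standby -n pgo --standby \
  --wal-storage-config=walstorage
```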