Closed Zmccll closed 1 month ago
Pigsty version: 2.6.0
Postgres version:
dbuser_dba@pg-meta-1:5432/postgres=# SELECT version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 16.3 (Ubuntu 16.3-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
The entire recovery process
postgres@pg-meta-1:~$ pg pause
Success: cluster management is paused
postgres@pg-meta-1:~$ pt-stop
postgres@pg-meta-1:~$ pg-stop
waiting for server to shut down.... done
server stopped
postgres@pg-meta-1:~$ pgbackrest --stanza=pg-meta --type=time --target='2024-05-17 07:05:46+00' restore
2024-05-17 07:10:02.776 P00 INFO: restore command begin 2.51: --archive-mode=off --delta --exec-id=69015-4650a479 --link-all --log-level-console=info --log-level-file=detail --log-path=/pg/log/pgbackrest --pg1-path=/pg/data --process-max=4 --repo1-path=/pg/backup --spool-path=/pg/tmp --stanza=pg-meta --target="2024-05-17 07:05:46+00" --type=time
2024-05-17 07:10:02.783 P00 INFO: repo1: restore backup set 20240517-070332F, recovery will start at 2024-05-17 07:03:32
2024-05-17 07:10:02.785 P00 INFO: remove invalid files/links/paths from '/pg/data'
2024-05-17 07:10:03.437 P00 INFO: write updated /pg/data/postgresql.auto.conf
2024-05-17 07:10:03.451 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2024-05-17 07:10:03.452 P00 INFO: restore size = 25MB, file total = 981
2024-05-17 07:10:03.453 P00 INFO: restore command end: completed successfully (680ms)
postgres@pg-meta-1:~$ pg-start
waiting for server to start....2024-05-17 07:10:06.776 UTC [69023] LOG: redirecting log output to logging collector process
2024-05-17 07:10:06.776 UTC [69023] HINT: Future log output will appear in directory "/pg/log/postgres".
done
server started
postgres@pg-meta-1:~$ pg-promote
waiting for server to promote.... done
server promoted
postgres@pg-meta-1:~$ pt-start
postgres@pg-meta-1:~$ psql -c 'ALTER SYSTEM SET archive_mode = on;'
ALTER SYSTEM
Time: 2.251 ms
postgres@pg-meta-1:~$ psql -c 'SHOW archive_mode;'
archive_mode
--------------
off
(1 row)
Time: 0.107 ms
postgres@pg-meta-1:~$ pg-restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2024-05-17 07:10:36.326 UTC [69197] LOG: redirecting log output to logging collector process
2024-05-17 07:10:36.326 UTC [69197] HINT: Future log output will appear in directory "/pg/log/postgres".
done
server started
postgres@pg-meta-1:~$ psql -c 'SHOW archive_mode;'
archive_mode
--------------
on
(1 row)
Time: 0.253 ms
postgres@pg-meta-1:~$ pt-restart
postgres@pg-meta-1:~$ pg reinit pg-meta
+ Cluster: pg-meta (7369858807812509474) -----+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Tags |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-meta-1 | 10.60.10.10 | Leader | running | 2 | | clonefrom: true |
| | | | | | | conf: oltp.yml |
| | | | | | | spec: 4C.8G.48G |
| | | | | | | version: '16' |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-meta-2 | 10.60.10.9 | Replica | running | 1 | 0 | clonefrom: true |
| | | | | | | conf: oltp.yml |
| | | | | | | spec: 4C.8G.48G |
| | | | | | | version: '16' |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-meta-3 | 10.60.10.8 | Replica | running | 1 | 0 | clonefrom: true |
| | | | | | | conf: oltp.yml |
| | | | | | | spec: 4C.8G.48G |
| | | | | | | version: '16' |
+-----------+-------------+---------+---------+----+-----------+-----------------+
Maintenance mode: on
Which member do you want to reinitialize [pg-meta-3, pg-meta-2]? []: pg-meta-3
Are you sure you want to reinitialize members pg-meta-3? [y/N]: y
Success: reinitialize for member pg-meta-3
postgres@pg-meta-1:~$ pg list
+ Cluster: pg-meta (7369858807812509474) -----+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Tags |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-meta-1 | 10.60.10.10 | Leader | running | 2 | | clonefrom: true |
| | | | | | | conf: oltp.yml |
| | | | | | | spec: 4C.8G.48G |
| | | | | | | version: '16' |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-meta-2 | 10.60.10.9 | Replica | running | 1 | 0 | clonefrom: true |
| | | | | | | conf: oltp.yml |
| | | | | | | spec: 4C.8G.48G |
| | | | | | | version: '16' |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-meta-3 | 10.60.10.8 | Replica | running | 1 | 0 | clonefrom: true |
| | | | | | | conf: oltp.yml |
| | | | | | | spec: 4C.8G.48G |
| | | | | | | version: '16' |
+-----------+-------------+---------+---------+----+-----------+-----------------+
Maintenance mode: on
postgres@pg-meta-1:~$
You can choose one of the following ways to reconstruct other replicas:
- ... /pg/data/* and restart patroni on replicas one by one
- ... pgbackrest command and restore on replica (fastest, but requires a central backup repo)
Postgres pb info
the commands guided by pg-pitr
After performing the recovery as guided, the status of the database cluster is as follows:
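The timeline (TL) is inconsistent: the leader pg-meta-1 is on timeline 2, while the replicas pg-meta-2 and pg-meta-3 remain on timeline 1. Deleting the replica node and re-adding it did not resolve the issue. The error log from pg-meta-2: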
2024-05-17 06:29:50.777 UTC,"replicator","",48472,"10.60.10.9:33014",6646f95e.bd58,3,"TIMELINE_HISTORY",2024-05-17 06:29:50 UTC,7/0,0,ERROR,58P01,"could not open file ""pg_wal/00000002.history"": No such file or directory",,,,,,"TIMELINE_HISTORY 2",,,"pg-meta-2","walsender",,0
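The error above names `00000002.history`, the history file for timeline 2. PostgreSQL names these files as the timeline ID in eight uppercase hex digits followed by `.history`. A minimal sketch for checking whether that file exists on a given node is shown here. The helper names `tl_history_file` and `check_tl_history` are hypothetical, and the data directory is assumed to be `/pg/data` as in the restore log; this is a diagnostic aid, not a fix.

```shell
#!/usr/bin/env bash
# Sketch: map a timeline ID to its history file name and check whether
# it is present in a data directory. Helper names are illustrative only.

# Timeline history files are named <timeline ID as 8 uppercase hex digits>.history
tl_history_file() {
    printf '%08X.history\n' "$1"
}

# Report whether <pgdata>/pg_wal contains the history file for timeline <tl>.
# Returns 0 if present, 1 if missing.
check_tl_history() {
    local pgdata="$1" tl="$2" file
    file="$(tl_history_file "$tl")"
    if [ -f "$pgdata/pg_wal/$file" ]; then
        echo "present: $pgdata/pg_wal/$file"
    else
        echo "missing: $pgdata/pg_wal/$file"
        return 1
    fi
}
```

For example, `check_tl_history /pg/data 2` on the leader would show whether the file requested by the `TIMELINE_HISTORY 2` command is available there (the `/pg/data` path is an assumption taken from `--pg1-path` in the restore output).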