taojing2002 closed this issue 8 months ago.
Here is the link to the discussion of the three PostgreSQL replication options:
Considering write performance, read consistency, data loss, and the distance between the servers, asynchronous replication seems to be a good choice.
Here are the steps to set up file sync from mn-sandbox-ucsb-1 to mn-sandbox-ucsb-1-clone:
1. Generate an SSH key for the root user on mn-sandbox-ucsb-1.test.dataone.org:
ssh-keygen -t ecdsa
(accept the default file name /root/.ssh/id_ecdsa and leave the passphrase empty)
2. Copy the public key to mn-sandbox-ucsb-1-clone:
As root on mn-sandbox-ucsb-1, open the public key and copy it:
vim /root/.ssh/id_ecdsa.pub
On mn-sandbox-ucsb-1-clone, paste the key as the last line of root's authorized_keys file:
tao@mn-sandbox-ucsb-1-clone:~$ sudo vim /root/.ssh/authorized_keys
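If the key ever has to be added again (for example after re-creating the clone), an idempotent helper avoids duplicate lines and fixes the permissions sshd checks. This is a minimal sketch, not part of the original setup: `add_authorized_key` is a hypothetical function, and it assumes the one-line public key was first copied to the clone into a file such as /tmp/id_ecdsa.pub (also a hypothetical path).

```shell
# Hypothetical helper: add_authorized_key PUBKEY_FILE AUTH_FILE
# Appends the one-line public key in PUBKEY_FILE to AUTH_FILE unless an
# identical line is already present, then applies strict permissions.
add_authorized_key() {
  local pubkey_file="$1" auth_file="$2" key dir
  key=$(cat "$pubkey_file") || return 1
  dir=$(dirname "$auth_file")
  mkdir -p "$dir" && touch "$auth_file" || return 1
  # sshd (StrictModes) rejects keys in group- or world-writable locations.
  chmod 700 "$dir" && chmod 600 "$auth_file" || return 1
  # -x matches whole lines; -F treats the key as a fixed string ('+', '/').
  if grep -qxF -- "$key" "$auth_file"; then
    return 0
  fi
  # Start on a fresh line if the file does not end with a newline.
  if [ -s "$auth_file" ] && [ -n "$(tail -c 1 "$auth_file")" ]; then
    printf '\n' >> "$auth_file"
  fi
  printf '%s\n' "$key" >> "$auth_file"
}
```

Run it as root on mn-sandbox-ucsb-1-clone, e.g. add_authorized_key /tmp/id_ecdsa.pub /root/.ssh/authorized_keys. After accepting the clone's host key once with an interactive ssh as root on mn-sandbox-ucsb-1, the command ssh -o BatchMode=yes mn-sandbox-ucsb-1-clone.test.dataone.org true should succeed without any prompt, which is what the cron job needs.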
3. Create the file rsync.sh in the /var/metacat directory with this content:
#!/bin/bash
rsync -aAXH --delete --stats --human-readable /var/metacat/documents/ mn-sandbox-ucsb-1-clone.test.dataone.org:/var/metacat/documents/
4. Change its permissions to 774:
chmod 774 /var/metacat/rsync.sh
5. Create a cron job that runs the script every minute:
crontab -e
Paste the line:
* * * * * /var/metacat/rsync.sh
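Because cron starts the script every minute, a new run can begin while the previous rsync is still copying (a large upload can take longer than a minute), so two rsync processes would work on the same tree at once. One way to prevent the overlap is an flock(1) lock. This is a sketch under assumptions: `run_exclusive` and the lock path /var/lock/metacat-rsync.lock are hypothetical, and the flock utility from util-linux must be installed.

```shell
# Hypothetical helper: run_exclusive LOCK_FILE COMMAND [ARGS...]
# Runs COMMAND only if no other process holds a flock lock on LOCK_FILE;
# otherwise prints a message and returns 1 without waiting.
run_exclusive() {
  local lock_file="$1"
  shift
  (
    # The lock belongs to file descriptor 9 of this subshell. It is released
    # once every process holding that descriptor (the subshell and COMMAND)
    # has exited, including after a crash.
    exec 9>"$lock_file" || exit 1
    # -n: give up immediately instead of queueing behind the running copy.
    if ! flock -n 9; then
      echo "previous run still holds $lock_file; skipping" >&2
      exit 1
    fi
    "$@"
  )
}
```

With the helper defined in rsync.sh, its last line would become run_exclusive /var/lock/metacat-rsync.lock rsync -aAXH --delete --stats --human-readable /var/metacat/documents/ mn-sandbox-ucsb-1-clone.test.dataone.org:/var/metacat/documents/. Without any helper, the crontab entry can wrap the script instead: * * * * * flock -n /var/lock/metacat-rsync.lock /var/metacat/rsync.sh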
Here are the steps to set up PostgreSQL replication between mn-sandbox-ucsb-1 (primary, 128.111.85.184) and mn-sandbox-ucsb-1-clone (secondary, 128.111.85.191):
Note: this is one-way replication (primary -> secondary), and the secondary PostgreSQL server is read-only.
**Set up the firewall on both servers**:
root@mn-sandbox-ucsb-1:/home/dev/tao# sudo ufw allow from 128.111.85.191 to any port 5432
root@mn-sandbox-ucsb-1-clone:/home/dev/tao# sudo ufw allow from 128.111.85.184 to any port 5432
**Set up the primary server mn-sandbox-ucsb-1**:
Create a user named repuser with the REPLICATION privilege:
postgres=# CREATE USER repuser REPLICATION LOGIN CONNECTION LIMIT 1 PASSWORD 'password';
Edit pg_hba.conf as user postgres:
vim /etc/postgresql/14/main/pg_hba.conf
#add the line:
hostssl replication repuser 128.111.85.191/32 scram-sha-256
Edit postgresql.conf as user postgres:
vim /etc/postgresql/14/main/postgresql.conf
#modify or add the lines:
listen_addresses = 'localhost,128.111.85.184'
wal_level = hot_standby        # accepted as an alias for 'replica' since PostgreSQL 9.6
wal_keep_size = 64             # in MB when no unit is given
max_wal_senders = 10
ssl = true
ssl_cert_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1.test.dataone.org/cert.pem'
ssl_key_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1.test.dataone.org/privkey.pem'
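Editing postgresql.conf by hand works; when the same settings have to be applied on several servers, a scripted edit is less error-prone. The sketch below is one possible approach, not the procedure used above: `set_pg_conf` is a hypothetical helper, and it assumes the Debian/Ubuntu path /etc/postgresql/14/main/postgresql.conf. (PostgreSQL's own ALTER SYSTEM is another option; it writes postgresql.auto.conf instead.)

```shell
# Hypothetical helper: set_pg_conf FILE KEY VALUE
# Sets "KEY = VALUE" in a postgresql.conf-style FILE: every active KEY line is
# replaced; if there is none, the first commented-out "#KEY = ..." template
# line is replaced; otherwise the setting is appended at the end.
set_pg_conf() {
  local file="$1" key="$2" value="$3" tmp rc
  tmp=$(mktemp) || return 1
  # FILE is read twice: pass 1 (NR == FNR) only notes whether KEY is active.
  KEY="$key" LINE="$key = $value" awk '
    BEGIN {
      key = ENVIRON["KEY"]; line = ENVIRON["LINE"]
      active_re = "^[ \t]*" key "[ \t]*="
      commented_re = "^[ \t]*#[ \t]*" key "[ \t]*="
    }
    NR == FNR { if ($0 ~ active_re) has_active = 1; next }
    $0 ~ active_re { print line; done = 1; next }
    !has_active && !done && $0 ~ commented_re { print line; done = 1; next }
    { print }
    END { if (!done) print line }
  ' "$file" "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For the primary, as user postgres: set_pg_conf /etc/postgresql/14/main/postgresql.conf listen_addresses "'localhost,128.111.85.184'", and likewise for wal_level, wal_keep_size, max_wal_senders, ssl, ssl_cert_file and ssl_key_file. Writing the result back with cat rather than mv keeps the file's owner and mode unchanged.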
Allow the postgres user to read the cert.pem and privkey.pem files:
usermod -a -G ssl-cert postgres
cd /etc/letsencrypt
chown root:ssl-cert live
chown root:ssl-cert archive
chmod 750 live
chmod 750 archive
cd live
chown root:ssl-cert mn-sandbox-ucsb-1.test.dataone.org
cd ../archive
chown root:ssl-cert -R *
cd mn-sandbox-ucsb-1.test.dataone.org
chmod 640 privkey*
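PostgreSQL will not load a private key whose permissions are too open: the key must be 0600 or stricter, or, when it is owned by root, 0640 or stricter, which is the group-readable ssl-cert layout the commands above create. The hypothetical `key_mode_ok` helper below checks only the mode bits against that rule before a restart; it assumes GNU or BusyBox stat.

```shell
# Hypothetical helper: key_mode_ok KEY_FILE
# Succeeds if KEY_FILE's mode allows no access beyond 0600, or beyond 0640
# when the file is owned by root. Group membership is not checked.
key_mode_ok() {
  local key="$1" mode owner allowed
  # -L follows the live/ symlink to the real file under archive/.
  mode=$(stat -L -c '%a' "$key") || return 1
  owner=$(stat -L -c '%u' "$key") || return 1
  if [ "$owner" -eq 0 ]; then allowed=640; else allowed=600; fi
  # The leading 0 makes both numbers octal; any bit outside the mask fails.
  [ $(( 0$mode & ~0$allowed )) -eq 0 ]
}
```

For example: key_mode_ok /etc/letsencrypt/live/mn-sandbox-ucsb-1.test.dataone.org/privkey.pem && echo "key mode ok". Readability by the postgres user can be confirmed separately with sudo -u postgres test -r /etc/letsencrypt/live/mn-sandbox-ucsb-1.test.dataone.org/privkey.pem && echo "readable".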
Restart PostgreSQL on the primary:
sudo /etc/init.d/postgresql restart
**Set up the secondary server mn-sandbox-ucsb-1-clone**:
Stop postgresql:
sudo /etc/init.d/postgresql stop
Edit pg_hba.conf as user postgres:
vim /etc/postgresql/14/main/pg_hba.conf
#add the line:
hostssl replication repuser 128.111.85.184/32 scram-sha-256
Edit postgresql.conf as user postgres:
vim /etc/postgresql/14/main/postgresql.conf
#modify or add the lines:
listen_addresses = 'localhost,128.111.85.191'
wal_level = hot_standby
max_wal_senders = 10
wal_keep_size = 64
hot_standby = on
ssl = true
ssl_cert_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1-clone.test.dataone.org/cert.pem'
ssl_key_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1-clone.test.dataone.org/privkey.pem'
Allow the postgres user to read the cert.pem and privkey.pem files:
usermod -a -G ssl-cert postgres
cd /etc/letsencrypt
chown root:ssl-cert live
chown root:ssl-cert archive
chmod 750 live
chmod 750 archive
cd live
chown root:ssl-cert mn-sandbox-ucsb-1-clone.test.dataone.org
cd ../archive
chown root:ssl-cert -R *
cd mn-sandbox-ucsb-1-clone.test.dataone.org
chmod 640 privkey*
Go to the PostgreSQL data directory on the secondary server and remove everything in it:
cd /var/lib/postgresql/14/main && sudo rm -rfv ./*
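The rm -rfv step is destructive: it must run inside the right directory and only while PostgreSQL is stopped. A more defensive variant is sketched below. `wipe_datadir` is a hypothetical helper; the path pattern assumes the Debian/Ubuntu layout /var/lib/postgresql/<version>/main, and a postmaster.pid file is treated as a sign that the server is still running (after a crash it may be stale and need a manual check).

```shell
# Hypothetical helper: wipe_datadir DIR
# Empties a PostgreSQL data directory before pg_basebackup, but refuses when
# the path looks wrong or postmaster.pid suggests the server is running.
wipe_datadir() {
  local dir="$1"
  case "$dir" in
    */postgresql/*/main|*/postgresql/*/main/) ;;
    *) echo "refusing: $dir is not a .../postgresql/<version>/main path" >&2
       return 1 ;;
  esac
  [ -d "$dir" ] || { echo "refusing: $dir does not exist" >&2; return 1; }
  if [ -e "$dir/postmaster.pid" ]; then
    echo "refusing: $dir/postmaster.pid exists; stop PostgreSQL first" >&2
    return 1
  fi
  # -mindepth 1 keeps DIR itself (owner and mode intact) but deletes
  # everything inside it, including dot-files that a shell glob would skip.
  find "$dir" -mindepth 1 -delete
}
```

Run as root on the secondary after stopping PostgreSQL: wipe_datadir /var/lib/postgresql/14/main. Keeping the directory itself means its postgres ownership is still correct when pg_basebackup writes into it.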
Copy the primary server's data directory into the secondary server's data directory, as user postgres:
postgres@mn-sandbox-ucsb-1-clone pg_basebackup -h 128.111.85.184 -D /var/lib/postgresql/14/main/ -P -U repuser --wal-method=fetch
Note: it took 70 minutes to transfer 185G of data.
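That timing gives a rough throughput for planning similar copies. Assuming 185G means 185 GiB, 185 × 1024 MiB / (70 × 60 s) ≈ 45.1 MiB/s. The two hypothetical helpers below do this arithmetic with awk; the real rate depends on disks and network, so the estimate is only a ballpark.

```shell
# Hypothetical helper: rate_mib_per_s SIZE_GIB MINUTES
# Prints the observed throughput in MiB/s with one decimal place.
rate_mib_per_s() {
  awk -v gib="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gib * 1024 / (min * 60) }'
}

# Hypothetical helper: eta_minutes SIZE_GIB RATE_MIB_PER_S
# Prints the estimated transfer time in whole minutes.
eta_minutes() {
  awk -v gib="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", gib * 1024 / rate / 60 }'
}
```

For example, eta_minutes 500 45.1 prints 189, so a 500 GiB data directory would take a little over three hours at the same rate.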
Add the following line to postgresql.conf, as user postgres:
vim /etc/postgresql/14/main/postgresql.conf
primary_conninfo = 'host=128.111.85.184 port=5432 sslmode=require user=repuser password=password'
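The primary_conninfo line keeps repuser's password in postgresql.conf as plain text. An alternative is a libpq password file for the postgres OS user on the secondary: its database field can be the keyword replication, which matches streaming-replication connections, and libpq ignores the file unless its mode is 0600 or stricter. This is a sketch: `add_pgpass_entry` is hypothetical, and it assumes the postgres user's home directory is /var/lib/postgresql (the Debian/Ubuntu default). It only appends; libpq uses the first matching line, so an older entry for the same host and user should be removed first.

```shell
# Hypothetical helper: add_pgpass_entry PGPASS_FILE HOST PORT USER PASSWORD
# Appends a "HOST:PORT:replication:USER:PASSWORD" line to a libpq password file.
add_pgpass_entry() {
  local file="$1" host="$2" port="$3" user="$4" password="$5" esc_user esc_pw
  # libpq requires ':' and '\' inside a field to be escaped with '\'.
  esc_user=$(printf '%s' "$user" | sed 's/[\\:]/\\&/g')
  esc_pw=$(printf '%s' "$password" | sed 's/[\\:]/\\&/g')
  touch "$file" || return 1
  # libpq ignores the file if group or others have any access.
  chmod 600 "$file" || return 1
  printf '%s:%s:replication:%s:%s\n' "$host" "$port" "$esc_user" "$esc_pw" >> "$file"
}
```

As postgres on mn-sandbox-ucsb-1-clone: add_pgpass_entry /var/lib/postgresql/.pgpass 128.111.85.184 5432 repuser 'password', after which primary_conninfo can omit password=password.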
In /var/lib/postgresql/14/main/, create an empty file named standby.signal, as user postgres, to mark this server as a standby:
postgres@mn-sandbox-ucsb-1-clone:~/14/main$ cd /var/lib/postgresql/14/main
postgres@mn-sandbox-ucsb-1-clone:~/14/main$ touch standby.signal
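If standby.signal is missing when PostgreSQL starts, the restored cluster comes up as an independent read-write server instead of following the primary. A quick pre-start check is sketched below; `looks_like_standby` is a hypothetical helper, and it assumes the major version 14 used throughout this setup.

```shell
# Hypothetical helper: looks_like_standby DATADIR [MAJOR_VERSION]
# Succeeds if DATADIR holds a cluster of the expected major version and
# contains standby.signal, so PostgreSQL will start in standby mode.
looks_like_standby() {
  local dir="$1" want="${2:-14}"
  [ -f "$dir/PG_VERSION" ] || { echo "no PG_VERSION in $dir" >&2; return 1; }
  [ "$(cat "$dir/PG_VERSION")" = "$want" ] || { echo "unexpected PG_VERSION in $dir" >&2; return 1; }
  [ -f "$dir/standby.signal" ] || { echo "standby.signal missing in $dir" >&2; return 1; }
}
```

As postgres on the secondary: looks_like_standby /var/lib/postgresql/14/main && echo "ready to start as standby".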
Start PostgreSQL:
root@mn-sandbox-ucsb-1-clone:/var/lib/postgresql/14/main# sudo /etc/init.d/postgresql start
**Check the status of replication**
On the primary server:
postgres=# select * from pg_stat_replication;
On the secondary server:
postgres=# select * from pg_stat_wal_receiver ;
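Reading the views by eye works for one pair of servers; for a cron check or a monitoring script the output can be reduced to an exit status. The hypothetical helper below reads one state per line, as printed by psql -At, and succeeds only if there is at least one row and every row is streaming, the value pg_stat_replication.state and pg_stat_wal_receiver.status report during normal operation.

```shell
# Hypothetical helper: all_streaming < STATES
# Succeeds only if stdin has at least one non-empty line and every such
# line's first field is "streaming".
all_streaming() {
  awk 'NF { n++; if ($1 != "streaming") bad++ }
       END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}
```

On the primary: sudo -u postgres psql -Atc "select state from pg_stat_replication" | all_streaming && echo "replication ok". On the secondary, use select status from pg_stat_wal_receiver instead. To see the replay lag in bytes on the primary: select client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) from pg_stat_replication;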
After setting up both file and PostgreSQL replication on the two servers, everything works well: when I uploaded an object to mn-sandbox-ucsb-1, I could read its system metadata and the object itself from the secondary server mn-sandbox-ucsb-1-clone as well. Since I didn't set up ZooKeeper for Solr replication, Solr search doesn't work on the secondary.
Some issues:
We have just figured out how to replicate files and the database between Metacat instances, and we can already use ZooKeeper to replicate Solr between CNs. Together these can replace Metacat replication between CNs, which means the old Metacat replication feature can be dropped. @mbjones @artntek @doulikecookiedough What do you think?
In today's dev meeting, we decided that we could remove the old Metacat replication.
@mbjones @artntek @doulikecookiedough I am going to drop the xml_replication table, as well as the server_location column of the xml_documents and xml_revision tables, which is a foreign key to the xml_replication table.
Metacat replication was designed to replicate objects among different Metacat instances. Now we promote the DataONE replication mechanism for replicating objects among member nodes. From this perspective, Metacat replication is obsolete. There are also some other issues we need to think about:
However, CNs still use the Metacat replication mechanism to sync objects. We may use other mechanisms to achieve the backup feature: