NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Do we need Metacat replication in 3.0.0? #1620

Closed taojing2002 closed 8 months ago

taojing2002 commented 1 year ago

Metacat replication was designed to replicate objects among different Metacat instances. We now promote the DataONE replication mechanism to replicate objects among member nodes, so from this perspective Metacat replication is obsolete. There are also some other issues we need to think about:

However, CNs still use the Metacat replication mechanism to sync objects. We may use other mechanisms to achieve the backup feature:

taojing2002 commented 10 months ago

Here is a link discussing three kinds of PostgreSQL replication:

Based on write performance, read consistency, potential data loss, and the distance between the servers, asynchronous replication seems to be a good choice.

taojing2002 commented 10 months ago

Here are the steps to set up file sync from mn-sandbox-ucsb-1 to mn-sandbox-ucsb-1-clone:

1. Generate an SSH key for the root user on mn-sandbox-ucsb-1.test.dataone.org:

ssh-keygen

(The key file is /root/.ssh/id_ecdsa, with no passphrase, so the cron job can use it unattended.)

2. Copy the public key to mn-sandbox-ucsb-1-clone. On mn-sandbox-ucsb-1, open the public key file and copy its contents:

root@mn-sandbox-ucsb-1:vim /root/.ssh/id_ecdsa.pub 

On mn-sandbox-ucsb-1-clone, paste the key as a new line at the end of root's authorized_keys file (this needs root privileges):

tao@mn-sandbox-ucsb-1-clone:~$ sudo vim /root/.ssh/authorized_keys

3. Create the script /var/metacat/rsync.sh with this content:

#!/bin/bash
rsync -aAXH --delete --stats --human-readable /var/metacat/documents/ mn-sandbox-ucsb-1-clone.test.dataone.org:/var/metacat/documents/

4. Make the script executable (mode 774):

chmod 774 /var/metacat/rsync.sh

5. As root, create a cron job that runs the script every minute:

crontab -e

Paste the line:

* * * * * /var/metacat/rsync.sh
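
The cron entry above starts rsync.sh every minute, so a sync that runs longer than a minute (the first full copy of /var/metacat/documents/, for instance) can overlap with the next run. A minimal sketch of a guard, assuming util-linux `flock(1)` is installed; `run_exclusive` and the lock path are hypothetical names, not part of Metacat:

```shell
#!/usr/bin/env bash
# Sketch: skip a cron-triggered sync while a previous one still holds the lock.
# run_exclusive is a hypothetical helper; flock(1) comes from util-linux.
set -euo pipefail

run_exclusive() {
  local lock_file="$1"
  shift
  (
    # -n: give up immediately instead of waiting for the running sync to finish
    if ! flock -n 9; then
      echo "skipping: $lock_file is held by a previous run" >&2
      exit 0
    fi
    # Run the given command while fd 9 holds the lock
    "$@"
  ) 9>"$lock_file"   # the lock is released when the subshell closes fd 9
}
```

After defining the function, rsync.sh could end with `run_exclusive /var/lock/metacat-rsync.lock rsync -aAXH --delete --stats --human-readable /var/metacat/documents/ mn-sandbox-ucsb-1-clone.test.dataone.org:/var/metacat/documents/`. Leaving the script unchanged, the crontab line `* * * * * /usr/bin/flock -n /var/lock/metacat-rsync.lock /var/metacat/rsync.sh` should have the same effect.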
taojing2002 commented 10 months ago

Here are the steps to set up PostgreSQL replication between mn-sandbox-ucsb-1 (primary, 128.111.85.184) and mn-sandbox-ucsb-1-clone (secondary, 128.111.85.191). Note: this is one-way replication (primary -> secondary), and the secondary PostgreSQL server is read-only.

**Set up the firewall on both servers:**

root@mn-sandbox-ucsb-1:/home/dev/tao# sudo ufw allow from 128.111.85.191 to any port 5432
root@mn-sandbox-ucsb-1-clone:/home/dev/tao# sudo ufw allow from 128.111.85.184 to any port 5432

**Set up the primary server mn-sandbox-ucsb-1:**

  1. Create a user, repuser, with the replication privilege:

    postgres=# CREATE USER repuser REPLICATION LOGIN CONNECTION LIMIT 1 PASSWORD 'password';
  2. Edit pg_hba.conf as user postgres:

    vim /etc/postgresql/14/main/pg_hba.conf
    #add the line:
    hostssl replication  repuser  128.111.85.191/32  scram-sha-256
  3. Edit postgresql.conf as user postgres:

    vim /etc/postgresql/14/main/postgresql.conf
    #modify or add the lines:
    listen_addresses = 'localhost,128.111.85.184'
    wal_level = hot_standby
    wal_keep_size = 64
    max_wal_senders = 10
    ssl = true
    ssl_cert_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1.test.dataone.org/cert.pem'
    ssl_key_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1.test.dataone.org/privkey.pem'
  4. Allow the postgres user to read the cert.pem and privkey.pem files:

    usermod -a -G ssl-cert postgres
    cd /etc/letsencrypt
    chown root:ssl-cert live
    chown root:ssl-cert archive
    chmod 750 live
    chmod 750 archive
    cd live
    chown root:ssl-cert mn-sandbox-ucsb-1.test.dataone.org
    cd ../archive
    chown root:ssl-cert -R  *
    cd mn-sandbox-ucsb-1.test.dataone.org
    chmod 640 privkey*
  5. Restart PostgreSQL on the primary:

    sudo /etc/init.d/postgresql restart
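
Let's Encrypt certificates are renewed periodically, and certbot writes each renewal to a new `privkeyN.pem` under `/etc/letsencrypt/archive/`. Depending on the certbot version, the new key may not keep the `ssl-cert` group and mode set by hand in step 4, and PostgreSQL keeps using the previously loaded files until it reloads. One possible sketch is a certbot deploy hook, saved as an executable file under `/etc/letsencrypt/renewal-hooks/deploy/` on both servers; certbot runs such hooks after a successful renewal and sets `RENEWED_LINEAGE` to the renewed `live/` directory. The file name and `fix_key_permissions` are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of a certbot deploy hook, e.g. /etc/letsencrypt/renewal-hooks/deploy/postgresql.sh
# (hypothetical file name). fix_key_permissions is a hypothetical helper.
set -euo pipefail

# Give every privkeyN.pem in the lineage's archive directory the group/mode from step 4.
fix_key_permissions() {
  local lineage="$1" group="${2:-ssl-cert}" archive_dir
  # live/<domain>/privkey.pem is a symlink to archive/<domain>/privkeyN.pem
  archive_dir="$(dirname "$(readlink -f "$lineage/privkey.pem")")"
  chgrp "$group" "$archive_dir"/privkey*.pem
  chmod 640 "$archive_dir"/privkey*.pem
}

# certbot sets RENEWED_LINEAGE (e.g. /etc/letsencrypt/live/<domain>) for deploy hooks
if [[ -n "${RENEWED_LINEAGE:-}" ]]; then
  fix_key_permissions "$RENEWED_LINEAGE"
  # PostgreSQL 10+ re-reads ssl_cert_file and ssl_key_file on reload; no restart needed
  /etc/init.d/postgresql reload
fi
```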

**Set up the secondary server mn-sandbox-ucsb-1-clone:**

  6. Stop PostgreSQL:

    sudo /etc/init.d/postgresql stop 
  7. Edit pg_hba.conf as user postgres:

    vim /etc/postgresql/14/main/pg_hba.conf
    #add the line:
    hostssl replication  repuser  128.111.85.184/32  scram-sha-256
  8. Edit postgresql.conf as user postgres:

    vim /etc/postgresql/14/main/postgresql.conf
    #modify or add the lines:
    listen_addresses = 'localhost,128.111.85.191'
    wal_level = hot_standby
    max_wal_senders = 10
    wal_keep_size = 64
    hot_standby = on
    ssl = true
    ssl_cert_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1-clone.test.dataone.org/cert.pem'
    ssl_key_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1-clone.test.dataone.org/privkey.pem'
  9. Allow the postgres user to read the cert.pem and privkey.pem files:

    usermod -a -G ssl-cert postgres
    cd /etc/letsencrypt
    chown root:ssl-cert live
    chown root:ssl-cert archive
    chmod 750 live
    chmod 750 archive
    cd live
    chown root:ssl-cert mn-sandbox-ucsb-1-clone.test.dataone.org
    cd ../archive
    chown root:ssl-cert -R  *
    cd mn-sandbox-ucsb-1-clone.test.dataone.org
    chmod 640 privkey*
  10. On the secondary server, remove everything in the PostgreSQL data directory (as root or postgres, since only they can enter it):

    cd /var/lib/postgresql/14/main && rm -rfv ./*
  11. As user postgres, copy the primary server's data directory into the secondary server's data directory:

    postgres@mn-sandbox-ucsb-1-clone:~$ pg_basebackup -h 128.111.85.184 -D /var/lib/postgresql/14/main/ -P -U repuser --wal-method=fetch

    Note: transferring 185 GB of data took 70 minutes.

  12. As user postgres, add the following setting to postgresql.conf:

    vim /etc/postgresql/14/main/postgresql.conf
    primary_conninfo = 'host=128.111.85.184 port=5432 sslmode=require user=repuser password=password'
  13. As user postgres, create an empty standby.signal file in /var/lib/postgresql/14/main/ to mark this server as a standby:

    postgres@mn-sandbox-ucsb-1-clone:~/14/main$ cd /var/lib/postgresql/14/main
    postgres@mn-sandbox-ucsb-1-clone:~/14/main$ touch standby.signal
  14. Start PostgreSQL:

    root@mn-sandbox-ucsb-1-clone:/var/lib/postgresql/14/main# sudo /etc/init.d/postgresql start
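
Steps 11 to 13 can likely be collapsed into one command. With `--write-recovery-conf` (`-R`, PostgreSQL 12+), pg_basebackup creates `standby.signal` and appends `primary_conninfo` to `postgresql.auto.conf` in the data directory. The default `--wal-method=stream` copies WAL over a second connection while the backup runs; with `fetch`, WAL is collected only at the end, so a 70-minute copy relies on `wal_keep_size` (64 MB here) still holding it, and the backup fails if that WAL has already been removed. Because `stream` opens two connections, repuser's `CONNECTION LIMIT 1` would probably need to be raised to 2. A sketch under those assumptions; `clone_primary` and `DRY_RUN` are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: rebuild the standby with one pg_basebackup call (PostgreSQL 12+).
# clone_primary and DRY_RUN are hypothetical; run as user postgres on the secondary,
# with PostgreSQL stopped and the data directory empty.
set -euo pipefail

clone_primary() {
  local primary_host="$1" data_dir="$2"
  # sslmode=require matches the hostssl entry in the primary's pg_hba.conf;
  # -R records these connection settings as primary_conninfo.
  local conninfo="host=${primary_host} port=5432 user=repuser sslmode=require"
  # --wal-method=stream: stream WAL during the copy instead of fetching it at the end.
  # --write-recovery-conf (-R): create standby.signal and append primary_conninfo.
  local cmd=(pg_basebackup -d "$conninfo" -D "$data_dir"
             --wal-method=stream --write-recovery-conf --progress)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command instead of running it
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `clone_primary 128.111.85.184 /var/lib/postgresql/14/main/` followed by step 14. If step 12's `primary_conninfo` line is also kept in postgresql.conf, the value in postgresql.auto.conf takes precedence.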

**Check the status of replication:**

  1. Check the primary server:

    postgres=# select * from pg_stat_replication;

  2. Check the secondary server:

    postgres=# select * from pg_stat_wal_receiver;
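
The two queries above show whether streaming is active. For a periodic check on the secondary, a sketch like the following could report whether the server is in recovery and roughly how far replay lags behind; `check_standby` and `MAX_LAG_SECONDS` are hypothetical, while `pg_is_in_recovery()` and `pg_last_xact_replay_timestamp()` are built-in PostgreSQL functions. The lag is measured from the last replayed transaction, so it can also grow while the primary is simply idle:

```shell
#!/usr/bin/env bash
# Sketch: a monitoring check for the secondary (standby) server.
# check_standby and MAX_LAG_SECONDS are hypothetical names.
set -euo pipefail

check_standby() {
  local max_lag="${MAX_LAG_SECONDS:-300}" row in_recovery lag
  # -X: ignore ~/.psqlrc, -A: unaligned output, -t: rows only (fields joined by '|').
  # The lag is reported as 0 until the standby has replayed its first transaction.
  row="$(psql -XAt -c "SELECT pg_is_in_recovery(),
    COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())::int, 0)")"
  IFS='|' read -r in_recovery lag <<<"$row"
  if [[ "$in_recovery" != t ]]; then
    echo "not a standby"
    return 1
  fi
  if (( lag > max_lag )); then
    echo "replay lag ${lag}s exceeds ${max_lag}s"
    return 1
  fi
  echo "ok: replay lag ${lag}s"
}
```

For example, run `MAX_LAG_SECONDS=600 check_standby` as user postgres on mn-sandbox-ucsb-1-clone. On the primary, the `sync_state` column of `pg_stat_replication` should read `async` for this asynchronous setup.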
taojing2002 commented 10 months ago

After setting up both the file and PostgreSQL replication on the two servers, everything worked well: when I uploaded an object to mn-sandbox-ucsb-1, I could read its system metadata and the object from the secondary server mn-sandbox-ucsb-1-clone as well. Since I didn't set up Zookeeper for Solr replication, Solr search doesn't work.

Some issues:

  1. Reading objects/system metadata from the secondary server works, but the associated read events can't be saved into the event log table because the database is read-only. (In our dev meeting we decided we can skip logging events that happen on the secondary CN.) However, the Tomcat log still shows error messages about the failed database writes. Do we need a new feature that disables event logging in Metacat? With it, Metacat wouldn't try to save the events, which would eliminate these error messages.
taojing2002 commented 9 months ago

We just figured out how to replicate files and the database between Metacat instances, and we already have a way to replicate Solr between CNs using Zookeeper. Together these can replace Metacat replication between CNs, which means the old Metacat replication feature can be dropped. @mbjones @artntek @doulikecookiedough What do you think?

taojing2002 commented 9 months ago

In today's dev meeting, we decided that we could remove the old metacat replication.

taojing2002 commented 9 months ago

@mbjones @artntek @doulikecookiedough I am going to drop the xml_replication table and the server_location column from the xml_documents and xml_revision tables; that column is a foreign key to the xml_replication table.