gigascience / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0

Make alt efs sync with live #2025

Open kencho51 opened 2 months ago

kencho51 commented 2 months ago

Pull request for issue: #2000

This is a pull request for the following functionalities:

How to test?

Describe how the new functionalities can be tested by PR reviewers

In local dev

% cd /gigadb/app/tools/sync-dropbox
# execute the configure script to generate .env
% ./configure 
An .env file wasn't present, creating a new one from the default example
Sourcing .env
./.env: line 16: syntax error near unexpected token `newline'
# then fill in the GitLab private token and the GitLab repo name in .env, and run the configure script again to generate rclone.conf and fetch the SSH private key for accessing the upstream server
% ./configure  
An .env file is present
Sourcing .env
Current environment: dev
# then check that the private key exists
% ls -al ~/.ssh/id-rsa-aws-hk-gigadb.pem
-rw-r--r--@ 1 kencho  staff  1675 Sep 19 16:12 /Users/kencho/.ssh/id-rsa-aws-hk-gigadb.pem
# then execute the bats tests
 % bats tests/bats/sync_dropbox.bats 
sync_dropbox.bats
 ✓ No parameter provided
 ✓ Execute in dry run mode
 ✓ Execute in apply mode

3 tests, 0 failures
# the last test takes several minutes to complete, as it copies the files from the existing upstream staging EFS to your local dev environment

Pre-requisites

  1. Follow the docs/SETUP_PROVISIONING.md to spin up servers

In staging as a centos user

% ssh -i path/to/staging/pem centos@$staging-bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Mon Sep 30 04:36:53 2024 from 3.36.204.163
[centos@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 0
drwxrwxr-x. 2 centos centos  6 Sep 30 03:54 .
drwxr-xr-x. 4 centos centos 35 Sep 30 03:54 ..
[centos@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 4
drwxr-xr-x.  2 root gigadb   30 Sep 30 04:23 .
drwxr-xr-x. 10 root root   4096 Sep 30 04:16 ..
-rw-rw-r--.  1 root gigadb    0 Sep 30 04:23 sync_dropbox.log
[centos@ip-10-99-0-22 ~]$ /usr/local/bin/sync_dropbox --apply
2024/09/30 05:50:24 INFO : Start sync dropbox from production-staging to alt staging
2024/09/30 05:51:10 INFO  : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:51:10 INFO  : Successfully sync dropbox from production-staging to alt staging
[centos@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 12
drwxrwxr-x. 6 centos centos   82 Sep 30 05:50 .
drwxr-xr-x. 4 centos centos   35 Sep 30 03:54 ..
-rw-rw-r--. 1 centos centos  166 Apr 25 16:55 rija_test.txt
drwxrwxr-x. 3 centos centos   36 Sep 30 05:50 user0
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user109
drwxrwxr-x. 4 centos centos  167 Sep 30 05:50 user27
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user4
[centos@ip-10-99-0-22 ~]$ 
[centos@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 12
drwxr-xr-x.  2 root gigadb   30 Sep 30 04:23 .
drwxr-xr-x. 10 root root   4096 Sep 30 04:16 ..
-rw-rw-r--.  1 root gigadb 7346 Sep 30 05:51 sync_dropbox.log
[centos@ip-10-99-0-22 ~]$ cat /var/log/gigadb/sync_dropbox.log 
...
2024/09/30 05:50:35 INFO  : user27/Genome/Sample_information.csv: Copied (new)
2024/09/30 05:50:35 INFO  : user27/Genome/busco_full_table.csv: Copied (new)
2024/09/30 05:50:35 INFO  : user27/Genome/Venny_FigS3.zip: Copied (new)
2024/09/30 05:50:35 INFO  : user27/Genome/Binodoxys_communis_contig_level.fa: Copied (new)
2024/09/30 05:50:35 INFO  : user27/Genome/busco_short_summary.txt: Copied (new)
2024/09/30 05:50:35 INFO  : user27/Genome/SpeciesTreeAlignment.fa: Copied (new)
2024/09/30 05:50:35 INFO  : user27/Genome/missing_busco_list.csv: Copied (new)
2024/09/30 05:50:35 INFO  : user27/Genome/repeat.statistics.csv: Copied (new)
2024/09/30 05:50:36 INFO  : user27/Genome/repeatmasker.gff: Copied (new)
2024/09/30 05:50:36 INFO  : user27/Genome/trf.gff: Copied (new)
2024/09/30 05:50:36 INFO  : user4/brassicaceae_NCBI/Trees.tar.gz: Copied (new)
2024/09/30 05:50:37 INFO  : user27/Genome/single_copy.pep.phy: Copied (new)
2024/09/30 05:50:37 INFO  : user4/brassicaceae_NCBI/its.tar.gz: Copied (new)
2024/09/30 05:50:37 INFO  : user4/brassicaceae_NCBI/amas.tar.gz: Copied (new)
2024/09/30 05:50:37 INFO  : user4/brassicaceae_NCBI/its_id_sp.csv: Copied (new)
2024/09/30 05:50:37 INFO  : user4/brassicaceae_NCBI/matk_id_spp.csv: Copied (new)
2024/09/30 05:50:39 INFO  : user27/Genome/single_copy.cds.phy: Copied (new)
2024/09/30 05:50:39 INFO  : user4/brassicaceae_NCBI/rbcl_id_spp.csv: Copied (new)
2024/09/30 05:50:41 INFO  : user4/brassicaceae_NCBI/trn.tar.gz: Copied (new)
2024/09/30 05:50:41 INFO  : user4/brassicaceae_NCBI/trn_id_spp.csv: Copied (new)
2024/09/30 05:50:41 INFO  : user27/Metabolite/Fig7DEF_Rawdata/Sample_information1.csv: Copied (new)
2024/09/30 05:50:42 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results1.csv: Copied (new)
2024/09/30 05:50:43 INFO  : user27/Metabolite/Fig7DEF_Rawdata/TOTAL_baseCompare.csv: Copied (new)
2024/09/30 05:50:43 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results2.csv: Copied (new)
2024/09/30 05:50:43 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/Sample_information3.csv: Copied (new)
2024/09/30 05:50:44 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Integral_correction_diagram_N.png: Copied (new)
2024/09/30 05:50:44 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Integral_correction_diagram_P.png: Copied (new)
2024/09/30 05:50:45 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/MRM_detection_of_multimodal_maps_N.png: Copied (new)
2024/09/30 05:50:45 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/MRM_detection_of_multimodal_maps_P.png: Copied (new)
2024/09/30 05:50:46 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_TIC_N.png: Copied (new)
2024/09/30 05:50:46 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_TIC_P.png: Copied (new)
2024/09/30 05:50:47 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_tic_overlap_N.png: Copied (new)
2024/09/30 05:50:47 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_tic_overlap_P.png: Copied (new)
2024/09/30 05:50:47 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Raw_metabolomics_data.csv: Copied (new)
2024/09/30 05:50:48 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Sample_information2.csv: Copied (new)
2024/09/30 05:50:49 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/all_cor.csv: Copied (new)
2024/09/30 05:50:49 INFO  : user0/change.log: Copied (new)
2024/09/30 05:50:49 INFO  : user0/some/directory/foobar.ext: Copied (new)
2024/09/30 05:50:54 INFO  : user4/brassicaceae_NCBI/rbcl.tar.gz: Multi-thread Copied (new)
2024/09/30 05:51:07 INFO  : user4/brassicaceae_NCBI/outputs.tar.gz: Multi-thread Copied (new)
2024/09/30 05:51:09 INFO  : user4/brassicaceae_NCBI/matk.tar.gz: Multi-thread Copied (new)
2024/09/30 05:51:10 INFO  : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:51:10 INFO  : Successfully sync dropbox from production-staging to alt staging

In staging as a lily user

% cd ops/infrastructure/envs/staging
% ansible-playbook -i ../../inventories users_playbook.yml -e "newuser=lily" -e "credentials_csv_path=~/path/to/credentials.csv" -e "gigadb_env=staging" 
% chmod 500 output/privkeys-$bastion-ip/lily
% ssh -i output/privkeys-3.36.204.163/lily lily@$bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket

[lily@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 12
drwxrwxr-x. 6 centos centos   82 Sep 30 05:50 .
drwxr-xr-x. 4 centos centos   35 Sep 30 03:54 ..
-rw-rw-r--. 1 centos centos  166 Apr 25 16:55 rija_test.txt
drwxrwxr-x. 3 centos centos   36 Sep 30 05:50 user0
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user109
drwxrwxr-x. 4 centos centos  167 Sep 30 05:50 user27
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user4
[lily@ip-10-99-0-22 ~]$ rm -r /share/dropbox/user27
[lily@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 12
drwxrwxr-x. 5 centos centos   68 Sep 30 05:58 .
drwxr-xr-x. 4 centos centos   35 Sep 30 03:54 ..
-rw-rw-r--. 1 centos centos  166 Apr 25 16:55 rija_test.txt
drwxrwxr-x. 3 centos centos   36 Sep 30 05:50 user0
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user109
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user4
[lily@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 12
drwxr-xr-x.  2 root gigadb   30 Sep 30 04:23 .
drwxr-xr-x. 10 root root   4096 Sep 30 04:16 ..
-rw-rw-r--.  1 root gigadb 7346 Sep 30 05:51 sync_dropbox.log
[lily@ip-10-99-0-22 ~]$ /usr/local/bin/sync_dropbox --apply
2024/09/30 05:59:16 INFO : Start sync dropbox from production-staging to alt staging
2024/09/30 05:59:28 INFO  : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:59:28 INFO  : Successfully sync dropbox from production-staging to alt staging
[lily@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 12
drwxrwxr-x. 6 centos centos   82 Sep 30 05:59 .
drwxr-xr-x. 4 centos centos   35 Sep 30 03:54 ..
-rw-rw-r--. 1 centos centos  166 Apr 25 16:55 rija_test.txt
drwxrwxr-x. 3 centos centos   36 Sep 30 05:50 user0
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user109
drwxrwxr-x. 4 lily   lily    167 Sep 30 05:59 user27
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user4
[lily@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 20
drwxr-xr-x.  2 root gigadb    30 Sep 30 04:23 .
drwxr-xr-x. 10 root root    4096 Sep 30 04:16 ..
-rw-rw-r--.  1 root gigadb 12410 Sep 30 05:59 sync_dropbox.log
...
2024/09/30 05:59:16 INFO : Start sync dropbox from production-staging to alt staging
2024/09/30 05:59:19 INFO  : user27/102224.filesizes: Copied (new)
2024/09/30 05:59:19 INFO  : user27/.dotfiles.txt: Copied (new)
2024/09/30 05:59:19 INFO  : user27/102224.filesizes.bk: Copied (new)
2024/09/30 05:59:19 INFO  : user27/102224.md5.bk: Copied (new)
2024/09/30 05:59:19 INFO  : user27/102224.md5: Copied (new)
2024/09/30 05:59:19 INFO  : user27/readme.txt: Copied (new)
2024/09/30 05:59:20 INFO  : user27/Genome/Binodoxys_communis_chr_pep.fa: Copied (new)
2024/09/30 05:59:20 INFO  : user27/Genome/Binodoxys_communis_chr.gff: Copied (new)
2024/09/30 05:59:20 INFO  : user27/Genome/Expasion_KEGG_Fig2BC.csv: Copied (new)
2024/09/30 05:59:20 INFO  : user27/Genome/Binodoxys_communis_chr_cds.fa: Copied (new)
2024/09/30 05:59:20 INFO  : user27/Genome/Extraction_KEGG_Fig2DE.csv: Copied (new)
2024/09/30 05:59:21 INFO  : user27/Genome/Genome.md5: Copied (new)
2024/09/30 05:59:21 INFO  : user27/Genome/Gene_annotation.csv: Copied (new)
2024/09/30 05:59:21 INFO  : user27/Genome/Phylogenetic_tree_Fig2A.newick: Copied (new)
2024/09/30 05:59:21 INFO  : user27/Genome/Sample_information.csv: Copied (new)
2024/09/30 05:59:22 INFO  : user27/Genome/Venny_FigS3.zip: Copied (new)
2024/09/30 05:59:22 INFO  : user27/Genome/SpeciesTreeAlignment.fa: Copied (new)
2024/09/30 05:59:22 INFO  : user27/Genome/busco_full_table.csv: Copied (new)
2024/09/30 05:59:23 INFO  : user27/Genome/busco_short_summary.txt: Copied (new)
2024/09/30 05:59:23 INFO  : user27/Genome/missing_busco_list.csv: Copied (new)
2024/09/30 05:59:23 INFO  : user27/Genome/repeat.statistics.csv: Copied (new)
2024/09/30 05:59:23 INFO  : user27/Genome/repeatmasker.gff: Copied (new)
2024/09/30 05:59:24 INFO  : user27/Genome/Binodoxys_communis_chrom_level.fa: Copied (new)
2024/09/30 05:59:25 INFO  : user27/Genome/single_copy.pep.phy: Copied (new)
2024/09/30 05:59:25 INFO  : user27/Metabolite/Fig4BCDEF_Lipid_changes_in_adult_stage.csv: Copied (new)
2024/09/30 05:59:25 INFO  : user27/Genome/Binodoxys_communis_contig_level.fa: Copied (new)
2024/09/30 05:59:25 INFO  : user27/Genome/trf.gff: Copied (new)
2024/09/30 05:59:25 INFO  : user27/Metabolite/Fig6BC_Lipid_changes_in_the_larval_stage.csv: Copied (new)
2024/09/30 05:59:25 INFO  : user27/Metabolite/Fig7ABC_Biological data.csv: Copied (new)
2024/09/30 05:59:25 INFO  : user27/Genome/single_copy.cds.phy: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Fig7DEF_TAG_DAG_and_FA_of_cotton_aphids_after_parasitization.csv: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Fig9BD_Flaw rates of isotope-labeled metabolites.csv: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/FigS5_Quantitative_validation_of_metabolites.csv: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Metabolite.md5: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Integral_correction_diagram_N.png: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Integral_correction_diagram_P.png: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/MRM_detection_of_multimodal_maps_N.png: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/MRM_detection_of_multimodal_maps_P.png: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_TIC_N.png: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_tic_overlap_N.png: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_TIC_P.png: Copied (new)
2024/09/30 05:59:26 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_tic_overlap_P.png: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Raw_metabolomics_data.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Sample_information2.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/all_cor.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Fig7DEF_Rawdata/Sample_information1.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Fig7DEF_Rawdata/TOTAL_baseCompare.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results1.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results2.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/Sample_information3.csv: Copied (new)
2024/09/30 05:59:28 INFO  : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:59:28 INFO  : Successfully sync dropbox from production-staging to alt staging
[lily@ip-10-99-0-22 ~]$ 

In live as a centos user

% ssh -i path/to/live/pem centos@$live-bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Thu Oct 24 05:29:34 2024 from 54.180.33.208
[centos@ip-10-99-0-185 ~]$ /usr/local/bin/sync_dropbox --dry-run
2024/10/24 05:29:54 INFO : Start sync dropbox from production-live to alt live
2024/10/24 05:30:45 INFO  : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:30:45 INFO  : Successfully sync dropbox from production-live to alt live
[centos@ip-10-99-0-185 ~]$ 
[centos@ip-10-99-0-185 ~]$ /usr/local/bin/sync_dropbox --apply
2024/10/24 05:29:54 INFO : Start sync dropbox from production-live to alt live
2024/10/24 05:30:45 INFO  : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:30:45 INFO  : Successfully sync dropbox from production-live to alt live

[centos@ip-10-99-0-185 ~]$ ls -al /var/log/gigadb/
total 97796
drwxrwxrwx.  2 centos centos       30 Oct 23 15:11 .
drwxr-xr-x. 11 root   root       4096 Oct 23 15:12 ..
-rw-rw-r--.  1 root   gigadb 70335364 Oct 24 05:30 sync_dropbox.log
[centos@ip-10-99-0-185 ~]$ wc -l /var/log/gigadb/sync_dropbox.log 
374089 /var/log/gigadb/sync_dropbox.log
[centos@ip-10-99-0-185 ~]$ less /var/log/gigadb/sync_dropbox.log 
2024/10/24 05:22:06 INFO : Start sync dropbox from  to alt live
2024/10/24 05:22:07 NOTICE: cngb_user: Skipped copy as --dry-run is set (size 23)
2024/10/24 05:22:07 NOTICE: nohup.out: Skipped copy as --dry-run is set (size 200.488Ki)
2024/10/24 05:22:07 NOTICE: some.file: Skipped copy as --dry-run is set (size 30)
2024/10/24 05:22:07 NOTICE: uploaded_with_sftp.txt: Skipped copy as --dry-run is set (size 168)
2024/10/24 05:22:07 NOTICE: user116.orig/Function.txt: Skipped copy as --dry-run is set (size 10.790Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.addgene.addUniprot.gff: Skipped copy as --dry-run is set (size 44.357Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.cds: Skipped copy as --dry-run is set (size 34.069Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.pep: Skipped copy as --dry-run is set (size 12.116Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/RAxML_bestTree.snail.PcPm.GQ10.mind2.NoRelate.gt5maf1.biallelic.NoSCFS.pruned.min4.nwk: Skipped copy as --dry-run is set (size 7.210Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/all.repeat.gff: Skipped copy as --dry-run is set (size 150.854Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/allfile.md5sum: Skipped copy as --dry-run is set (size 1.210Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/differentStress.TMM.EXPR.matrix: Skipped copy as --dry-run is set (size 2.293Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/full_table.tsv: Skipped copy as --dry-run is set (size 14.393Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/missing_busco_list.tsv: Skipped copy as --dry-run is set (size 166)
2024/10/24 05:22:07 NOTICE: user116.orig/pcan.chr.20190806.fa: Skipped copy as --dry-run is set (size 430.149Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/rRNA.gff: Skipped copy as --dry-run is set (size 11.354Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/nohup.out: Skipped copy as --dry-run is set (size 2.122Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/readme.txt: Skipped copy as --dry-run is set (size 3.288Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/sample_information.txt: Skipped copy as --dry-run is set (size 39.711Ki)
........
[centos@ip-10-99-0-185 ~]$ tail /var/log/gigadb/sync_dropbox.log 
2024/10/24 05:30:44 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z98261.1.fasta: Skipped copy as --dry-run is set (size 467)
2024/10/24 05:30:44 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z98262.1.fasta: Skipped copy as --dry-run is set (size 3.021Ki)
2024/10/24 05:30:44 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z99268.2.fasta: Skipped copy as --dry-run is set (size 2.968Ki)
2024/10/24 05:30:44 NOTICE: 
Transferred:        8.789 TiB / 8.789 TiB, 100%, 81.082 GiB/s, ETA 0s
Transferred:       187034 / 187034, 100%
Elapsed time:        50.1s

2024/10/24 05:30:45 INFO  : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:30:45 INFO  : Successfully sync dropbox from production-live to alt live

In live as a lily user

% ansible-playbook -i ../../inventories users_playbook.yml -e "newuser=lily" -e "credentials_csv_path=~/path/to/credentials.csv" -e "gigadb_env=live" 
% ssh -i output/privkeys-$live-bastion-ip/lily lily@$live-bastion-ip

Activate the web console with: systemctl enable --now cockpit.socket
[lily@ip-10-99-0-185 ~]$ /usr/local/bin/sync_dropbox --dry-run
2024/10/24 05:50:47 INFO : Start sync dropbox from production-live to alt live
2024/10/24 05:51:50 INFO  : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:51:50 INFO  : Successfully sync dropbox from production-live to alt live
[lily@ip-10-99-0-185 ~]$ ls -al /var/log/gigadb/
total 166980
drwxrwxrwx.  2 centos centos        30 Oct 23 15:11 .
drwxr-xr-x. 11 root   root        4096 Oct 23 15:12 ..
-rw-rw-r--.  1 root   gigadb 105503276 Oct 24 05:51 sync_dropbox.log
[lily@ip-10-99-0-185 ~]$ cat 
2024/10/24 05:50:47 INFO : Start sync dropbox from production-live to alt live
2024/10/24 05:50:48 NOTICE: cngb_user: Skipped copy as --dry-run is set (size 23)
2024/10/24 05:50:48 NOTICE: some.file: Skipped copy as --dry-run is set (size 30)
2024/10/24 05:50:48 NOTICE: nohup.out: Skipped copy as --dry-run is set (size 200.488Ki)
2024/10/24 05:50:48 NOTICE: uploaded_with_sftp.txt: Skipped copy as --dry-run is set (size 168)
2024/10/24 05:50:48 NOTICE: user116.orig/Function.txt: Skipped copy as --dry-run is set (size 10.790Mi)
2024/10/24 05:50:48 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.addgene.addUniprot.gff: Skipped copy as --dry-run is set (size 44.357Mi)
2024/10/24 05:50:48 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.cds: Skipped copy as --dry-run is set (size 34.069Mi)
2024/10/24 05:50:48 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.pep: Skipped copy as --dry-run is set (size 12.116Mi)
2024/10/24 05:50:48 NOTICE: user116.orig/RAxML_bestTree.snail.PcPm.GQ10.mind2.NoRelate.gt5maf1.biallelic.NoSCFS.pruned.min4.nwk: Skipped copy as --dry-run is set (size 7.210Ki)
....
2024/10/24 05:51:50 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z98262.1.fasta: Skipped copy as --dry-run is set (size 3.021Ki)
2024/10/24 05:51:50 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z95622.1.fasta: Skipped copy as --dry-run is set (size 2.314Ki)
2024/10/24 05:51:50 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z99268.2.fasta: Skipped copy as --dry-run is set (size 2.968Ki)
2024/10/24 05:51:50 NOTICE: 
Transferred:        8.794 TiB / 8.794 TiB, 100%, 41.572 GiB/s, ETA 0s
Transferred:       187035 / 187035, 100%
Elapsed time:       1m2.8s

2024/10/24 05:51:50 INFO  : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:51:50 INFO  : Successfully sync dropbox from production-live to alt live

How have functionalities been implemented?

Describe how the new functionalities have been implemented by the changed code at a high level

The sync_dropbox script uses rclone sync to keep the EFS mount point /share/dropbox in alt upstream identical to the one in upstream. rclone sync makes the destination match the source and modifies the destination only.
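As a rough illustration of the two modes, the sketch below builds the rclone command line that appears in the "Executed:" log lines of this PR. It is not the actual sync_dropbox script: the function name `build_sync_cmd` and the usage message are hypothetical, and only the staging remote is shown.

```shell
# Hypothetical sketch, not the real sync_dropbox script.
# Remote name, paths and flags are copied from the "Executed:" log lines above.
build_sync_cmd() {
  mode="${1:-}"
  case "$mode" in
    --dry-run) extra="--dry-run " ;;  # report what would change, copy nothing
    --apply)   extra="" ;;            # actually modify the destination
    *) echo "Usage: build_sync_cmd --dry-run | --apply" >&2; return 1 ;;
  esac
  # Source is the upstream remote, destination is the local EFS mount point.
  echo "/usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox ${extra}--config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG"
}
```

Calling `build_sync_cmd --dry-run` prints the command with `--dry-run` included, so the destination is left untouched when that command is run; `build_sync_cmd --apply` prints the same command without it.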

The pre-defined rclone config for the sync is located at gigadb/app/tools/sync-dropbox/config/rclone.conf in the local dev environment and at /etc/sync_dropbox/rclone.conf in the production environments. It specifies the endpoints for the production staging EFS and the production live EFS.
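For readers unfamiliar with rclone remotes, here is a minimal sketch of what such a config could look like, assuming the endpoints are SFTP remotes (the PR only says they are reached with an SSH key). The remote names come from the logs; the function name `write_rclone_conf`, the hosts and the user are placeholders, not the project's real values.

```shell
# Hypothetical sketch: write an rclone.conf with one remote per upstream EFS.
# "type = sftp", the hosts and the user are assumptions made for illustration.
write_rclone_conf() {
  conf_path="$1"
  cat > "$conf_path" <<'EOF'
[production-staging]
type = sftp
host = staging-bastion.example.org
user = example-user
key_file = ~/.ssh/id-rsa-aws-hk-gigadb.pem

[production-live]
type = sftp
host = live-bastion.example.org
user = example-user
key_file = ~/.ssh/id-rsa-aws-hk-gigadb.pem
EOF
}
```

With a file like this, `production-staging:/share/dropbox/` in the rclone command refers to the path /share/dropbox/ on the host configured under `[production-staging]`.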

To access these endpoints over a secure connection, the SSH private key id-rsa-aws-hk-gigadb.pem needs to be pulled from the GitLab cnhk-infra variables page into the ~/.ssh/ directory. This private key is supposedly meant for the gigadb user to provision the upstream EC2 servers. Sharing it lets alt upstream access the upstream EFS securely. Otherwise, users in alt upstream who want to use their own SSH key pairs would first have to copy their public keys to the upstream server using id-rsa-aws-hk-gigadb.pem, which adds an extra layer of complication.
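The local dev listing earlier shows the key as `-rw-r--r--`, which OpenSSH clients reject as too open for a private key. A small sketch of tightening and verifying the permissions follows; the function name `secure_key` is hypothetical and this step is not part of the configure script described in this PR.

```shell
# Sketch: restrict a private key to its owner and confirm the result.
secure_key() {
  key="$1"
  [ -f "$key" ] || { echo "missing key: $key" >&2; return 1; }
  chmod 600 "$key"
  # The first 10 characters of `ls -l` are the file type and permission bits.
  perms="$(ls -l "$key" | cut -c1-10)"
  [ "$perms" = "-rw-------" ]
}
```

For example, `secure_key ~/.ssh/id-rsa-aws-hk-gigadb.pem` returns success once the key is readable and writable by its owner only.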

To get rid of the Permission denied error raised when /usr/local/bin/sync_dropbox tries to write its log in the production environments, a gigadb group is created and the /var/log/gigadb directory is made readable and writable by that group. centos and other users, e.g. lily, are then added to the gigadb group, so both centos and the other users can read and write in /var/log/gigadb.
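When debugging a remaining Permission denied error, a check along these lines may help. It is only a sketch: the function name `can_log` is hypothetical, and it tests the current login session, so a user who was just added to the group needs to log in again before the membership shows up.

```shell
# Sketch: can the current user write to the shared log file?
can_log() {
  log_file="$1"
  group="$2"
  # id -nG lists the names of every group the current user belongs to.
  if ! id -nG | tr ' ' '\n' | grep -Fqx "$group"; then
    echo "current user is not in group $group" >&2
    return 1
  fi
  # -w is true when the file exists and is writable by the current user.
  [ -w "$log_file" ]
}
```

On a bastion this would be called as `can_log /var/log/gigadb/sync_dropbox.log gigadb`.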

Any issues with implementation?

None

Any changes to automated tests?

None

Any changes to documentation?

None

Any technical debt repayment?

logrotate has been implemented in this PR. The main config for logrotate is at /etc/logrotate.d/gigadb:

/var/log/gigadb/*.log {
    daily                     # Rotate logs daily
    missingok                 # Ignore missing log files
    rotate 7                  # Keep 7 days of backlogs
    compress                  # Compress rotated logs
    delaycompress             # Delay compression until the next rotation
    notifempty                # Do not rotate empty log files
    create 0640 root gigadb   # Create new log files with specified permissions and ownership
    dateformat -%Y%m%d        # Suffix of the rotated log file
}

which means every log inside /var/log/gigadb is rotated daily and 7 rotated logs are kept. Each rotated log is compressed and carries a -YYYYMMDD suffix in its file name, as you can see below:

% ssh -i ~/.ssh/id-rsa-aws-seoul-ken.pem centos@3.36.204.163
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Fri Sep 27 03:59:58 2024 from 223.197.187.121
[centos@ip-10-99-0-226 ~]$ ls -al /var/log/gigadb/
total 40
drwxr-xr-x.  2 root   gigadb   171 Sep 29 03:13 .
drwxr-xr-x. 10 root   root    4096 Sep 29 03:13 ..
-rw-rw-r--.  1 root gigadb 10450 Sep 30 01:00 sync_dropbox.log
-rw-rw-r--.  1 root gigadb    26 Sep 26 07:43 sync_dropbox.log-20240926.gz
-rw-rw-r--.  1 root   gigadb  3134 Sep 27 03:11 sync_dropbox.log-20240927.gz
-rw-rw-r--.  1 root gigadb   612 Sep 28 03:00 sync_dropbox.log-20240928.gz
-rw-rw-r--.  1 root gigadb 11400 Sep 29 03:00 sync_dropbox.log-20240929.gz

logrotate requires the log directory and the log files to have secure permissions, so the log is readable and writable by the root user and by any user in the gigadb group.

For the purpose of testing, after logging into your staging, you can force rotate the log to check if it works correctly:

% ssh -i path/to/staging/pem centos@$staging-bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Mon Sep 30 05:57:11 2024 from 3.36.204.163
[centos@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 20
drwxr-xr-x.  2 root gigadb    30 Sep 30 04:23 .
drwxr-xr-x. 10 root root    4096 Sep 30 04:16 ..
-rw-rw-r--.  1 root gigadb 12410 Sep 30 05:59 sync_dropbox.log
[centos@ip-10-99-0-22 ~]$ tail /var/log/gigadb/sync_dropbox.log
2024/09/30 05:59:27 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Raw_metabolomics_data.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Sample_information2.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/all_cor.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Fig7DEF_Rawdata/Sample_information1.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Fig7DEF_Rawdata/TOTAL_baseCompare.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results1.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results2.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/Sample_information3.csv: Copied (new)
2024/09/30 05:59:28 INFO  : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:59:28 INFO  : Successfully sync dropbox from production-staging to alt staging
[centos@ip-10-99-0-22 ~]$  /usr/local/bin/sync_dropbox --apply
[centos@ip-10-99-0-22 ~]$ tail /var/log/gigadb/sync_dropbox.log
2024/09/30 05:59:27 INFO  : user27/Metabolite/Fig7DEF_Rawdata/TOTAL_baseCompare.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results1.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results2.csv: Copied (new)
2024/09/30 05:59:27 INFO  : user27/Metabolite/Rawdata_of_Isotope_test/Sample_information3.csv: Copied (new)
2024/09/30 05:59:28 INFO  : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:59:28 INFO  : Successfully sync dropbox from production-staging to alt staging
2024/09/30 06:04:29 INFO : Start sync dropbox from production-staging to alt staging
2024/09/30 06:04:31 INFO  : There was nothing to transfer
2024/09/30 06:04:31 INFO  : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 06:04:31 INFO  : Successfully sync dropbox from production-staging to alt staging
[centos@ip-10-99-0-22 ~]$ sudo /usr/sbin/logrotate -f /etc/logrotate.conf
[centos@ip-10-99-0-22 ~]$ ls /var/log/gigadb/
sync_dropbox.log  sync_dropbox.log-20240930
[centos@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 20
drwxr-xr-x.  2 root gigadb    63 Sep 30 06:05 .
drwxr-xr-x. 10 root root    4096 Sep 30 06:05 ..
-rw-rw----.  1 root gigadb     0 Sep 30 06:24 sync_dropbox.log
-rw-rw-r--.  1 root gigadb 12885 Sep 30 06:04 sync_dropbox.log-20240930
# The old sync_dropbox.log has been rotated to sync_dropbox.log-20240930
# The current sync_dropbox.log is empty, all log output will be saved to here
# With daily rotation, `sync_dropbox.log-20240930.gz` is created only at the next rotation cycle, because the `delaycompress` option in the logrotate config does not compress a rotated log file immediately

Additionally, I have kept my staging bastion running the script for more than 7 days to test the cron job and the log rotation. Here are the results:

[centos@ip-10-99-0-208 ~]$ ls -al /var/log/gigadb/
total 20
drwxr-xr-x.  2 root gigadb  171 Oct 28 03:07 .
drwxr-xr-x. 11 root root   4096 Oct 27 03:37 ..
-rw-rw----.  1 root gigadb    0 Oct 28 03:07 sync_dropbox.log
-rw-rw-r--.  1 root gigadb  328 Oct 25 03:26 sync_dropbox.log-20241025.gz
-rw-rw----.  1 root gigadb  246 Oct 25 11:00 sync_dropbox.log-20241026.gz
-rw-rw----.  1 root gigadb  250 Oct 26 11:00 sync_dropbox.log-20241027.gz
-rw-rw----.  1 root gigadb  475 Oct 27 11:00 sync_dropbox.log-20241028
[centos@ip-10-99-0-208 ~]$ zcat /var/log/gigadb/sync_dropbox.log-20241027.gz
2024/10/26 11:00:02 INFO : Start sync dropbox from production-staging to alt staging
2024/10/26 11:00:04 INFO  : There was nothing to transfer
2024/10/26 11:00:05 INFO  : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/26 11:00:05 INFO  : Successfully sync dropbox from production-staging to alt staging
# checked again on 4 Nov 2024
[centos@ip-10-99-0-208 ~]$ ls -al /var/log/gigadb/
total 36
drwxr-xr-x.  2 root gigadb 4096 Nov  4 04:20 .
drwxr-xr-x. 11 root root   4096 Nov  3 03:47 ..
-rw-rw----.  1 root gigadb    0 Nov  4 04:20 sync_dropbox.log
-rw-rw----.  1 root gigadb  250 Oct 28 11:01 sync_dropbox.log-20241029.gz
-rw-rw----.  1 root gigadb  250 Oct 29 11:00 sync_dropbox.log-20241030.gz
-rw-rw----.  1 root gigadb  249 Oct 30 11:00 sync_dropbox.log-20241031.gz
-rw-rw----.  1 root gigadb  246 Oct 31 11:00 sync_dropbox.log-20241101.gz
-rw-rw----.  1 root gigadb  246 Nov  1 11:00 sync_dropbox.log-20241102.gz
-rw-rw----.  1 root gigadb  246 Nov  2 11:00 sync_dropbox.log-20241103.gz
-rw-rw----.  1 root gigadb  475 Nov  3 11:00 sync_dropbox.log-20241104
[centos@ip-10-99-0-208 ~]$

As a result, it is suggested that all script outputs be stored in /var/log/gigadb, so that the log files are managed automatically.
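To confirm the retention without reading the listing by eye, the rotated files can be counted. This is a sketch only: the function name `count_rotated` is hypothetical, and the file name pattern is taken from the listings above.

```shell
# Sketch: count rotated sync_dropbox logs in a directory.
count_rotated() {
  dir="$1"
  # Rotated files carry a -YYYYMMDD suffix, optionally followed by .gz;
  # the live sync_dropbox.log has no suffix and is not matched.
  find "$dir" -maxdepth 1 -type f -name 'sync_dropbox.log-[0-9]*' | wc -l | tr -d ' '
}
```

With `rotate 7`, `count_rotated /var/log/gigadb` should not print a number above 7 once the rotation has been running for more than a week.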

Any improvements to CI/CD pipeline?

None