This is a pull request for the following functionalities:
[x] #2004
[x] #2005
[x] #2006
[x] #2007
How to test?
Describe how the new functionalities can be tested by PR reviewers
In local dev
% cd /gigadb/app/tools/sync-dropbox
# execute the configure script to generate .env
% ./configure
An .env file wasn't present, creating a new one from the default example
Sourcing .env
./.env: line 16: syntax error near unexpected token `newline'
# then fill in the GitLab private token and the GitLab repo name in .env, and run the configure script again to generate rclone.conf and fetch the SSH private key for accessing the upstream server
% ./configure
An .env file is present
Sourcing .env
Current environment: dev
# then check that the private key exists
% ls -al ~/.ssh/id-rsa-aws-hk-gigadb.pem
-rw-r--r--@ 1 kencho staff 1675 Sep 19 16:12 /Users/kencho/.ssh/id-rsa-aws-hk-gigadb.pem
# then execute the bats tests
% bats tests/bats/sync_dropbox.bats
sync_dropbox.bats
✓ No parameter provided
✓ Execute in dry run mode
✓ Execute in apply mode
3 tests, 0 failures
# the last test takes several minutes to complete, as it copies the files from the existing upstream staging EFS to your local dev environment
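The key listing above shows the private key with mode -rw-r--r--. The OpenSSH client rejects private keys that group or others can access, so it may be worth checking the mode before using the key with plain ssh; whether rclone applies the same check is not confirmed here. A minimal sketch, in which the helper name key_is_private is hypothetical:

```shell
# Hypothetical helper: succeed only if the file exists and neither group
# nor others hold any permission bit on it (e.g. mode 600 or 400).
key_is_private() {
    key_path=$1
    [ -f "$key_path" ] || return 2
    # Characters 5-10 of the `ls -l` mode string are the group/other bits.
    group_other_bits=$(ls -ld "$key_path" | cut -c5-10)
    [ "$group_other_bits" = "------" ]
}

# Example, using the key path from the listing above:
# key_is_private ~/.ssh/id-rsa-aws-hk-gigadb.pem || chmod 600 ~/.ssh/id-rsa-aws-hk-gigadb.pem
```

The helper returns 2 when the file is missing, so a caller can tell "absent" apart from "too open".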
Pre-requisites
Follow docs/SETUP_PROVISIONING.md to spin up the servers
In staging as a centos user
% ssh -i path/to/staging/pem centos@$staging-bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Mon Sep 30 04:36:53 2024 from 3.36.204.163
[centos@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 0
drwxrwxr-x. 2 centos centos 6 Sep 30 03:54 .
drwxr-xr-x. 4 centos centos 35 Sep 30 03:54 ..
[centos@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 4
drwxr-xr-x. 2 root gigadb 30 Sep 30 04:23 .
drwxr-xr-x. 10 root root 4096 Sep 30 04:16 ..
-rw-rw-r--. 1 root gigadb 0 Sep 30 04:23 sync_dropbox.log
[centos@ip-10-99-0-22 ~]$ /usr/local/bin/sync_dropbox --apply
2024/09/30 05:50:24 INFO : Start sync dropbox from production-staging to alt staging
2024/09/30 05:51:10 INFO : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:51:10 INFO : Successfully sync dropbox from production-staging to alt staging
[centos@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 12
drwxrwxr-x. 6 centos centos 82 Sep 30 05:50 .
drwxr-xr-x. 4 centos centos 35 Sep 30 03:54 ..
-rw-rw-r--. 1 centos centos 166 Apr 25 16:55 rija_test.txt
drwxrwxr-x. 3 centos centos 36 Sep 30 05:50 user0
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user109
drwxrwxr-x. 4 centos centos 167 Sep 30 05:50 user27
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user4
[centos@ip-10-99-0-22 ~]$
[centos@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 12
drwxr-xr-x. 2 root gigadb 30 Sep 30 04:23 .
drwxr-xr-x. 10 root root 4096 Sep 30 04:16 ..
-rw-rw-r--. 1 root gigadb 7346 Sep 30 05:51 sync_dropbox.log
[centos@ip-10-99-0-22 ~]$ cat /var/log/gigadb/sync_dropbox.log
...
2024/09/30 05:50:35 INFO : user27/Genome/Sample_information.csv: Copied (new)
2024/09/30 05:50:35 INFO : user27/Genome/busco_full_table.csv: Copied (new)
2024/09/30 05:50:35 INFO : user27/Genome/Venny_FigS3.zip: Copied (new)
2024/09/30 05:50:35 INFO : user27/Genome/Binodoxys_communis_contig_level.fa: Copied (new)
2024/09/30 05:50:35 INFO : user27/Genome/busco_short_summary.txt: Copied (new)
2024/09/30 05:50:35 INFO : user27/Genome/SpeciesTreeAlignment.fa: Copied (new)
2024/09/30 05:50:35 INFO : user27/Genome/missing_busco_list.csv: Copied (new)
2024/09/30 05:50:35 INFO : user27/Genome/repeat.statistics.csv: Copied (new)
2024/09/30 05:50:36 INFO : user27/Genome/repeatmasker.gff: Copied (new)
2024/09/30 05:50:36 INFO : user27/Genome/trf.gff: Copied (new)
2024/09/30 05:50:36 INFO : user4/brassicaceae_NCBI/Trees.tar.gz: Copied (new)
2024/09/30 05:50:37 INFO : user27/Genome/single_copy.pep.phy: Copied (new)
2024/09/30 05:50:37 INFO : user4/brassicaceae_NCBI/its.tar.gz: Copied (new)
2024/09/30 05:50:37 INFO : user4/brassicaceae_NCBI/amas.tar.gz: Copied (new)
2024/09/30 05:50:37 INFO : user4/brassicaceae_NCBI/its_id_sp.csv: Copied (new)
2024/09/30 05:50:37 INFO : user4/brassicaceae_NCBI/matk_id_spp.csv: Copied (new)
2024/09/30 05:50:39 INFO : user27/Genome/single_copy.cds.phy: Copied (new)
2024/09/30 05:50:39 INFO : user4/brassicaceae_NCBI/rbcl_id_spp.csv: Copied (new)
2024/09/30 05:50:41 INFO : user4/brassicaceae_NCBI/trn.tar.gz: Copied (new)
2024/09/30 05:50:41 INFO : user4/brassicaceae_NCBI/trn_id_spp.csv: Copied (new)
2024/09/30 05:50:41 INFO : user27/Metabolite/Fig7DEF_Rawdata/Sample_information1.csv: Copied (new)
2024/09/30 05:50:42 INFO : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results1.csv: Copied (new)
2024/09/30 05:50:43 INFO : user27/Metabolite/Fig7DEF_Rawdata/TOTAL_baseCompare.csv: Copied (new)
2024/09/30 05:50:43 INFO : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results2.csv: Copied (new)
2024/09/30 05:50:43 INFO : user27/Metabolite/Rawdata_of_Isotope_test/Sample_information3.csv: Copied (new)
2024/09/30 05:50:44 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Integral_correction_diagram_N.png: Copied (new)
2024/09/30 05:50:44 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Integral_correction_diagram_P.png: Copied (new)
2024/09/30 05:50:45 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/MRM_detection_of_multimodal_maps_N.png: Copied (new)
2024/09/30 05:50:45 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/MRM_detection_of_multimodal_maps_P.png: Copied (new)
2024/09/30 05:50:46 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_TIC_N.png: Copied (new)
2024/09/30 05:50:46 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_TIC_P.png: Copied (new)
2024/09/30 05:50:47 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_tic_overlap_N.png: Copied (new)
2024/09/30 05:50:47 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_tic_overlap_P.png: Copied (new)
2024/09/30 05:50:47 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Raw_metabolomics_data.csv: Copied (new)
2024/09/30 05:50:48 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Sample_information2.csv: Copied (new)
2024/09/30 05:50:49 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/all_cor.csv: Copied (new)
2024/09/30 05:50:49 INFO : user0/change.log: Copied (new)
2024/09/30 05:50:49 INFO : user0/some/directory/foobar.ext: Copied (new)
2024/09/30 05:50:54 INFO : user4/brassicaceae_NCBI/rbcl.tar.gz: Multi-thread Copied (new)
2024/09/30 05:51:07 INFO : user4/brassicaceae_NCBI/outputs.tar.gz: Multi-thread Copied (new)
2024/09/30 05:51:09 INFO : user4/brassicaceae_NCBI/matk.tar.gz: Multi-thread Copied (new)
2024/09/30 05:51:10 INFO : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:51:10 INFO : Successfully sync dropbox from production-staging to alt staging
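To compare a run against expectations without reading the whole log, the copied files can be counted. A minimal sketch, assuming the log format shown above; the helper name count_copied is hypothetical and not part of this PR:

```shell
# Hypothetical helper: count the files rclone reported as newly copied.
# Matches both "Copied (new)" and "Multi-thread Copied (new)" lines.
count_copied() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the count.
    grep -c 'Copied (new)$' "$1" || true
}

# Example:
# count_copied /var/log/gigadb/sync_dropbox.log
```

Because the log is appended to on every run, the count covers all runs recorded in the file, not only the latest one.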
In staging as a lily user
% cd ops/infrastructure/envs/staging
% ansible-playbook -i ../../inventories users_playbook.yml -e "newuser=lily" -e "credentials_csv_path=~/path/to/credentials.csv" -e "gigadb_env=staging"
% chmod 500 output/privkeys-$bastion-ip/lily
% ssh -i output/privkeys-$bastion-ip/lily lily@$bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket
[lily@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 12
drwxrwxr-x. 6 centos centos 82 Sep 30 05:50 .
drwxr-xr-x. 4 centos centos 35 Sep 30 03:54 ..
-rw-rw-r--. 1 centos centos 166 Apr 25 16:55 rija_test.txt
drwxrwxr-x. 3 centos centos 36 Sep 30 05:50 user0
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user109
drwxrwxr-x. 4 centos centos 167 Sep 30 05:50 user27
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user4
[lily@ip-10-99-0-22 ~]$ rm -r /share/dropbox/user27
[lily@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 12
drwxrwxr-x. 5 centos centos 68 Sep 30 05:58 .
drwxr-xr-x. 4 centos centos 35 Sep 30 03:54 ..
-rw-rw-r--. 1 centos centos 166 Apr 25 16:55 rija_test.txt
drwxrwxr-x. 3 centos centos 36 Sep 30 05:50 user0
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user109
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user4
[lily@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 12
drwxr-xr-x. 2 root gigadb 30 Sep 30 04:23 .
drwxr-xr-x. 10 root root 4096 Sep 30 04:16 ..
-rw-rw-r--. 1 root gigadb 7346 Sep 30 05:51 sync_dropbox.log
[lily@ip-10-99-0-22 ~]$ /usr/local/bin/sync_dropbox --apply
2024/09/30 05:59:16 INFO : Start sync dropbox from production-staging to alt staging
2024/09/30 05:59:28 INFO : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:59:28 INFO : Successfully sync dropbox from production-staging to alt staging
[lily@ip-10-99-0-22 ~]$ ls -al /share/dropbox/
total 12
drwxrwxr-x. 6 centos centos 82 Sep 30 05:59 .
drwxr-xr-x. 4 centos centos 35 Sep 30 03:54 ..
-rw-rw-r--. 1 centos centos 166 Apr 25 16:55 rija_test.txt
drwxrwxr-x. 3 centos centos 36 Sep 30 05:50 user0
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user109
drwxrwxr-x. 4 lily lily 167 Sep 30 05:59 user27
drwxrwxr-x. 3 centos centos 4096 Sep 30 05:50 user4
[lily@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 20
drwxr-xr-x. 2 root gigadb 30 Sep 30 04:23 .
drwxr-xr-x. 10 root root 4096 Sep 30 04:16 ..
-rw-rw-r--. 1 root gigadb 12410 Sep 30 05:59 sync_dropbox.log
...
2024/09/30 05:59:16 INFO : Start sync dropbox from production-staging to alt staging
2024/09/30 05:59:19 INFO : user27/102224.filesizes: Copied (new)
2024/09/30 05:59:19 INFO : user27/.dotfiles.txt: Copied (new)
2024/09/30 05:59:19 INFO : user27/102224.filesizes.bk: Copied (new)
2024/09/30 05:59:19 INFO : user27/102224.md5.bk: Copied (new)
2024/09/30 05:59:19 INFO : user27/102224.md5: Copied (new)
2024/09/30 05:59:19 INFO : user27/readme.txt: Copied (new)
2024/09/30 05:59:20 INFO : user27/Genome/Binodoxys_communis_chr_pep.fa: Copied (new)
2024/09/30 05:59:20 INFO : user27/Genome/Binodoxys_communis_chr.gff: Copied (new)
2024/09/30 05:59:20 INFO : user27/Genome/Expasion_KEGG_Fig2BC.csv: Copied (new)
2024/09/30 05:59:20 INFO : user27/Genome/Binodoxys_communis_chr_cds.fa: Copied (new)
2024/09/30 05:59:20 INFO : user27/Genome/Extraction_KEGG_Fig2DE.csv: Copied (new)
2024/09/30 05:59:21 INFO : user27/Genome/Genome.md5: Copied (new)
2024/09/30 05:59:21 INFO : user27/Genome/Gene_annotation.csv: Copied (new)
2024/09/30 05:59:21 INFO : user27/Genome/Phylogenetic_tree_Fig2A.newick: Copied (new)
2024/09/30 05:59:21 INFO : user27/Genome/Sample_information.csv: Copied (new)
2024/09/30 05:59:22 INFO : user27/Genome/Venny_FigS3.zip: Copied (new)
2024/09/30 05:59:22 INFO : user27/Genome/SpeciesTreeAlignment.fa: Copied (new)
2024/09/30 05:59:22 INFO : user27/Genome/busco_full_table.csv: Copied (new)
2024/09/30 05:59:23 INFO : user27/Genome/busco_short_summary.txt: Copied (new)
2024/09/30 05:59:23 INFO : user27/Genome/missing_busco_list.csv: Copied (new)
2024/09/30 05:59:23 INFO : user27/Genome/repeat.statistics.csv: Copied (new)
2024/09/30 05:59:23 INFO : user27/Genome/repeatmasker.gff: Copied (new)
2024/09/30 05:59:24 INFO : user27/Genome/Binodoxys_communis_chrom_level.fa: Copied (new)
2024/09/30 05:59:25 INFO : user27/Genome/single_copy.pep.phy: Copied (new)
2024/09/30 05:59:25 INFO : user27/Metabolite/Fig4BCDEF_Lipid_changes_in_adult_stage.csv: Copied (new)
2024/09/30 05:59:25 INFO : user27/Genome/Binodoxys_communis_contig_level.fa: Copied (new)
2024/09/30 05:59:25 INFO : user27/Genome/trf.gff: Copied (new)
2024/09/30 05:59:25 INFO : user27/Metabolite/Fig6BC_Lipid_changes_in_the_larval_stage.csv: Copied (new)
2024/09/30 05:59:25 INFO : user27/Metabolite/Fig7ABC_Biological data.csv: Copied (new)
2024/09/30 05:59:25 INFO : user27/Genome/single_copy.cds.phy: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Fig7DEF_TAG_DAG_and_FA_of_cotton_aphids_after_parasitization.csv: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Fig9BD_Flaw rates of isotope-labeled metabolites.csv: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/FigS5_Quantitative_validation_of_metabolites.csv: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Metabolite.md5: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Integral_correction_diagram_N.png: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Integral_correction_diagram_P.png: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/MRM_detection_of_multimodal_maps_N.png: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/MRM_detection_of_multimodal_maps_P.png: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_TIC_N.png: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_tic_overlap_N.png: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_TIC_P.png: Copied (new)
2024/09/30 05:59:26 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/QC_MS_tic_overlap_P.png: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Raw_metabolomics_data.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Sample_information2.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/all_cor.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Fig7DEF_Rawdata/Sample_information1.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Fig7DEF_Rawdata/TOTAL_baseCompare.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results1.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results2.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/Sample_information3.csv: Copied (new)
2024/09/30 05:59:28 INFO : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:59:28 INFO : Successfully sync dropbox from production-staging to alt staging
[lily@ip-10-99-0-22 ~]$
In live as a centos user
% ssh -i path/to/live/pem centos@$live-bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Thu Oct 24 05:29:34 2024 from 54.180.33.208
[centos@ip-10-99-0-185 ~]$ /usr/local/bin/sync_dropbox --dry-run
2024/10/24 05:29:54 INFO : Start sync dropbox from production-live to alt live
2024/10/24 05:30:45 INFO : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:30:45 INFO : Successfully sync dropbox from production-live to alt live
[centos@ip-10-99-0-185 ~]$
[centos@ip-10-99-0-185 ~]$ /usr/local/bin/sync_dropbox --apply
2024/10/24 05:29:54 INFO : Start sync dropbox from production-live to alt live
2024/10/24 05:30:45 INFO : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:30:45 INFO : Successfully sync dropbox from production-live to alt live
[centos@ip-10-99-0-185 ~]$ ls -al /var/log/gigadb/
total 97796
drwxrwxrwx. 2 centos centos 30 Oct 23 15:11 .
drwxr-xr-x. 11 root root 4096 Oct 23 15:12 ..
-rw-rw-r--. 1 root gigadb 70335364 Oct 24 05:30 sync_dropbox.log
[centos@ip-10-99-0-185 ~]$ wc -l /var/log/gigadb/sync_dropbox.log
374089 /var/log/gigadb/sync_dropbox.log
[centos@ip-10-99-0-185 ~]$ less /var/log/gigadb/sync_dropbox.log
2024/10/24 05:22:06 INFO : Start sync dropbox from to alt live
2024/10/24 05:22:07 NOTICE: cngb_user: Skipped copy as --dry-run is set (size 23)
2024/10/24 05:22:07 NOTICE: nohup.out: Skipped copy as --dry-run is set (size 200.488Ki)
2024/10/24 05:22:07 NOTICE: some.file: Skipped copy as --dry-run is set (size 30)
2024/10/24 05:22:07 NOTICE: uploaded_with_sftp.txt: Skipped copy as --dry-run is set (size 168)
2024/10/24 05:22:07 NOTICE: user116.orig/Function.txt: Skipped copy as --dry-run is set (size 10.790Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.addgene.addUniprot.gff: Skipped copy as --dry-run is set (size 44.357Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.cds: Skipped copy as --dry-run is set (size 34.069Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.pep: Skipped copy as --dry-run is set (size 12.116Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/RAxML_bestTree.snail.PcPm.GQ10.mind2.NoRelate.gt5maf1.biallelic.NoSCFS.pruned.min4.nwk: Skipped copy as --dry-run is set (size 7.210Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/all.repeat.gff: Skipped copy as --dry-run is set (size 150.854Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/allfile.md5sum: Skipped copy as --dry-run is set (size 1.210Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/differentStress.TMM.EXPR.matrix: Skipped copy as --dry-run is set (size 2.293Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/full_table.tsv: Skipped copy as --dry-run is set (size 14.393Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/missing_busco_list.tsv: Skipped copy as --dry-run is set (size 166)
2024/10/24 05:22:07 NOTICE: user116.orig/pcan.chr.20190806.fa: Skipped copy as --dry-run is set (size 430.149Mi)
2024/10/24 05:22:07 NOTICE: user116.orig/rRNA.gff: Skipped copy as --dry-run is set (size 11.354Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/nohup.out: Skipped copy as --dry-run is set (size 2.122Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/readme.txt: Skipped copy as --dry-run is set (size 3.288Ki)
2024/10/24 05:22:07 NOTICE: user116.orig/sample_information.txt: Skipped copy as --dry-run is set (size 39.711Ki)
........
[centos@ip-10-99-0-185 ~]$ tail /var/log/gigadb/sync_dropbox.log
2024/10/24 05:30:44 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z98261.1.fasta: Skipped copy as --dry-run is set (size 467)
2024/10/24 05:30:44 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z98262.1.fasta: Skipped copy as --dry-run is set (size 3.021Ki)
2024/10/24 05:30:44 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z99268.2.fasta: Skipped copy as --dry-run is set (size 2.968Ki)
2024/10/24 05:30:44 NOTICE:
Transferred: 8.789 TiB / 8.789 TiB, 100%, 81.082 GiB/s, ETA 0s
Transferred: 187034 / 187034, 100%
Elapsed time: 50.1s
2024/10/24 05:30:45 INFO : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:30:45 INFO : Successfully sync dropbox from production-live to alt live
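With several hundred thousand lines in the live dry-run log, a per-directory summary is easier to review than the raw output. A minimal sketch, assuming the NOTICE line format shown above; the helper name skipped_per_dir is hypothetical:

```shell
# Hypothetical helper: for a dry-run log, print how many files would be
# copied under each top-level directory ("." stands for top-level files).
skipped_per_dir() {
    grep 'Skipped copy as --dry-run is set' "$1" |
        # Strip the timestamp/level prefix and the trailing message,
        # leaving only the path relative to /share/dropbox.
        sed -e 's/^[^ ]* [^ ]* NOTICE: //' -e 's/: Skipped copy.*$//' |
        awk -F/ '{ if (NF > 1) n[$1]++; else n["."]++ }
                 END { for (d in n) print n[d], d }' |
        LC_ALL=C sort -k2
}

# Example:
# skipped_per_dir /var/log/gigadb/sync_dropbox.log
```

As the log accumulates across runs, the counts add up over every dry run in the file unless the log is rotated or filtered by timestamp first.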
In live as a lily user
% ansible-playbook -i ../../inventories users_playbook.yml -e "newuser=lily" -e "credentials_csv_path=~/path/to/credentials.csv" -e "gigadb_env=live"
% ssh -i output/privkeys-$live-bastion-ip/lily lily@$live-bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket
[lily@ip-10-99-0-185 ~]$ /usr/local/bin/sync_dropbox --dry-run
2024/10/24 05:50:47 INFO : Start sync dropbox from production-live to alt live
2024/10/24 05:51:50 INFO : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:51:50 INFO : Successfully sync dropbox from production-live to alt live
[lily@ip-10-99-0-185 ~]$ ls -al /var/log/gigadb/
total 166980
drwxrwxrwx. 2 centos centos 30 Oct 23 15:11 .
drwxr-xr-x. 11 root root 4096 Oct 23 15:12 ..
-rw-rw-r--. 1 root gigadb 105503276 Oct 24 05:51 sync_dropbox.log
[lily@ip-10-99-0-185 ~]$ cat
2024/10/24 05:50:47 INFO : Start sync dropbox from production-live to alt live
2024/10/24 05:50:48 NOTICE: cngb_user: Skipped copy as --dry-run is set (size 23)
2024/10/24 05:50:48 NOTICE: some.file: Skipped copy as --dry-run is set (size 30)
2024/10/24 05:50:48 NOTICE: nohup.out: Skipped copy as --dry-run is set (size 200.488Ki)
2024/10/24 05:50:48 NOTICE: uploaded_with_sftp.txt: Skipped copy as --dry-run is set (size 168)
2024/10/24 05:50:48 NOTICE: user116.orig/Function.txt: Skipped copy as --dry-run is set (size 10.790Mi)
2024/10/24 05:50:48 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.addgene.addUniprot.gff: Skipped copy as --dry-run is set (size 44.357Mi)
2024/10/24 05:50:48 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.cds: Skipped copy as --dry-run is set (size 34.069Mi)
2024/10/24 05:50:48 NOTICE: user116.orig/Pomacea_canaliculata.coding.gene.V1.20190806.pep: Skipped copy as --dry-run is set (size 12.116Mi)
2024/10/24 05:50:48 NOTICE: user116.orig/RAxML_bestTree.snail.PcPm.GQ10.mind2.NoRelate.gt5maf1.biallelic.NoSCFS.pruned.min4.nwk: Skipped copy as --dry-run is set (size 7.210Ki)
....
2024/10/24 05:51:50 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z98262.1.fasta: Skipped copy as --dry-run is set (size 3.021Ki)
2024/10/24 05:51:50 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z95622.1.fasta: Skipped copy as --dry-run is set (size 2.314Ki)
2024/10/24 05:51:50 NOTICE: user78.orig/user78/MOBFinder/4.model_training_and_optimization/1.plasmid_genomes_for_different_MOB_types/non-mob/Z99268.2.fasta: Skipped copy as --dry-run is set (size 2.968Ki)
2024/10/24 05:51:50 NOTICE:
Transferred: 8.794 TiB / 8.794 TiB, 100%, 41.572 GiB/s, ETA 0s
Transferred: 187035 / 187035, 100%
Elapsed time: 1m2.8s
2024/10/24 05:51:50 INFO : Executed: /usr/local/bin/rclone sync production-live:/share/dropbox/ /share/dropbox --dry-run --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/24 05:51:50 INFO : Successfully sync dropbox from production-live to alt live
How have functionalities been implemented?
Describe how the new functionalities have been implemented by the changed code at a high level
The script sync_dropbox uses rclone sync to keep the EFS mount point /share/dropbox in the alt upstream identical to the one in the upstream. rclone sync makes the destination match the source, changing the destination only.
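The command line the script runs can be reconstructed from the "Executed:" log lines in the test transcripts. The following is a minimal sketch of how such a command might be assembled, not the actual script: the function name, its argument order and the usage message are hypothetical, and it only prints the command instead of running it.

```shell
# Sketch only: prints the rclone command line instead of running it.
# The function name, argument order and usage text are hypothetical.
build_sync_command() {
    mode=$1     # --dry-run or --apply
    env_name=$2 # staging or live
    case "$mode" in
        --dry-run) dry_run_flag=" --dry-run" ;;
        --apply)   dry_run_flag="" ;;
        *) echo "Usage: sync_dropbox --dry-run|--apply" >&2; return 1 ;;
    esac
    case "$env_name" in
        staging|live) ;;
        *) echo "Unknown environment: $env_name" >&2; return 1 ;;
    esac
    # Paths, remote names and flags are copied from the logged commands.
    echo "/usr/local/bin/rclone sync production-${env_name}:/share/dropbox/ /share/dropbox${dry_run_flag} --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG"
}

# Example:
# build_sync_command --dry-run live
```

Keeping --dry-run as the only difference between the two modes means a dry run previews exactly the transfer that --apply would perform.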
The pre-defined rclone config for the sync is located at gigadb/app/tools/sync-dropbox/config/rclone.conf in the local dev environment and at /etc/sync_dropbox/rclone.conf in the production environments. It specifies the endpoints for the production staging EFS and the production live EFS.
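The content of rclone.conf is not shown in this description. As a rough sketch, assuming the two remotes use rclone's sftp backend with key-file authentication (an assumption based on the SSH key requirement), a config with the remote names seen in the logs could be generated like this. The function name and all four arguments are hypothetical placeholders.

```shell
# Sketch only: the sftp backend, host names and SSH user are assumptions;
# only the remote names come from the logged rclone commands.
print_rclone_conf() {
    staging_host=$1
    live_host=$2
    ssh_user=$3
    key_file=$4
    cat <<EOF
[production-staging]
type = sftp
host = ${staging_host}
user = ${ssh_user}
key_file = ${key_file}

[production-live]
type = sftp
host = ${live_host}
user = ${ssh_user}
key_file = ${key_file}
EOF
}

# Example with placeholder hosts:
# print_rclone_conf staging.example.org live.example.org centos ~/.ssh/id-rsa-aws-hk-gigadb.pem
```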
To access these endpoints over a secure connection, the SSH private key id-rsa-aws-hk-gigadb.pem needs to be pulled from the GitLab cnhk-infra variables page into the ~/.ssh/ directory. This private key is meant for the gigadb user to provision the upstream EC2 servers. Sharing this key lets the alt upstream access the EFS in the upstream securely. Otherwise, users in the alt upstream who want to use their own SSH key pairs would first need to copy their public key to the upstream server using id-rsa-aws-hk-gigadb.pem, which adds an extra layer of complication.
To get rid of the Permission denied error raised when /usr/local/bin/sync_dropbox tries to write its log in the production environments, a gigadb group is created and the /var/log/gigadb directory is made readable and writable by that group. centos and other users, e.g. lily, are then added to the gigadb group, so /var/log/gigadb can be read and written by both centos and the other users.
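The group and permission setup described above can be sketched as a handful of standard commands. This is an illustration only: the PR may well apply these steps through its provisioning playbooks, and the function name is hypothetical. Because the commands need root, the sketch prints them rather than running them.

```shell
# Sketch only: prints the commands instead of executing them, since they
# require root. The function name is hypothetical.
print_log_access_commands() {
    log_dir=/var/log/gigadb
    echo "groupadd -f gigadb"              # create the group if it is missing
    echo "chgrp -R gigadb ${log_dir}"      # hand the directory to the group
    echo "chmod -R g+rw ${log_dir}"        # group may read and write
    for user_name in "$@"; do
        echo "usermod -aG gigadb ${user_name}"  # append to supplementary groups
    done
}

# Example:
# print_log_access_commands centos lily | sudo sh
```

A user who is already logged in only picks up the new group membership after logging in again.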
Any issues with implementation?
None
Any changes to automated tests?
None
Any changes to documentation?
None
Any technical debt repayment?
logrotate has been implemented in this PR. Its main config is at /etc/logrotate.d/gigadb:
/var/log/gigadb/*.log {
daily # Rotate logs daily
missingok # Ignore missing log files
rotate 7 # Keep 7 days of backlogs
compress # Compress rotated logs
delaycompress # Delay compression until the next rotation
notifempty # Do not rotate empty log files
create 0640 root gigadb # Create new log files with specified permissions and ownership
dateformat -%Y%m%d # Suffix of the rotated log file
}
This means every log inside /var/log/gigadb is rotated daily and 7 rotated logs are kept. Each rotated log is compressed and carries a YYYYMMDD suffix in its file name, as the directory listings below show.
logrotate requires the log directory and the log file to have secure permissions, so the log can be read and written by the root user and by any user in the gigadb group.
For testing, after logging into your staging bastion, you can force a log rotation to check that it works correctly:
% ssh -i path/to/staging/pem centos@$staging-bastion-ip
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Mon Sep 30 05:57:11 2024 from 3.36.204.163
[centos@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 20
drwxr-xr-x. 2 root gigadb 30 Sep 30 04:23 .
drwxr-xr-x. 10 root root 4096 Sep 30 04:16 ..
-rw-rw-r--. 1 root gigadb 12410 Sep 30 05:59 sync_dropbox.log
[centos@ip-10-99-0-22 ~]$ tail /var/log/gigadb/sync_dropbox.log
2024/09/30 05:59:27 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Raw_metabolomics_data.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/Sample_information2.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Lipidomics_Rawdata_of_parasitic_wasps/all_cor.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Fig7DEF_Rawdata/Sample_information1.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Fig7DEF_Rawdata/TOTAL_baseCompare.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results1.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results2.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/Sample_information3.csv: Copied (new)
2024/09/30 05:59:28 INFO : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:59:28 INFO : Successfully sync dropbox from production-staging to alt staging
[centos@ip-10-99-0-22 ~]$ /usr/local/bin/sync_dropbox --apply
[centos@ip-10-99-0-22 ~]$ tail /var/log/gigadb/sync_dropbox.log
2024/09/30 05:59:27 INFO : user27/Metabolite/Fig7DEF_Rawdata/TOTAL_baseCompare.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results1.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/C13_flux_results2.csv: Copied (new)
2024/09/30 05:59:27 INFO : user27/Metabolite/Rawdata_of_Isotope_test/Sample_information3.csv: Copied (new)
2024/09/30 05:59:28 INFO : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 05:59:28 INFO : Successfully sync dropbox from production-staging to alt staging
2024/09/30 06:04:29 INFO : Start sync dropbox from production-staging to alt staging
2024/09/30 06:04:31 INFO : There was nothing to transfer
2024/09/30 06:04:31 INFO : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/09/30 06:04:31 INFO : Successfully sync dropbox from production-staging to alt staging
[centos@ip-10-99-0-22 ~]$ sudo /usr/sbin/logrotate -f /etc/logrotate.conf
[centos@ip-10-99-0-22 ~]$ ls /var/log/gigadb/
sync_dropbox.log sync_dropbox.log-20240930
[centos@ip-10-99-0-22 ~]$ ls -al /var/log/gigadb/
total 20
drwxr-xr-x. 2 root gigadb 63 Sep 30 06:05 .
drwxr-xr-x. 10 root root 4096 Sep 30 06:05 ..
-rw-rw----. 1 root gigadb 0 Sep 30 06:24 sync_dropbox.log
-rw-rw-r--. 1 root gigadb 12885 Sep 30 06:04 sync_dropbox.log-20240930
# The old sync_dropbox.log has been rotated to sync_dropbox.log-20240930
# The current sync_dropbox.log is empty; all new log output will be saved there
# With daily rotation, the compressed file `sync_dropbox.log-20240930.gz` is only created at the next rotation cycle, because the `delaycompress` option in the logrotate config postpones compression of a rotated log until then
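Because of delaycompress, the newest rotated log is plain text while older ones are gzip-compressed, so reading them needs two different commands. A minimal sketch of a helper that handles both cases; the name read_rotated_log is hypothetical:

```shell
# Hypothetical helper: print a rotated log whether or not it has been
# compressed yet (delaycompress leaves the newest rotation uncompressed).
read_rotated_log() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Example:
# read_rotated_log /var/log/gigadb/sync_dropbox.log-20240930
```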
Additionally, I have kept the script running on my staging bastion for more than 7 days to test the cron job and the log rotation. Here are the results:
[centos@ip-10-99-0-208 ~]$ ls -al /var/log/gigadb/
total 20
drwxr-xr-x. 2 root gigadb 171 Oct 28 03:07 .
drwxr-xr-x. 11 root root 4096 Oct 27 03:37 ..
-rw-rw----. 1 root gigadb 0 Oct 28 03:07 sync_dropbox.log
-rw-rw-r--. 1 root gigadb 328 Oct 25 03:26 sync_dropbox.log-20241025.gz
-rw-rw----. 1 root gigadb 246 Oct 25 11:00 sync_dropbox.log-20241026.gz
-rw-rw----. 1 root gigadb 250 Oct 26 11:00 sync_dropbox.log-20241027.gz
-rw-rw----. 1 root gigadb 475 Oct 27 11:00 sync_dropbox.log-20241028
[centos@ip-10-99-0-208 ~]$ zcat /var/log/gigadb/sync_dropbox.log-20241027.gz
2024/10/26 11:00:02 INFO : Start sync dropbox from production-staging to alt staging
2024/10/26 11:00:04 INFO : There was nothing to transfer
2024/10/26 11:00:05 INFO : Executed: /usr/local/bin/rclone sync production-staging:/share/dropbox/ /share/dropbox --config /etc/sync_dropbox/rclone.conf --log-file /var/log/gigadb/sync_dropbox.log --log-level INFO --stats-log-level DEBUG
2024/10/26 11:00:05 INFO : Successfully sync dropbox from production-staging to alt staging
# checked again on 4 Nov 2024
[centos@ip-10-99-0-208 ~]$ ls -al /var/log/gigadb/
total 36
drwxr-xr-x. 2 root gigadb 4096 Nov 4 04:20 .
drwxr-xr-x. 11 root root 4096 Nov 3 03:47 ..
-rw-rw----. 1 root gigadb 0 Nov 4 04:20 sync_dropbox.log
-rw-rw----. 1 root gigadb 250 Oct 28 11:01 sync_dropbox.log-20241029.gz
-rw-rw----. 1 root gigadb 250 Oct 29 11:00 sync_dropbox.log-20241030.gz
-rw-rw----. 1 root gigadb 249 Oct 30 11:00 sync_dropbox.log-20241031.gz
-rw-rw----. 1 root gigadb 246 Oct 31 11:00 sync_dropbox.log-20241101.gz
-rw-rw----. 1 root gigadb 246 Nov 1 11:00 sync_dropbox.log-20241102.gz
-rw-rw----. 1 root gigadb 246 Nov 2 11:00 sync_dropbox.log-20241103.gz
-rw-rw----. 1 root gigadb 475 Nov 3 11:00 sync_dropbox.log-20241104
[centos@ip-10-99-0-208 ~]$
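The second listing shows seven rotated logs, which matches the rotate 7 setting. A check like this could be automated with a small helper. This is a sketch, assuming the -YYYYMMDD suffix from the config above; the name count_rotated_logs is hypothetical.

```shell
# Hypothetical helper: count rotated copies of a log in a directory.
# Matches both plain and gzip-compressed rotations, e.g.
# sync_dropbox.log-20241104 and sync_dropbox.log-20241103.gz.
count_rotated_logs() {
    log_dir=$1
    base_name=$2
    find "$log_dir" -maxdepth 1 -type f \
        -name "${base_name}-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*" |
        wc -l | tr -d ' '
}

# Example: succeed only if no more than 7 rotated logs are kept.
# [ "$(count_rotated_logs /var/log/gigadb sync_dropbox.log)" -le 7 ]
```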
As a result, I suggest that all script outputs be stored in /var/log/gigadb, so that the log files are managed automatically.
Pull request for issue: #2000
Any improvements to CI/CD pipeline?
None