GTAC-MGI / GTAC-ESP-LIMS


Seq Lab: Add AWS Archive Process to staging #506

Open LMT337 opened 5 days ago

LMT337 commented 5 days ago

Update the staging scripts to kick off an asynchronous archive process that copies the staged data to AWS.

LMT337 commented 5 days ago

Script from aemory: /storage1/fs1/gtac-mgi/Active/Bioinformatics_ops/scripts/send_data_to_s3.py

usage: send_data_to_s3.py [-h] [-a ACL] -b BUCKET_NAME [-bp BUCKET_PATH] [-c AWS_CREDENTIALS] [-config AWS_CONFIG] [-g JOB_GROUP_NAME] [-nl NO_SYMLINKS] -p PATH [-r RANDOM_STRING] [-s SHOULD_MONITOR] [-w WOID]

This script writes and BSUBs an aws s3 sync script.

optional arguments:
  -h, --help
      show this help message and exit
  -a ACL, --acl ACL
      the ACL rule
  -b BUCKET_NAME, --bucket_name BUCKET_NAME
      the bucket_name
  -bp BUCKET_PATH, --bucket_path BUCKET_PATH
      the bucket_path
  -c AWS_CREDENTIALS, --aws_credentials AWS_CREDENTIALS
      path to the AWS credentials file to be used for the s3 upload
  -config AWS_CONFIG, --aws_config AWS_CONFIG
      path to the AWS config file to be used for the s3 upload
  -g JOB_GROUP_NAME, --job_group_name JOB_GROUP_NAME
      the name of an existing job group to which this job will be added
  -nl NO_SYMLINKS, --no_symlinks NO_SYMLINKS
      "True" if upload script should NOT follow symlinks
  -p PATH, --path PATH
      the directory to upload
  -r RANDOM_STRING, --random_string RANDOM_STRING
      the random_string
  -s SHOULD_MONITOR, --should_monitor SHOULD_MONITOR
      "True" if script should_monitor the BSUB job
  -w WOID, --woid WOID
      the WOID

LMT337 commented 5 days ago

bucket: gtac-mgi-dt-archive-bucket

The only arguments you have to worry about are -b, -bp, -p, and -w; the defaults will take care of the rest. -w is just an identifier; I just use the SR.

Run example:

/storage1/fs1/gtac-mgi/Active/Bioinformatics_ops/scripts/send_data_to_s3.py -b gtac-mgi-dt-archive-bucket -bp SR00000 -p /storage1/fs1/gtac-mgi/Active/Bioinformatics_ops/datatransfer_staging/SR00000/ -w SR00000

That will write and BSUB a bash script that invokes aws s3 sync with my credentials. It will return as soon as it launches the job; in other words, the calling process will not wait for the job to complete unless you pass "True" for -s/--should_monitor.

It will write the BSUB E/O files here: /storage1/fs1/gtac-mgi/Active/Bioinformatics_ops/BSUB/AWS_S3/
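A rough sketch of how a staging script might kick this off, assuming the staging code is Python and that send_data_to_s3.py can be executed directly, as in the run example above. The names build_archive_command and start_archive and the dry_run flag are hypothetical and are not part of the existing staging scripts; only the script path, the bucket name, and the -b/-bp/-p/-w flags come from this thread.

```python
import subprocess

# Path and bucket are taken from the comments in this issue.
SEND_TO_S3 = "/storage1/fs1/gtac-mgi/Active/Bioinformatics_ops/scripts/send_data_to_s3.py"
ARCHIVE_BUCKET = "gtac-mgi-dt-archive-bucket"


def build_archive_command(staging_dir, sr, bucket=ARCHIVE_BUCKET, script=SEND_TO_S3):
    """Return the argv list that archives one staged SR directory.

    Hypothetical helper: only -b, -bp, -p and -w are set, and the
    script's own defaults are left to handle everything else.
    """
    # Normalize to a single trailing slash, matching the run example.
    path = str(staging_dir).rstrip("/") + "/"
    return [script, "-b", bucket, "-bp", sr, "-p", path, "-w", sr]


def start_archive(staging_dir, sr, dry_run=False):
    """Launch the archive job and return the command that was used.

    Without -s "True" the script is described as returning once the
    BSUB job is launched, so this call should not block on the upload.
    With dry_run=True (an assumption added for testing) nothing is run.
    """
    cmd = build_archive_command(staging_dir, sr)
    if not dry_run:
        # check=True raises CalledProcessError if the submission itself fails.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling start_archive(staging_dir, "SR00000", dry_run=True) returns the argument list without submitting anything, which may be handy when trying this against the test directory first.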

LMT337 commented 5 days ago

Run tests on a test directory: /storage1/fs1/gtac-mgi/Active/Bioinformatics_ops/testing/test/