NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing-related projects from Xin will be recorded in this repo

CustomizedQC: Move data from biowulf to object storage system (S3) #22

Open lxwgcool opened 3 years ago

lxwgcool commented 3 years ago

The primary storage space is running out, so we need to move the data from "biowulf" to S3.

lxwgcool commented 3 years ago

Useful command lines for the object storage system

  1. Check vault

    lix33@biowulf:~$ obj_df -h
    Name                             Quota        Used         Remaining   
    DCEG_COVID_WGS                   290.00 TB    18.53 MB     290.00 TB

  2. Copy data

    (1) dry run
    obj_put -v DCEG_COVID_WGS -p "lix33/Test/ObjStorage" /home/lix33/lxwg/Test/object_storage --recursive --dry-run -FV

    (2) wet run
    obj_put -v DCEG_COVID_WGS -p "lix33/Test/ObjStorage" /home/lix33/lxwg/Test/object_storage --recursive -FV

  3. List files

    obj_ls -v DCEG_COVID_WGS -m 'DirTest' -h
    obj_ls -v DCEG_COVID_WGS -h

  4. Remove data

    obj_rm -v DCEG_COVID_WGS --dry-run lix33/Test/ObjStorage/DirTest/object_storage/Test?/??????.txt -V

  Notice: the wildcard '*' is not supported.

  Remove files that share the same pattern:

    obj_ls -v DCEG_COVID_WGS -h -m "XXXXXX_XXXX_H3NNNDSX2*" | tail -n 25 | awk -F ' ' '{print $8}' | while read line; do obj_rm -v DCEG_COVID_WGS --dry-run ${line} -V; done | less -SN
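
The pattern-based removal one-liner above can also be sketched in Python, which may be easier to reuse from a script such as Backup2S3.py. This is only a sketch under assumptions: it takes the lines already printed by `obj_ls` as input, assumes (as the `awk` call does) that the object key is the 8th whitespace-separated field, and only builds the `obj_rm` argument lists without running them. The function name and its parameters are hypothetical.

```python
from typing import Iterable, List

VAULT = "DCEG_COVID_WGS"  # vault name used in the commands above


def build_rm_commands(listing_lines: Iterable[str],
                      key_field: int = 8,
                      last_n: int = 25,
                      dry_run: bool = True) -> List[List[str]]:
    """Build one obj_rm argument list per listed object.

    Mirrors `tail -n 25 | awk '{print $8}'`: keep the last `last_n` lines,
    then take the `key_field`-th whitespace-separated field of each line.
    Lines with too few fields are skipped instead of yielding an empty key.
    """
    lines = list(listing_lines)
    commands = []
    for line in lines[-last_n:]:
        fields = line.split()
        if len(fields) < key_field:
            continue
        cmd = ["obj_rm", "-v", VAULT]
        if dry_run:
            cmd.append("--dry-run")  # same flag as in the one-liner above
        cmd += [fields[key_field - 1], "-V"]
        commands.append(cmd)
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` on Biowulf. Keeping `dry_run=True` until the printed keys have been reviewed follows the same dry-run-first habit as the commands above.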


### Valuable email from Tim (Biowulf Administrator)

Hi Xin,

The object storage scripts are designed for use only with the HPC object storage system. You cannot use them to send or retrieve data from Amazon S3.

If you're using them to connect to the HPC object storage, there is a list of scripts available at https://hpc.nih.gov/storage/object.html#object_scripts , and each script supports a --help option that explains basic usage.

We also have slides from a class I teach periodically on object storage usage available at https://hpc.nih.gov/training/handouts/object_storage_class_2018_oct.pdf

Please review this material and let me know if you have further questions. I'm glad to answer questions via e-mail or schedule a quick demo via WebEx if needed.

lxwgcool commented 3 years ago

New code: Backup2S3.py

  1. Back up the finished flowcells from the biowulf target root dir to the object storage system
  2. Scan all flowcells in the target root dir
  3. A flowcell is considered all set when:
    • (1) The done flag exists
    • (2) The QC report has been generated
  4. Use the customized "Path" string as the prefix in S3
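
The steps listed above could look roughly like the following. This is a minimal sketch, not the actual content of Backup2S3.py: the done-flag file name, the QC report pattern, and the way the path is turned into a prefix are all assumptions made for illustration. Only the `obj_put` flags are taken from the commands earlier in this thread, and the sketch builds the commands without executing them.

```python
from pathlib import Path
from typing import List

VAULT = "DCEG_COVID_WGS"         # vault name from the commands earlier in this thread
DONE_FLAG = "flag.done"          # hypothetical name of the done flag file
QC_REPORT_GLOB = "*QC*report*"   # hypothetical pattern for the QC report


def is_all_set(flowcell_dir: Path) -> bool:
    """A flowcell is ready when the done flag exists and a QC report is present."""
    has_flag = (flowcell_dir / DONE_FLAG).is_file()
    has_report = any(flowcell_dir.glob(QC_REPORT_GLOB))
    return has_flag and has_report


def path_to_prefix(flowcell_dir: Path) -> str:
    """Assumed 'Path' string: the absolute path without its leading slash."""
    return flowcell_dir.resolve().as_posix().lstrip("/")


def build_backup_commands(root: Path, dry_run: bool = True) -> List[List[str]]:
    """Scan every flowcell directory under root and build obj_put calls for ready ones."""
    commands = []
    for flowcell in sorted(p for p in root.iterdir() if p.is_dir()):
        if not is_all_set(flowcell):
            continue
        cmd = ["obj_put", "-v", VAULT, "-p", path_to_prefix(flowcell),
               str(flowcell), "--recursive"]
        if dry_run:
            cmd.append("--dry-run")
        cmd.append("-FV")
        commands.append(cmd)
    return commands
```

Returning argument lists rather than running them keeps the scan easy to check: the dry-run commands can be printed and reviewed first, then executed with `subprocess.run(cmd, check=True)` once they look right.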