NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing-related projects from Xin will be recorded in this repo

COVID 19 project: Multiple changes and modifications #50

Open lxwgcool opened 2 years ago

lxwgcool commented 2 years ago

Since the beginning of November, there have been multiple changes in a number of phases of the code. Please check the details below:

1: global_config_bash.rc (1) Create the variable "DATA_ROOT_DIR" to make paths easier to express.

2: global_config_bash_production.rc (1) Same as "global_config_bash.rc"

3: pre_calling_wgsqc_single.sh (1) Fix a bug. a) For samples with multiple reads (e.g. topoff samples), three indicators were calculated incorrectly: i) TOTAL_READS ii) TOTAL_BASES iii) TOTAL_READS_WITH_3LESS_CUT. b) The error was caused by the awk command line; I have fixed it.
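The issue does not show the awk command, so the following is only a minimal Python sketch of the behaviour the fix in item 3 is aiming for: when one sample has several read files, each indicator is accumulated across all of them. The three indicator names come from the issue; the `KEY<TAB>VALUE` line format, the function name `sum_indicators`, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

# Indicator names are taken from the issue text.
# The "KEY<TAB>VALUE" per-file format is an assumption, not the script's real format.
INDICATORS = ("TOTAL_READS", "TOTAL_BASES", "TOTAL_READS_WITH_3LESS_CUT")


def sum_indicators(per_file_stats):
    """Sum each indicator across all read files that belong to one sample."""
    totals = defaultdict(int)
    for text in per_file_stats:
        for line in text.splitlines():
            parts = line.split("\t")
            # Skip lines that are not one of the three tracked indicators.
            if len(parts) == 2 and parts[0] in INDICATORS:
                totals[parts[0]] += int(parts[1])
    return {name: totals[name] for name in INDICATORS}


# Hypothetical stats for an original run and a topoff run of the same sample.
original_run = "TOTAL_READS\t100\nTOTAL_BASES\t15000\nTOTAL_READS_WITH_3LESS_CUT\t90"
topoff_run = "TOTAL_READS\t40\nTOTAL_BASES\t6000\nTOTAL_READS_WITH_3LESS_CUT\t35"

sample_totals = sum_indicators([original_run, topoff_run])
print(sample_totals)
```

Keeping one running total per indicator means a sample with a single read file and a sample with several follow the same code path.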

4: step8_sync_and_recalibrate_bam.sh 1) Add "strKTName" as an additional argument. 2) Change the code layout. 3) Create "inqueueDir". 4) Add some comments and temporary print output.

5: AutoFramework.py 1) Put the mimic strManifestFile into the subfolder named after the keytable name.

6: CustomizedQC.py 1) Move TMPFolder from scratch to the data dir. a) This is because we need enough space to run some samtools functions (e.g. markduplicate) on large reads. 2) Change the thresholds used to dynamically load jobs. a) This is because we have more data right now, and we also do some variant calling in the same folder.

7: Backup2S3.py 1) This is completely new code. It backs up the finished flowcells in the Biowulf target root dir to the Object Storage system.

8: BackupCovid2ndPipelineResults.py 1) "secondary_buf/coverage_report" has multiple levels right now, so the way it is backed up needs to change. 2) Add a new function "RemoveFileFromBiowulf". 3) Add a new function "BackupFileFromBiowulf2S3".

9: RetrieveBAMFromS3.py (1) New code: retrieve data from S3 to Biowulf (primary pipeline analysis result: recalibrated BAM)

10: RetrieveUSUBAM.sh (1) This is new code associated with "RetrieveBAMFromS3.py".

11: ObjectStorage/job.sh (1) New code: the job that is used to run "Backup2S3.py".

12: ContaminationCheckSingle.sh (1) Add "set -o pipefail" to exit the script when something goes wrong in the middle of a pipeline.
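To illustrate what "set -o pipefail" changes in item 12, here is a small sketch that runs the same two-stage pipeline in bash with and without the option and compares exit statuses. The pipeline `false | cat` is a stand-in chosen for illustration and is not taken from ContaminationCheckSingle.sh; the helper name `pipeline_exit_code` is likewise hypothetical, and the sketch assumes `bash` is on the PATH.

```python
import subprocess


def pipeline_exit_code(use_pipefail):
    """Run `false | cat` in bash and return the shell's exit status."""
    prefix = "set -o pipefail; " if use_pipefail else ""
    # `false` fails in the middle of the pipeline; `cat` succeeds at the end.
    result = subprocess.run(["bash", "-c", prefix + "false | cat"])
    return result.returncode


without_pipefail = pipeline_exit_code(False)  # status of the last stage only
with_pipefail = pipeline_exit_code(True)      # status reflects the failed stage
print(without_pipefail, with_pipefail)
```

Note that pipefail by itself only changes the exit status of the pipeline. For the script to actually stop, that status still has to be checked, or an option such as `set -e` has to be active; the issue does not say which approach the script uses.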

13: CombineReport.py (1) Add new logic to merge three types of report: a) coverage_report b) pre_calling_qc_report c) ContaminationReport
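The issue does not describe how CombineReport.py merges the three reports in item 13, so the following is only a sketch of one common approach: joining the reports on a shared sample column. The CSV format, the column names (`Sample`, `MeanCoverage`, `TOTAL_READS`, `Contamination`), the function name `merge_reports`, and the values are all assumptions for illustration.

```python
import csv
import io


def merge_reports(reports, key="Sample"):
    """Join several CSV reports on a shared key column, keeping first-seen row order."""
    merged = {}
    for text in reports:
        for row in csv.DictReader(io.StringIO(text)):
            # Create the sample's record on first sight, then add this report's columns.
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return list(merged.values())


# Hypothetical contents of the three report types named in the issue.
coverage_report = "Sample,MeanCoverage\nS1,31.2\nS2,28.9\n"
pre_calling_qc_report = "Sample,TOTAL_READS\nS1,140\nS2,150\n"
contamination_report = "Sample,Contamination\nS1,0.001\n"

combined = merge_reports([coverage_report, pre_calling_qc_report, contamination_report])
for record in combined:
    print(record)
```

In this sketch a sample that is missing from one report (S2 has no contamination row) simply lacks that column, which makes gaps visible instead of dropping the sample.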

14: MergeSubject.py (1) Add a function to back up the mimic manifest file to Batch Root Dir