NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing-related projects from Xin will be recorded in this repo.

COVID_2nd_Pipeline: the essential code from the 2nd pipeline that is used for the COVID project #37

Open · lxwgcool opened this issue 2 years ago

lxwgcool commented 2 years ago

The code from the 2nd pipeline.

lxwgcool commented 2 years ago

Latest modifications based on the original code from the 2nd pipeline:

1: gatk_build_bam_for_single_name_v4.sh

(1) Change dependencies: a) samtools/1.8 b) java/1.8.0_211
(2) Change the warning message to use a variable instead of a fixed directory
(3) Change the syntax used to call Picard DownsampleSam
(4) Change the coding style
(5) Add additional information to the log file
(6) Change the syntax used to call Picard MarkDuplicates
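The thread does not show the new Picard syntax. As an illustration only: newer Picard releases accept dash-prefixed arguments (`-I`, `-O`) and mark the legacy `I=`/`O=` form as deprecated, so a wrapper in that style might look like this. `PICARD_JAR`, `DRY_RUN`, the helper function names, and the `-Xmx8g` heap size are hypothetical; this is not the code of gatk_build_bam_for_single_name_v4.sh.

```bash
#!/usr/bin/env bash
# Illustrative sketch: Picard calls in the dash-prefixed argument style.
# PICARD_JAR, DRY_RUN, the function names and -Xmx8g are hypothetical.
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Run (or, with DRY_RUN=1, just print) one Picard tool invocation
run_picard() {
    local -a cmd=(java -Xmx8g -jar "$PICARD_JAR" "$@")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# -M is the duplication metrics file
mark_duplicates() {
    local in_bam="$1" out_bam="$2" metrics="$3"
    run_picard MarkDuplicates -I "$in_bam" -O "$out_bam" -M "$metrics"
}

# -P is the probability of keeping each read (pair)
downsample_bam() {
    local in_bam="$1" out_bam="$2" prob="$3"
    run_picard DownsampleSam -I "$in_bam" -O "$out_bam" -P "$prob"
}
```

Keeping the `java -jar` prefix in one function means a module or jar-path change on Biowulf only has to be made in one place.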

2: generate_coverage_report_single.sh

(1) Change how "global_config_bash.rc" is called
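How the scripts now load global_config_bash.rc is not shown here. One common way to avoid a hard-coded path is to resolve the file relative to the script's own directory. A minimal sketch, assuming the rc file sits next to the script and using a hypothetical `GLOBAL_CONFIG` override variable:

```bash
#!/usr/bin/env bash
# Sketch: locate global_config_bash.rc next to this script instead of via a
# fixed cluster-specific path. GLOBAL_CONFIG is a hypothetical override.
resolve_global_config() {
    local script_dir config
    # Absolute directory of the file that defines this function
    script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    config="${GLOBAL_CONFIG:-${script_dir}/global_config_bash.rc}"
    if [[ ! -r "$config" ]]; then
        echo "ERROR: cannot read config file: ${config}" >&2
        return 1
    fi
    printf '%s\n' "$config"
}

# Usage at the top level of a pipeline script:
#   config_file="$(resolve_global_config)" || exit 1
#   source "$config_file"
```

The function only returns the path; the caller sources the file at the top level, so any `declare` statements in the rc file are not made local to a function.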

3: Add the actual file "global_config_bash_production.rc"

(1) In the CCAD version, it is a link to a file in another project, which does not make sense.

4: global_config_bash_production

(1) Change many related paths from CCAD to Biowulf
(2) Migrate many related reference files and standard variant VCF files from CCAD to Biowulf

5: pre_calling_qc_single.sh

(1) Load new versions of module files on Biowulf
(2) Change how global_config_bash.rc is called
(3) Change how the various "capture.interval" files are used
(4) Change the syntax used to call Picard BedToIntervalList
(5) Change the syntax used to call Picard CollectMultipleMetrics
(6) Change the syntax used to call Picard ReorderSam
(7) Change the syntax used to call Picard QualityScoreDistribution
(8) Change the syntax used to call Picard CollectHsMetrics
(9) Change the syntax used to call GATK a) CallableLoci b) CountTerminusEvent c) ReadLengthDistribution
(10) Change the syntax used to call Picard CollectInsertSizeMetrics

6: recalibrate_bam.sh

(1) Load new versions of module files on Biowulf
(2) Add a working/done flag mechanism
(3) Change the syntax used to call GATK a) LeftAlignIndels b) RealignerTargetCreator c) IndelRealigner d) BaseRecalibrator e) PrintReads
(4) Update the logic for automatically deleting/backing up files
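A working/done flag scheme can be sketched as follows, assuming flag files named `<prefix>.working` and `<prefix>.done` (hypothetical names, not necessarily those used by recalibrate_bam.sh). The working flag is created with `noclobber`, so a second job cannot claim a step that is already being processed, and the done flag lets a rerun skip finished steps.

```bash
#!/usr/bin/env bash
# Sketch of a working/done flag pattern; flag file names are assumptions.
# Usage: run_with_flags <flag_prefix> <command> [args...]
run_with_flags() {
    local prefix="$1"; shift
    local working="${prefix}.working" done_flag="${prefix}.done" rc

    # A previous run already finished this step: nothing to do
    if [[ -e "$done_flag" ]]; then
        echo "SKIP: ${prefix} already finished"
        return 0
    fi

    # Create the working flag atomically; this fails if the flag already exists
    if ! ( set -o noclobber; : > "$working" ) 2>/dev/null; then
        echo "SKIP: ${prefix} is being processed by another job"
        return 0
    fi

    "$@"
    rc=$?
    rm -f "$working"

    # Only a successful command leaves a done flag behind
    if (( rc == 0 )); then
        touch "$done_flag"
        echo "DONE: ${prefix}"
    else
        echo "FAILED: ${prefix} (exit ${rc})" >&2
    fi
    return "$rc"
}
```

A failed step removes its working flag without writing a done flag, so it is simply retried on the next run.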

7: step5_2_generate_coverage_report_batch.sh

(1) Change how global_config_bash.rc is called
(2) Add additional useful information to the output

8: step5_generate_coverage_report_batch.sh

(1) Add some additional arguments
(2) Change how jobs are submitted (SLURM)
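The thread does not say how job submission changed. On Biowulf, batch jobs go through SLURM's `sbatch`; a hedged sketch of a per-sample submission helper is shown here. The resource values, log naming, and the `DRY_RUN` switch are hypothetical choices, not the settings used in step5_generate_coverage_report_batch.sh.

```bash
#!/usr/bin/env bash
# Sketch: submit one command as a SLURM job via sbatch --wrap.
# Resource values and DRY_RUN are hypothetical, not the pipeline's settings.
submit_job() {
    local name="$1" log_dir="$2" cmd="$3"
    # In --output, %j is replaced by the SLURM job ID
    local -a sbatch_cmd=(
        sbatch
        --job-name="$name"
        --cpus-per-task=4
        --mem=16G
        --time=24:00:00
        --output="${log_dir}/${name}.%j.out"
        --wrap="$cmd"
    )
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        # Print one argument per line so quoting is unambiguous
        printf '%s\n' "${sbatch_cmd[@]}"
    else
        "${sbatch_cmd[@]}"
    fi
}

# Example (hypothetical script and sample names):
#   DRY_RUN=1 submit_job coverage_S1 ./logs "bash run_sample.sh S1"
```

Building the command as an array keeps arguments with spaces (such as the `--wrap` command) intact without extra quoting.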

9: step6_2_generating_pre_calling_qc_report_batch.sh

(1) Change how global_config_bash.rc is called

10: step6_generate_pre_calling_qc_report_batch.sh

(1) Add some additional arguments
(2) Change the paths of some files: a) nohup100.out b) bam.lst
(3) Change how jobs are submitted (SLURM)

11: step8_sync_and_recalibrate_bam.sh

(1) Change how jobs are submitted (SLURM)

12: step9_construct_BAM_recaliberated_per_manifest.sh

(1) Change some fixed directory paths

lxwgcool commented 2 years ago

Location of the latest source code on Biowulf:

/home/lix33/lxwg/Git/IlluminaSequencingAnalysis/COVID_2nd_Pipeline/SourceCode

lxwgcool commented 2 years ago

New testing dataset

1: Add a testing-data BAM file
2: Add 2 different testing cases: a) 1 subject, 2 samples b) 2 subjects, 2 samples