NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing related projects from Xin will be recorded in this repo

Customized QC: new logic to merge samples first before doing alignment #13

Open lxwgcool opened 3 years ago

lxwgcool commented 3 years ago

Since fastq files of the same sample from different lanes should be merged together, we need logic in the pipeline to do this job simultaneously.
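To illustrate which samples would need merging, here is a minimal sketch that groups fastq files by sample and read. It assumes bcl2fastq-style file names (`<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`); the actual naming in this pipeline may differ, and `group_fastq_by_sample` and `needs_merge` are hypothetical helpers, not functions from this repo.

```python
import re
from collections import defaultdict

# Assumption: bcl2fastq-style names, e.g. SampleA_S1_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_fastq_by_sample(file_names):
    """Map sample -> read ('1' or '2') -> file names sorted by name."""
    groups = defaultdict(lambda: defaultdict(list))
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        groups[match["sample"]][match["read"]].append(name)
    return {
        sample: {read: sorted(files) for read, files in reads.items()}
        for sample, reads in groups.items()
    }


def needs_merge(reads):
    """A sample needs merging if any read has more than one fastq file."""
    return any(len(files) > 1 for files in reads.values())
```

A sample whose files all come from a single lane returns `False` from `needs_merge` and could go straight to alignment.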

lxwgcool commented 3 years ago

This new logic involves many modifications to the code. Please check the details below.

For Reference Data

Moved the reference data from scratch to "/data/COVID_WGS/lix33/DCEG"

  1. It is required for all alignment jobs
  2. The total size is around 80GB

CustomizedQC.py

  1. Add two flags for the sample-merging logic
    • flag.merge.sample.working
    • flag.merge.sample.done
  2. Add a special flag to indicate that the current sample contains fastq files from multiple lanes
    • flag.multi.sample
  3. Create a dedicated function "UpdateFastqInfo" in ClsSample to reset the fastq files of the current sample after sample merging has finished
    • For a flowcell, some samples may need to be merged, while others do not (will be deleted after alignment).
    • For merged samples we save the physical files in the customized folder; otherwise, we use the softlink directly.
  4. Add new logic in CheckingFlags (the MergeSample flag)
  5. New function "SubmitMergeSampleJob" to submit SampleMerge jobs for each sample
  6. Add new logic to handle the flowcell-level SampleMerging flag
  7. Improve the old logic for checking the flowcell-level alignment and QC report working/done flags
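As a rough illustration of how flag files like these can drive the workflow, here is a minimal sketch. The flag names come from the list above; everything else is an assumption: that flags are empty files inside a per-sample directory, that only samples carrying flag.multi.sample count toward the flowcell-level check, and the helper names themselves (`merge_state`, `mark_working`, `mark_done`, `flowcell_merge_done`), which are hypothetical and not taken from CustomizedQC.py.

```python
from pathlib import Path

# Flag names from the issue; storing them as empty files is an assumption.
FLAG_WORKING = "flag.merge.sample.working"
FLAG_DONE = "flag.merge.sample.done"
FLAG_MULTI = "flag.multi.sample"


def merge_state(sample_dir):
    """Return 'done', 'working', or 'pending' based on the flag files."""
    sample_dir = Path(sample_dir)
    if (sample_dir / FLAG_DONE).exists():
        return "done"
    if (sample_dir / FLAG_WORKING).exists():
        return "working"
    return "pending"


def mark_working(sample_dir):
    """Record that a merge job has been submitted for this sample."""
    Path(sample_dir, FLAG_WORKING).touch()


def mark_done(sample_dir):
    """Replace the working flag with the done flag."""
    sample_dir = Path(sample_dir)
    (sample_dir / FLAG_DONE).touch()
    working = sample_dir / FLAG_WORKING
    if working.exists():
        working.unlink()


def flowcell_merge_done(sample_dirs):
    """Assumed rule: every multi-lane sample must have its done flag."""
    multi_lane = [d for d in sample_dirs if Path(d, FLAG_MULTI).exists()]
    return all(merge_state(d) == "done" for d in multi_lane)
```

With this layout, a scheduler can be rerun safely: samples in the "working" or "done" state are skipped, and alignment starts only once `flowcell_merge_done` returns `True`.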

reference_mapping_single.sh

1: Pass a new argument "iMergedSample" to determine whether the fastq files need to be deleted automatically after mapping
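The cleanup decision could look roughly like the sketch below, written in Python for illustration even though the real script is a shell script. It assumes that a true value of the flag means the fastq files are merged physical copies that can be removed, and that softlinked files are always left alone. `cleanup_fastq_after_mapping` and its parameters are hypothetical names.

```python
import os


def cleanup_fastq_after_mapping(fastq_paths, is_merged_sample):
    """Delete merged physical fastq files after mapping; keep everything else.

    Assumption: is_merged_sample plays the role of "iMergedSample".
    Returns the list of paths that were removed.
    """
    removed = []
    if not is_merged_sample:
        return removed  # non-merged samples use softlinks; nothing to delete
    for path in fastq_paths:
        # Never remove a softlink here; only regular files are deleted.
        if os.path.isfile(path) and not os.path.islink(path):
            os.remove(path)
            removed.append(path)
    return removed
```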

New source code: MergeSampleSingle.py

1: Do the job of merging samples
2: Handle the sample-level merge-sample flags
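For reference, one common way to merge per-lane fastq.gz files is plain byte-level concatenation, since a gzip stream may contain multiple members. The sketch below shows that technique only. It is not the code in MergeSampleSingle.py, it does not touch the flag files, and `merge_fastq_gz` is a hypothetical name.

```python
import shutil
from pathlib import Path


def merge_fastq_gz(lane_files, merged_path):
    """Concatenate gzip-compressed fastq files into one multi-member gzip file.

    lane_files should already be in the desired order (for paired-end data,
    use the same lane order for R1 and R2 so read pairs stay in sync).
    """
    merged_path = Path(merged_path)
    merged_path.parent.mkdir(parents=True, exist_ok=True)
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)  # raw copy, no recompression
    return merged_path
```

Because nothing is decompressed, this is equivalent to `cat lane1.fastq.gz lane2.fastq.gz > merged.fastq.gz` and costs only disk I/O.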