NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing-related projects from Xin are recorded in this repo

CustomizedQC: replace the old logic to reorganize the raw COVID data #3

Open lxwgcool opened 3 years ago

lxwgcool commented 3 years ago
  1. Cluster the raw COVID data by flowcell
  2. Put all samples that belong to the same flowcell together
  3. Use the same folder structure as the CGR in-house data
  4. Use the same naming rule to rename each raw sample
  5. Rewrite the main pipeline code to handle this new data structure and multiple flowcells at the same time
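
A minimal sketch of steps 1 to 4, assuming each FASTQ file has already been tagged with its flowcell, sample, lane, and read number. The function name `plan_layout`, the record keys, the `<output>/<flowcell>/<sample>/` layout, and the file-naming pattern are all hypothetical placeholders; the actual CGR folder structure and naming rule are not spelled out in this issue.

```python
from collections import defaultdict
from pathlib import Path


def plan_layout(records, output_root):
    """Group FASTQ records by flowcell and plan a destination path for each.

    `records` is an iterable of dicts with the keys
    path, flowcell, sample, lane, read.
    The directory layout and file name below are illustrative assumptions,
    not the real CGR in-house convention.
    """
    by_flowcell = defaultdict(list)
    for rec in records:
        # Hypothetical naming rule: <sample>_L<lane, 3 digits>_R<read>.fastq.gz
        new_name = f"{rec['sample']}_L{int(rec['lane']):03d}_R{rec['read']}.fastq.gz"
        dest = Path(output_root) / rec["flowcell"] / rec["sample"] / new_name
        by_flowcell[rec["flowcell"]].append((Path(rec["path"]), dest))
    return dict(by_flowcell)


if __name__ == "__main__":
    demo = [
        {"path": "raw/a_1.fastq.gz", "flowcell": "FC1", "sample": "S1", "lane": 1, "read": 1},
        {"path": "raw/a_2.fastq.gz", "flowcell": "FC1", "sample": "S1", "lane": 1, "read": 2},
        {"path": "raw/b_1.fastq.gz", "flowcell": "FC2", "sample": "S2", "lane": 2, "read": 1},
    ]
    for flowcell, moves in plan_layout(demo, "ProcessedData").items():
        print(flowcell)
        for src, dest in moves:
            print(f"  {src} -> {dest}")
```

The function only plans the moves and touches no files, so the mapping can be reviewed before anything is copied or renamed.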
lxwgcool commented 3 years ago

Added a new script, "DataReconstruct.py", which:

  1. Generates flowcells from the raw COVID FASTQ files
  2. Extracts the related info from the content of each FASTQ file
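
As an illustration of extracting info from the file content, the sketch below parses the first read header of a FASTQ file. It assumes the standard Illumina (Casava 1.8+) header format, `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`; the names `ReadHeader`, `parse_header`, and `first_header` are hypothetical, and this is not necessarily how DataReconstruct.py does it.

```python
import gzip
from typing import NamedTuple


class ReadHeader(NamedTuple):
    instrument: str
    run_number: str
    flowcell: str
    lane: int
    read: str   # "1" or "2"; empty if the header has no comment part
    index: str  # index (barcode) sequence; empty if absent


def parse_header(line):
    """Parse one Illumina (Casava 1.8+) FASTQ header line."""
    if not line.startswith("@"):
        raise ValueError(f"not a FASTQ header: {line!r}")
    # The part before the first space identifies the read location,
    # the part after it carries read number, filter flag, and index.
    left, _, right = line[1:].strip().partition(" ")
    fields = left.split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected header layout: {line!r}")
    comment = right.split(":") if right else []
    read = comment[0] if comment else ""
    index = comment[3] if len(comment) > 3 else ""
    return ReadHeader(fields[0], fields[1], fields[2], int(fields[3]), read, index)


def first_header(fastq_path):
    """Read and parse the first header of a plain or gzipped FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        return parse_header(handle.readline())
```

Reading only the first record assumes that every read in a file comes from the same flowcell and lane; a file merged from several runs would need a full scan.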
lxwgcool commented 3 years ago

Test case (passed)

  1. Working Dir
    • /home/lix33/lxwg/Git/IlluminaSequencingAnalysis
  2. Source code
    • /home/lix33/lxwg/Git/IlluminaSequencingAnalysis/CustomizedQC/SourceCode
  3. Testing Data
    • /home/lix33/lxwg/Data/ad-hoc/CustomizedQC/Covid19/pI3.Covnet.00
    • Copied from one small MiSeq flowcell (CGR in-house data)
  4. Testing Results
    • /home/lix33/lxwg/Data/ad-hoc/CustomizedQC/Covid19/Output/ProcessedData
  5. Reference Info
    • /scratch/lix33/DCEG/CGF/Bioinformatics/Production/data/ref38
    • Copied from:
      • /DCEG/CGF/Bioinformatics/Production/data/hg38
  6. How to run the test code:

    • Step 1: Reconstruct the data
python3 /home/lix33/lxwg/Git/IlluminaSequencingAnalysis/CustomizedQC/SourceCode/DataReconstruct.py /home/lix33/lxwg/Data/ad-hoc/CustomizedQC/Covid19 /home/lix33/lxwg/Data/ad-hoc/CustomizedQC/Covid19/Output/ProcessedData
    • Step 2: Run CustomizedQC on the reconstructed data
python3 /home/lix33/lxwg/Git/IlluminaSequencingAnalysis/CustomizedQC/SourceCode/CustomizedQC.py /home/lix33/lxwg/Data/ad-hoc/CustomizedQC/Covid19/Output/ProcessedData
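
The two commands above can be chained from a small driver so that the QC run starts only if the reconstruction succeeds. This is a sketch: `build_commands` and `run_all` are hypothetical helpers, and the argument order simply mirrors the command lines shown in this comment.

```python
import subprocess
from pathlib import Path


def build_commands(source_dir, raw_dir, processed_dir):
    """Return the two command lines as argument lists, in run order."""
    source_dir = Path(source_dir)
    return [
        # Reconstruct the raw data into the processed-data folder.
        ["python3", str(source_dir / "DataReconstruct.py"), str(raw_dir), str(processed_dir)],
        # Run the QC script on the reconstructed data.
        ["python3", str(source_dir / "CustomizedQC.py"), str(processed_dir)],
    ]


def run_all(commands):
    """Run each command in order; check=True raises if a step fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_commands(
        "/home/lix33/lxwg/Git/IlluminaSequencingAnalysis/CustomizedQC/SourceCode",
        "/home/lix33/lxwg/Data/ad-hoc/CustomizedQC/Covid19",
        "/home/lix33/lxwg/Data/ad-hoc/CustomizedQC/Covid19/Output/ProcessedData",
    )
    # Print only; call run_all(cmds) to actually execute the steps.
    for cmd in cmds:
        print(" ".join(cmd))
```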