NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing-related projects from Xin will be recorded in this repo.

CustomizedQC: FASTQ files for a sample are still distributed across different lanes (more than one) #21

Open lxwgcool opened 3 years ago

lxwgcool commented 3 years ago

Issue

Some FASTQ files from the same sample are still put into different lanes.

Analysis

  1. Using only the first 5 reads to determine the common barcode is not enough, because:
    • within such a small sample, minority barcodes may appear more often than the majority barcode
    • the first few reads of a file may contain many minority-barcode reads

Solution

  1. Collect reads 100 to 201 (rather than the reads at the beginning of the file)
  2. Check 101 reads in total
  3. Use the majority barcode as the final barcode for the sample
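The voting step above could look roughly like the Python sketch below. It is not the CustomizedQC code. It assumes Casava 1.8+ style read headers, where the barcode is the last colon-separated field (e.g. `... 1:N:0:AGGCAGAG+GGCATTCT`), and it treats the window as 0-based half-open `[100, 201)`, which yields 101 reads. The function names are hypothetical.

```python
import gzip
from collections import Counter
from itertools import islice
from typing import Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def header_barcodes(handle: TextIO) -> Iterator[str]:
    """Yield the barcode field from each FASTQ record header.

    ASSUMPTION: Casava 1.8+ header layout, barcode = last ':'-separated field,
    e.g. '@A00123:45:H53CWDSX2:4:1101:1000:1000 1:N:0:AGGCAGAG+GGCATTCT'.
    """
    for line_no, line in enumerate(handle):
        if line_no % 4 == 0:  # first line of each 4-line FASTQ record
            yield line.rstrip("\n").rsplit(":", 1)[-1]


def majority_barcode(path: str, start: int = 100, stop: int = 201) -> str:
    """Return the most common barcode among reads [start, stop), 0-based.

    ASSUMPTION: the default window (100, 201) is read as half-open = 101 reads.
    """
    with open_fastq(path) as handle:
        window = list(islice(header_barcodes(handle), start, stop))
    if not window:
        raise ValueError(f"{path} has fewer than {start + 1} reads")
    barcode, _count = Counter(window).most_common(1)[0]
    return barcode
```

Note that the headers separate the two indexes with `+`, while the sample names in this thread use `-`, so a caller would likely need `barcode.replace("+", "-")` before matching a barcode against sample directory names.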
lxwgcool commented 3 years ago

Four flowcells have this issue:

  1. 202107_XXXXXX_XXXX_H53CWDSX2
    • ./202107_XXXXXX_XXXX_H53CWDSX2/CASAVA/L004
    • This flowcell has never been analyzed
    • Merged lane 4 back into lane 1
    • Reran "DataReconstruct.py" to reconstruct this flowcell
  2. ./202107_XXXXXX_XXXX_H3NJ2DSX2/CASAVA/L003 (report is running)
    • I3-98014_AGGCCGAG-GGCATTCT_L003
    • I3-98014_AGGCAGAG-GGCATTCT_L001
  3. ./202107_XXXXXX_XXXX_H3NNNDSX2/CASAVA/L003 (Done)
    • I3-98110_AGGCAGAG-GGCATTCT_L003
    • I3-98110_AGGCCGAG-GGCATTCT_L001
  4. ./202107_XXXXXX_XXXX_H3NJJDSX2/CASAVA/L004 (Done)
    • I3-97778_TCTCTACT-GACCCGCG_L004
lxwgcool commented 3 years ago

How to re-run a flowcell that is being analyzed or has already been analyzed

1: Cancel the two jobs related to

2: Move all soft links from I3-98014_AGGCCGAG-GGCATTCT_L003 to I3-98014_AGGCAGAG-GGCATTCT_L001

3: Delete sample I3-98014_AGGCCGAG-GGCATTCT_L003

4: Remove all sample-level flags for I3-98014_AGGCAGAG-GGCATTCT_L001

5: Remove all flowcell-level flags (all of them!)

6: Delete both existing BAM files associated with "I3-98014_AGGCCGAG-GGCATTCT_L003" and "I3-98014_AGGCAGAG-GGCATTCT_L001"

7: Rerun the code
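Steps 2 and 3 could be scripted roughly as in the Python sketch below. This is a hypothetical helper, not part of the repo, and it assumes a layout the thread does not confirm: each sample (e.g. `I3-98014_AGGCCGAG-GGCATTCT_L003`) is a directory of soft links to FASTQ files, and the two sample directories sit at the same depth, so relative link targets stay valid when copied verbatim. Flags, BAMs, and job cancellation (steps 1 and 4 to 6) are left out because their names and locations are not given here.

```python
import os
import shutil
from pathlib import Path
from typing import List


def move_symlinks(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Recreate every soft link from src_dir inside dst_dir, then unlink the originals.

    The link target string is copied unchanged (relative targets stay relative),
    which ASSUMES both sample directories live at the same depth in the tree.
    """
    moved = []
    for entry in sorted(src_dir.iterdir()):
        if not entry.is_symlink():
            continue  # only soft links are moved; regular files are left alone
        dst = dst_dir / entry.name
        if dst.exists() or dst.is_symlink():
            raise FileExistsError(f"refusing to overwrite {dst}")
        os.symlink(os.readlink(entry), dst)
        entry.unlink()
        moved.append(dst)
    return moved


def merge_sample(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Step 2: move the soft links; step 3: delete the minority-barcode sample dir."""
    moved = move_symlinks(src_dir, dst_dir)
    shutil.rmtree(src_dir)  # removes anything still left in src_dir
    return moved
```

Refusing to overwrite an existing link is a deliberate choice: if both samples already hold a file of the same name, it is safer to stop and inspect than to silently drop one of them.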

lxwgcool commented 3 years ago

Increased the number of simultaneous jobs from 4 to 5.