NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing-related projects from Xin are recorded in this repo

H-pylori: Generate BAM for 1000+ samples by using 4 different references #57

Open lxwgcool opened 1 year ago

lxwgcool commented 1 year ago

Three different purposes

lxwgcool commented 1 year ago

Working Directory

Contains 3 different types of BAM: Raw, Mapped (Unique), and UnMapped

Raw BAM: /scratch/lix33/lxwg/Project/H_pylori/Processed/BAM/*/Raw

Mapped BAM (Unique): /scratch/lix33/lxwg/Project/H_pylori/Processed/BAM/*/Mapped

UnMapped BAM: /scratch/lix33/lxwg/Project/H_pylori/Processed/BAM/*/UnMapped


* Variant Calling results

/scratch/lix33/lxwg/Project/H_pylori/Processed/CSV


* Log

/scratch/lix33/lxwg/Project/H_pylori/Processed/Log

(1) The calling command line and related history are recorded here

lxwgcool commented 1 year ago

The method of getting unique mapped BAM

lxwgcool commented 1 year ago

The method of getting unmapped reads
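
The exact commands used for these two filters are not shown in this thread, so here is only a minimal sketch of one common way to separate the records, working on SAM text lines. The flag bits (0x4, 0x100, 0x800) come from the SAM specification. Treating "unique" as "primary alignment with MAPQ at or above a threshold" is an assumption, and `min_mapq` and `classify_sam_line` are hypothetical names; the right threshold depends on the aligner.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def classify_sam_line(line, min_mapq=1):
    """Classify one SAM text line as 'header', 'unmapped', 'unique' or 'other'.

    min_mapq is a hypothetical cutoff: a primary alignment whose MAPQ is
    below it is treated as not uniquely mapped (assumption, aligner dependent).
    """
    if line.startswith("@"):
        return "header"
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return "other"
    return "unique" if mapq >= min_mapq else "other"
```

Records classified as `unique` would go to the Mapped BAM and records classified as `unmapped` to the UnMapped BAM; everything else stays only in the Raw BAM.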

lxwgcool commented 1 year ago

Output Explanation


lxwgcool commented 1 year ago

New Features and Fixed Bugs

New Features

Fixed Bugs

lxwgcool commented 1 year ago

New Function: find disordered genes

1: Easy case
(1) Normal, disordered (1), normal

2: Medium-level case
(1) Front goes to back (304)
    * still pick it out

3: Hard-level case
(1) A number of reads go from back to front
    * do not report all of them
    * only find whether there is disorder within this number of reads
(2) Order goes to reverse order
    * just ignore it
(3) A number of reads stay together, but the order arrangement is pretty complex.
(4) The chimeric case
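
To make the easy and medium cases concrete, here is a rough sketch that flags the entries falling outside a longest increasing subsequence of the observed order. This is a swapped-in technique chosen for illustration, not necessarily what the code in this repo does; `find_disordered` and the integer order indices are hypothetical.

```python
from bisect import bisect_left


def find_disordered(order):
    """Return the values of `order` that are NOT part of one longest
    strictly increasing subsequence (the assumed 'normal' backbone)."""
    tails = []      # tails[k]: index in `order` ending the best run of length k + 1
    tail_vals = []  # values at those indices, kept sorted for bisect
    prev = [-1] * len(order)
    for i, value in enumerate(order):
        k = bisect_left(tail_vals, value)
        if k == len(tails):
            tails.append(i)
            tail_vals.append(value)
        else:
            tails[k] = i
            tail_vals[k] = value
        prev[i] = tails[k - 1] if k > 0 else -1
    # Walk back from the end of the longest run to collect the backbone.
    keep = set()
    i = tails[-1] if tails else -1
    while i != -1:
        keep.add(i)
        i = prev[i]
    return [v for i, v in enumerate(order) if i not in keep]
```

For the easy case, `find_disordered([1, 2, 5, 3, 4, 6])` picks out `5`; for "front goes to back", `find_disordered([2, 3, 4, 5, 1])` picks out `1`. A fully reversed order would flag almost everything, so the "just ignore it" rule for case 3 (2) would need a separate check before calling this.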

lxwgcool commented 1 year ago

Big Progress: new function for calculating methylation positions

  1. Use the raw position in the gene fna file and the position in the GFF3 file to collect all methylations that belong to the current gene.
  2. Align the gene back to the reference.
  3. Find the real mapping position of each methylation.
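
Step 3 can be sketched as a walk over the CIGAR string of the gene-to-reference alignment. This is only an illustration under assumptions: positions are 0-based (a SAM POS value would need 1 subtracted first), `query_to_ref` is a hypothetical helper rather than a function from this repo, and positions that land in an insertion or a soft clip are reported as `None`. Which operations consume the query or the reference follows the SAM specification.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")  # operations that advance along the gene sequence
CONSUMES_REF = set("MDN=X")    # operations that advance along the reference


def query_to_ref(ref_start, cigar, query_pos):
    """Map a 0-based position in the aligned gene to a 0-based reference position.

    Returns None if the position sits in an insertion or soft clip,
    or lies beyond the end of the alignment.
    """
    ref = ref_start
    query = 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        in_query = op in CONSUMES_QUERY
        in_ref = op in CONSUMES_REF
        if in_query and query <= query_pos < query + length:
            # The target falls inside this block.
            return ref + (query_pos - query) if in_ref else None
        if in_query:
            query += length
        if in_ref:
            ref += length
    return None
```

For example, with the alignment starting at reference position 100 and CIGAR `3S5M2D4M`, gene position 8 maps to reference position 107, because the 2-base deletion shifts everything after it.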

Hey Difei,

Everything is all set.

The results have been uploaded to:
/scratch/lix33/lxwg/Project/H_pylori/Processed/Methylation

Please check the files below:

The related github issue is: 
https://github.com/NCI-CGR/IlluminaSequencingAnalysis/issues/57

The latest commit is: 
https://github.com/NCI-CGR/IlluminaSequencingAnalysis/commit/f49dc3cf0531e8477567029643b34bcd2a9d0d08

The column “LiftoverAlignedPosRef(Methylation)” in CSV file is what you are looking for.

I am happy this job could be done in 2022.

Have a great new year
Best
Xin

lxwgcool commented 1 year ago

Reference

1: Explanation of the CIGAR mask: https://samtools.sourceforge.net/samtools/bam/PDefines/PDefines.html

2: How to compute the offset for each CIGAR code: https://sourceforge.net/p/samtools/mailman/message/29373646/
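
As a quick illustration of what the CIGAR mask in reference 1 is for: in the BAM binary format each CIGAR operation is packed into one 32-bit integer, with the length in the upper 28 bits and the operation code in the lower 4 bits. The constants below mirror the `BAM_CIGAR_SHIFT` and `BAM_CIGAR_MASK` defines from samtools; the two helper functions are hypothetical names used only for this sketch.

```python
BAM_CIGAR_SHIFT = 4
BAM_CIGAR_MASK = (1 << BAM_CIGAR_SHIFT) - 1  # 0xf: selects the operation code
BAM_CIGAR_OPS = "MIDNSHP=X"                  # operation codes 0..8


def encode_cigar_op(length, op):
    """Pack one CIGAR operation as (length << 4) | op_code."""
    return (length << BAM_CIGAR_SHIFT) | BAM_CIGAR_OPS.index(op)


def decode_cigar(packed):
    """Turn a list of packed CIGAR integers back into a CIGAR string."""
    return "".join(
        f"{value >> BAM_CIGAR_SHIFT}{BAM_CIGAR_OPS[value & BAM_CIGAR_MASK]}"
        for value in packed
    )
```

For example, `5M` is stored as `(5 << 4) | 0`, which is 80, and `2I` as `(2 << 4) | 1`, which is 33.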

lxwgcool commented 1 year ago

New function: Add three more pieces of information to the CSV file

lxwgcool commented 1 year ago

Modified logic: add a mask to orphan methylations