RitchieLabIGH / IRFinder

MIT License

majority of reads skipped in Long mode #44

Closed Perugolate closed 1 month ago

Perugolate commented 2 months ago

Total reads: 11348918 Skipped reads: 11317525

See also #29, where another long-read (LR) user reports that the majority of reads are skipped.

IRFinder runmode: Long
IRFinder user@host:  @ bc3a0080316a
IRFinder working dir:  /var/lib/docker/overlay2/7695324ce1d65a20bed54b98789bf976198187557abcf0b99ebb97a81b45287f/work
IRFinder reference: REF/gencode
IRFinder file 1: data/A375_Ctrl_Bio1_LR_full_length.fastq
---
[  Fri Sep 6 13:58:45 UTC 2024  ] Minimap2 is starting with 24 threads
---
[  Fri Sep 6 14:27:09 UTC 2024  ] Minimap2 mapping completed
---
[  Fri Sep 6 14:27:09 UTC 2024  ] Processing the BAM file with IRFinder
---
IRFinder run with options:
 - Output Dir:              A375_Ctrl_Bio1
 - Main intron ref.:        REF/gencode/IRFinder/ref-cover.bed
 - Splice junction ref.:    REF/gencode/IRFinder/ref-sj.ref
 - Read spans ref.:         REF/gencode/IRFinder/ref-read-continues.ref
 - Optional ROI ref.:       REF/gencode/IRFinder/ref-ROI.bed
 - Read type:               LR
 - Jitter:                  3
 - Input BAM:               A375_Ctrl_Bio1/Unsorted.bam

Preparing the reference:
 - Junction count...done.
 - Span points...done.
 - Coverage blocks...done.
 - ROI...done

Processing the BAM
Total reads processed: 11348918
Total nucleotides: 8279089339
Total singles processed: 11348919
Total pairs processed: 0
Short pairs: 0
Intersect pairs: 0
Long pairs: 0
Skipped reads: 11317525
 - flag 4: 87539
 - flag 256: 5882548
 - flag 272: 5317998
 - flag 2048: 13582
 - flag 2064: 15858
Error reads: 0
Directionality: Dir evidence:   76326
Directionality: Nondir evidence:    13
Directionality: Dir evidence known junctions:   53798
Directionality: Nondir evidence known junctions:    12
Directionality: Dir matches ref:    53797
Directionality: Dir opposed to ref: 1
Directionality: Dir score all (0-10000):    9998
Directionality: Dir score known junctions (0-10000):    9997
RNA-Seq directionality -1/0/+1: 1
---
[  Fri Sep 6 14:33:38 UTC 2024  ] IRFinder BAM analysis completed
---
---
[  Fri Sep 6 14:33:39 UTC 2024  ] Sorting the bam file
---
[  Fri Sep 6 14:34:19 UTC 2024  ] Indexing the sorted bam file
---
[  Fri Sep 6 14:34:26 UTC 2024  ] IRFinder Long completed.
---
---
[  Fri Sep 6 15:13:14 UTC 2024  ] Minimap2 mapping completed
---
CloXD commented 1 month ago

Dear @Perugolate, sorry for the late answer. I saw that you flagged the previous issue, where Minimap2 was not being detected, as solved. May I ask how you solved the problem there?

As for the skipped reads, the reason is given by the flags. You can find an explanation of each flag in the BAM documentation or in this web tool provided by Picard: https://broadinstitute.github.io/picard/explain-flags.html
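For reference, the flag values listed in the log can also be decoded with a few lines of Python. This is a minimal sketch based on the bit definitions in the SAM specification; the `SAM_FLAG_BITS` table and the `decode_flag` helper are illustrative names and are not part of IRFinder.

```python
# Bit values follow the SAM specification (FLAG field).
# SAM_FLAG_BITS and decode_flag are illustrative names, not IRFinder code.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM/BAM flag value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The five flag values reported under "Skipped reads" in the log.
for flag in (4, 256, 272, 2048, 2064):
    print(flag, decode_flag(flag))
```

Flags 256 and 272 both carry the secondary bit (272 is the same on the reverse strand), and 2048 and 2064 carry the supplementary bit.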

Most of the skipped reads are secondary alignments (flags 256 and 272). They are skipped to avoid treating suboptimal mappings as biological signal: only the best mapping of each read is considered. In the BAM file, multi-mapping reads are duplicated (one entry per alignment, not per read), so 11348918 is the number of alignment records rather than the number of biological reads; the latter is the number of reads that were not skipped plus the unmapped ones. Let me know if this clarifies your issue.

Best regards, Claudio
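The counts from the log can be broken down along these lines. The sketch below is only arithmetic on the numbers posted above. It assumes that "Total reads processed" counts BAM records and that every read has exactly one primary record (mapped or unmapped), as in the SAM specification; the variable names are illustrative.

```python
# Counts copied from the log in this issue.
total_records = 11348918
skipped_by_flag = {4: 87539, 256: 5882548, 272: 5317998, 2048: 13582, 2064: 15858}

# Flag bits from the SAM specification.
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800

# Records that are additional alignments of a read already present in the file.
extra_alignments = sum(
    n for flag, n in skipped_by_flag.items() if flag & (SECONDARY | SUPPLEMENTARY)
)
unmapped = sum(n for flag, n in skipped_by_flag.items() if flag & UNMAPPED)

# Records that were not skipped, i.e. primary alignments of mapped reads.
used = total_records - sum(skipped_by_flag.values())

# Assumption: one primary record per read, so reads = records - extra alignments.
biological_reads = total_records - extra_alignments

print("extra alignments:", extra_alignments)   # 11229986
print("unmapped reads:  ", unmapped)           # 87539
print("used (primary):  ", used)               # 31393
print("biological reads:", biological_reads)   # 118932
```

Under these assumptions, the 11317525 skipped records are almost entirely extra alignments of reads whose primary alignment was still counted.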

Perugolate commented 1 month ago

Thanks very much, I should have realised this myself.