frederick-de-baene opened this issue 2 years ago.
Hi Frederick,
When the chimeric output option is used, some chimeric reads have no non-chimeric alignment. Such reads are counted neither as mapped nor as unmapped in the Log.final.out file.
So, if I understand correctly, this data set has 557990 chimeric read pairs. They fall into three subsets: a first subset in which both segments are mapped, one non-chimerically (let's call it seg_1) and the other chimerically (let's call it seg_2); a second subset in which only seg_1 is mapped (non-chimerically); and a third subset in which only seg_2 is mapped (chimerically, as a supplementary alignment).
In post #282, you mentioned the following: << Chimeric alignments are counted separately - some of them are reported as mapped (if non-chimeric alignment passes the mapped filter), and some as unmapped (otherwise). >>
Putting this together, it seems that chimeric read pairs, which are counted separately and whose total is given in the log file (557990), are reported either as mapped (both seg_1 and seg_2 are mapped), as unmapped (seg_1 is mapped but seg_2 is not), or as neither (seg_1 is not mapped, whether or not seg_2 is).
So, in the example above, there are 155186 chimeric read pairs that are reported neither as mapped nor as unmapped because their seg_1 is not mapped.
As a follow-up question, why are these 155186 read pairs not just reported as unmapped?
This is correct. The reads that do not map normally, but map chimerically, are not reported as unmapped.
Okay, thank you for the clarification. What is the reason for reporting them neither as mapped nor as unmapped?
The reads that map only chimerically are not considered unmapped because --chimOutType WithinBAM forces their output to the BAM file.
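The accounting rule described above can be summarised as a small decision function. This is a toy model of the rule as stated by the maintainer (post #282 and the replies above), not STAR's actual implementation; `ChimericPair`, `has_nonchimeric`, `passes_mapped_filter` and `log_category` are hypothetical names introduced only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChimericPair:
    # Hypothetical fields modelling one chimeric read pair.
    has_nonchimeric: bool               # a non-chimeric alignment (seg_1) exists
    passes_mapped_filter: bool = False  # that alignment passes the "mapped" filters


def log_category(pair: ChimericPair) -> str:
    """Toy model of how a chimeric pair is counted in Log.final.out, per this thread."""
    if not pair.has_nonchimeric:
        # Mapped only chimerically: written to the BAM (--chimOutType WithinBAM)
        # but counted neither as mapped nor as unmapped.
        return "neither"
    # Otherwise the non-chimeric alignment decides the category.
    return "mapped" if pair.passes_mapped_filter else "unmapped"


pairs = [ChimericPair(True, True), ChimericPair(True, False), ChimericPair(False)]
counts = Counter(log_category(p) for p in pairs)
# Only "mapped" and "unmapped" enter the Log.final.out categories,
# so the "neither" pairs are what is missing from the sum.
print(counts["neither"])  # 1
```

In this model the "neither" bucket is exactly the set of pairs that make the Log.final.out categories add up to less than the number of input reads.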
Hi
We used STAR (version 2.7.9a) to map paired-end reads and obtained the following Log.final.out file:
Looking into it in more detail, the read counts do not seem to add up. According to post #282, the number of input reads is calculated as follows:
total input reads = uniquely mapped reads + mapped to multiple loci + mapped to too many loci + too many mismatches + too short + other
However, applying this formula to the above log file gives:
uniquely mapped reads + mapped to multiple loci + mapped to too many loci + too many mismatches + too short + other = 41934056 + 5381462 + 641111 + 0 + 1621558 + 545954 = 50124141
This is 155186 fewer than the number of input reads reported in the log file (50279327).
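The same check can be run on any Log.final.out with a short script. This is a sketch, assuming the file uses STAR's usual `label |<TAB>value` layout and the field labels listed below (check them against your own log); `parse_star_log`, `unaccounted_reads` and `SAMPLE` are hypothetical names, and `SAMPLE` is only an excerpt rebuilt from the numbers quoted in this post, not the full log.

```python
from __future__ import annotations

# Field labels assumed to match STAR 2.7.x Log.final.out; verify against your file.
INPUT_LABEL = "Number of input reads"
COUNTED_LABELS = [
    "Uniquely mapped reads number",
    "Number of reads mapped to multiple loci",
    "Number of reads mapped to too many loci",
    "Number of reads unmapped: too many mismatches",
    "Number of reads unmapped: too short",
    "Number of reads unmapped: other",
]


def parse_star_log(text: str) -> dict[str, str]:
    """Split 'label |<TAB>value' lines into a {label: value} dict."""
    fields = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" have no separator
        label, value = line.split("|", 1)
        fields[label.strip()] = value.strip()
    return fields


def unaccounted_reads(fields: dict[str, str]) -> int:
    """Input reads minus the sum of the categories in the post #282 formula."""
    total = int(fields[INPUT_LABEL])
    counted = sum(int(fields[label]) for label in COUNTED_LABELS)
    return total - counted


# Excerpt rebuilt from the numbers quoted in this post (not the full log file).
SAMPLE = """\
                          Number of input reads |\t50279327
                   Uniquely mapped reads number |\t41934056
        Number of reads mapped to multiple loci |\t5381462
        Number of reads mapped to too many loci |\t641111
  Number of reads unmapped: too many mismatches |\t0
            Number of reads unmapped: too short |\t1621558
                Number of reads unmapped: other |\t545954
"""

print(unaccounted_reads(parse_star_log(SAMPLE)))  # 155186
```

The remainder of 155186 is the same figure as the chimeric read pairs discussed earlier in the thread that map only chimerically and are therefore counted neither as mapped nor as unmapped.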
The command used to obtain this log file:
What could explain this discrepancy?
Thank you for having a look.
BR