alexdobin / STAR

RNA-seq aligner
MIT License

Difference in unmapped reads between log files #1006

Open sarde279 opened 3 years ago

sarde279 commented 3 years ago

Dear Alex,

I am posting the information from the log files I got after running mapping with STAR. I am surprised by the difference in the number of unmapped reads reported in the ReadsPerGene.out.tab file and in the Log.final.out file.

Could you please tell me where the number of unmapped reads (N_unmapped) in the ReadsPerGene.out.tab file comes from? It doesn't match the number of unmapped reads in the Log.final.out file.

ReadsPerGene.out.tab

N_unmapped          11563250  11563250  11563250
N_multimapping      3381699   3381699   3381699
N_noFeature         14966     16645     15915
N_ambiguous         326       131       152
Soltu.DM.S001600.1  0         0         0
Soltu.DM.S001610.1  0         0         0
Soltu.DM.S000630.1  0         0         0
Soltu.DM.S000650.1  0         0         0

Log.final.out

                             Started job on |   Aug 07 13:55:05
                         Started mapping on |   Aug 07 13:55:16
                                Finished on |   Aug 07 13:59:00
   Mapping speed, Million of reads per hour |   187.97

                      Number of input reads |   11695886
                  Average input read length |   232
                                UNIQUE READS:
               Uniquely mapped reads number |   17643
                    Uniquely mapped reads % |   0.15%
                      Average mapped length |   100.83
                   Number of splices: Total |   773
        Number of splices: Annotated (sjdb) |   108
                   Number of splices: GT/AG |   136
                   Number of splices: GC/AG |   23
                   Number of splices: AT/AC |   380
           Number of splices: Non-canonical |   234
                  Mismatch rate per base, % |   1.96%
                     Deletion rate per base |   0.02%
                    Deletion average length |   1.45
                    Insertion rate per base |   0.04%
                   Insertion average length |   1.15
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   3381699
         % of reads mapped to multiple loci |   28.91%
    Number of reads mapped to too many loci |   8260843
         % of reads mapped to too many loci |   70.63%
                              UNMAPPED READS:
   Number of reads unmapped: too many mismatches |  0
   % of reads unmapped: too many mismatches |   0.00%
        Number of reads unmapped: too short |   0
             % of reads unmapped: too short |   0.00%
            Number of reads unmapped: other |   35701
                 % of reads unmapped: other |   0.31%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%
alexdobin commented 3 years ago

Hi Sandeep,

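The GeneCounts output is equivalent to that of htseq-count, so its categories have a slightly different meaning from those in Log.final.out. Reads with only one aligned mate are not counted towards genes and are added to N_unmapped. Reads that mapped to too many loci are also added to N_unmapped, while reads that mapped to multiple loci are reported separately as N_multimapping (3381699 in both files). Only uniquely mapped reads are counted towards the genes.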

Cheers Alex
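
For anyone comparing the two files, here is a minimal Python sketch of the arithmetic. It only uses the counts posted in this thread, copied by hand into dictionaries. The key names and the `compare_counts` helper are made up for illustration and are not part of STAR. The sketch does not try to explain the gap between the two files; it just makes the totals easy to see.

```python
# Counts copied by hand from the Log.final.out excerpt in this thread.
# The dictionary keys are illustrative names, not STAR identifiers.
LOG_FINAL = {
    "input_reads": 11695886,
    "unique": 17643,
    "multiple_loci": 3381699,
    "too_many_loci": 8260843,
    "unmapped_too_many_mismatches": 0,
    "unmapped_too_short": 0,
    "unmapped_other": 35701,
}

# Summary rows from ReadsPerGene.out.tab (same value in all three columns).
READS_PER_GENE = {"N_unmapped": 11563250, "N_multimapping": 3381699}


def compare_counts(log, rpg):
    """Return a few totals that help compare the two files (hypothetical helper)."""
    # Every Log.final.out category except the input read count itself.
    categories_total = sum(n for name, n in log.items() if name != "input_reads")
    # The three "unmapped" categories of Log.final.out.
    log_unmapped = sum(n for name, n in log.items() if name.startswith("unmapped_"))
    return {
        "log_categories_total": categories_total,
        "log_unmapped_total": log_unmapped,
        "log_unmapped_plus_too_many_loci": log_unmapped + log["too_many_loci"],
        "n_unmapped_minus_log_unmapped": rpg["N_unmapped"] - log_unmapped,
    }


summary = compare_counts(LOG_FINAL, READS_PER_GENE)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these numbers, the Log.final.out categories add up to the number of input reads, and N_multimapping equals the "mapped to multiple loci" count. So the two files differ only in what they put into the unmapped category.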