alexdobin / STAR

RNA-seq aligner
MIT License

Difference between samtools flagstat results and the Log file #1447

Open zhangqc723 opened 2 years ago

zhangqc723 commented 2 years ago

I generated a BAM file with STAR 2.7.9a from paired-end FASTQ files and used samtools flagstat to get statistics on it (output below). Why does the flagstat total (8928386) not equal the number of input reads in the Log file (5249979, output below)?


8928386 + 0 in total (QC-passed reads + QC-failed reads)
8480267 + 0 primary
448119 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
8928386 + 0 mapped (100.00% : N/A)
8480267 + 0 primary mapped (100.00% : N/A)
8480267 + 0 paired in sequencing
4245534 + 0 read1
4234733 + 0 read2
8328914 + 0 properly paired (98.22% : N/A)
8328914 + 0 with itself and mate mapped
151353 + 0 singletons (1.78% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)


                            Started job on |    Dec 13 16:31:46
                         Started mapping on |   Dec 13 16:32:46
                                Finished on |   Dec 13 16:38:50
   Mapping speed, Million of reads per hour |   51.92

                      Number of input reads |   5249979
                  Average input read length |   292
                                UNIQUE READS:
               Uniquely mapped reads number |   4171608
                    Uniquely mapped reads % |   79.46%
                      Average mapped length |   283.67
                   Number of splices: Total |   4373643
        Number of splices: Annotated (sjdb) |   4219256
                   Number of splices: GT/AG |   4311681
                   Number of splices: GC/AG |   45758
                   Number of splices: AT/AC |   6182
           Number of splices: Non-canonical |   10022
                  Mismatch rate per base, % |   0.25%
                     Deletion rate per base |   0.01%
                    Deletion average length |   3.31
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.36
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   144202
         % of reads mapped to multiple loci |   2.75%
    Number of reads mapped to too many loci |   11348
         % of reads mapped to too many loci |   0.22%
                              UNMAPPED READS:

Number of reads unmapped: too many mismatches | 0 % of reads unmapped: too many mismatches | 0.00% Number of reads unmapped: too short | 919132 % of reads unmapped: too short | 17.51% Number of reads unmapped: other | 3689 % of reads unmapped: other | 0.07% CHIMERIC READS: Number of chimeric reads | 0 % of chimeric reads | 0.00%


alexdobin commented 2 years ago

Hi @zhangqc723

samtools counts the number of alignments in the BAM file. STAR's "Number of input reads" is the number of reads in the FASTQ files. Two mates (read1,read2) are counted as one read by STAR.

Cheers Alex
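
To make the distinction concrete, here is a small Python sketch that plugs in the flagstat numbers posted above. It assumes only the standard flagstat meanings: every BAM record is primary, secondary or supplementary, and the read1/read2 counts cover primary records only. The variable names are illustrative.

```python
# QC-passed counts copied from the samtools flagstat output in this issue.
flagstat = {
    "in total": 8928386,
    "primary": 8480267,
    "secondary": 448119,
    "supplementary": 0,
    "read1": 4245534,
    "read2": 4234733,
}

# STAR's "Number of input reads": one count per read pair (read1 + read2).
star_input_pairs = 5249979

# Every BAM record is exactly one of primary, secondary or supplementary,
# so "in total" counts alignment records, not reads.
record_sum = flagstat["primary"] + flagstat["secondary"] + flagstat["supplementary"]
assert flagstat["in total"] == record_sum

# Each primary record is one aligned mate, flagged as either read1 or read2.
assert flagstat["primary"] == flagstat["read1"] + flagstat["read2"]

# To compare like with like, convert STAR's pair count into individual mates.
input_mates = 2 * star_input_pairs
print(f"BAM records (flagstat total): {flagstat['in total']}")
print(f"primary records (one per aligned mate): {flagstat['primary']}")
print(f"input mates (2 x STAR input reads): {input_mates}")
```

The 448119 secondary records are extra alignments of multi-mapping reads, which is why the flagstat total is larger than the primary count.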

zhangqc723 commented 2 years ago

Hi Alex, thanks for your reply. But "Number of input reads" (5249979) multiplied by 2 is more than the "8928386 + 0 in total" reported by samtools. Does this mean that STAR does not output all input reads? If so, which types of reads are filtered out with STAR's default parameters?

alexdobin commented 2 years ago

Hi @zhangqc723

STAR reads all reads from the FASTQ files, but the SAM output (with default parameters) contains only mapped reads, so the SAM file has fewer reads.

Cheers Alex
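
The numbers in this thread can be reconciled with a short Python sketch. It uses only the counts posted above, plus one assumption that is an inference rather than something stated in the thread: each flagstat "singleton" is a pair where STAR aligned only one mate, and the unaligned mate is not written because unmapped reads are omitted by default. The arithmetic below is consistent with that reading.

```python
# Counts copied from STAR's Log.final.out in this issue; each one is a read pair.
star_log = {
    "input": 5249979,
    "uniquely_mapped": 4171608,
    "multi_mapped": 144202,
    "too_many_loci": 11348,
    "unmapped_too_many_mismatches": 0,
    "unmapped_too_short": 919132,
    "unmapped_other": 3689,
}
# From samtools flagstat on the same BAM: primary records, and primary
# records whose mate is unmapped ("singletons").
primary_records = 8480267
singletons = 151353

# Pairs with reported alignments (uniquely mapped + mapped to multiple loci).
mapped_pairs = star_log["uniquely_mapped"] + star_log["multi_mapped"]

# STAR treats reads mapped to too many loci as unmapped; with default
# parameters none of these pairs are written to the SAM/BAM output.
unmapped_pairs = (
    star_log["too_many_loci"]
    + star_log["unmapped_too_many_mismatches"]
    + star_log["unmapped_too_short"]
    + star_log["unmapped_other"]
)

# The Log categories account for every input pair.
assert mapped_pairs + unmapped_pairs == star_log["input"]

# Assumption: a pair contributes 2 primary records when both mates align,
# and 1 (a flagstat singleton) when only one mate aligns.
expected_primary = 2 * mapped_pairs - singletons
assert expected_primary == primary_records

print(f"pairs in output: {mapped_pairs}, pairs not in output: {unmapped_pairs}")
print(f"expected primary records: {expected_primary}")
```

In this run the missing reads are mostly the 919132 pairs reported as "unmapped: too short". If unmapped reads are needed in the BAM, STAR's `--outSAMunmapped Within` option writes them to the main output, and flagstat would then count them too.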

zhangqc723 commented 2 years ago

I understand. Thank you for your help and patience.