deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

cnt_file total number of reads #131

qoiopipq closed this issue 4 years ago.

qoiopipq commented 4 years ago

I ran rsem-calculate-expression with the --star and --paired-end options. The cnt_file output is:

886565 19304581 0 20191146
19200087 104494 12410933
58758347 3

So the total number of reads is 20,191,146. However, the FASTQ files I passed to rsem-calculate-expression contain 76,100,339 reads, so the cnt_file total is far lower than the number of input reads. Shouldn't they be the same, or is this due to STAR's soft clipping?

Cheers!
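For reference, the three numeric lines at the top of a .cnt file can be read and sanity-checked with a short script. This sketch assumes the layout described in RSEM's cnt_file_description.txt as I understand it: "N0 N1 N2 N_tot" (unalignable, alignable, filtered for too many alignments, total), then "nUnique nMulti nUncertain", then "nHits read_type". The names CntSummary, parse_cnt_header and READ_TYPES are hypothetical helpers, not part of RSEM. With the numbers from this issue, 886565 + 19304581 + 0 = 20191146 and 19200087 + 104494 = 19304581, so the file is internally consistent, and read_type 3 marks paired-end input with quality scores.

```python
from dataclasses import dataclass

# Meaning of the read_type code on the third line (assumed from RSEM's cnt file description).
READ_TYPES = {
    0: "single-end, no quality scores",
    1: "single-end, with quality scores",
    2: "paired-end, no quality scores",
    3: "paired-end, with quality scores",
}


@dataclass
class CntSummary:
    n0: int           # unalignable reads
    n1: int           # alignable reads
    n2: int           # reads filtered because they had too many alignments
    n_tot: int        # N0 + N1 + N2
    n_unique: int     # reads aligned uniquely to one gene
    n_multi: int      # reads aligned to multiple genes
    n_uncertain: int  # reads aligned to multiple locations (includes isoform-level multi-mappers)
    n_hits: int       # total number of alignments
    read_type: int    # key into READ_TYPES


def parse_cnt_header(text: str) -> CntSummary:
    """Parse the first three data lines of a .cnt file and check that the sums agree."""
    rows = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments, split on whitespace
        if fields:
            rows.append([int(x) for x in fields])
        if len(rows) == 3:
            break  # later lines hold the alignment-count histogram, not needed here
    if len(rows) < 3:
        raise ValueError("expected at least three non-comment lines")
    n0, n1, n2, n_tot = rows[0][:4]
    n_unique, n_multi, n_uncertain = rows[1][:3]
    n_hits, read_type = rows[2][:2]
    if n0 + n1 + n2 != n_tot:
        raise ValueError("N0 + N1 + N2 does not equal N_tot")
    if n_unique + n_multi != n1:
        raise ValueError("nUnique + nMulti does not equal N1")
    return CntSummary(n0, n1, n2, n_tot, n_unique, n_multi, n_uncertain, n_hits, read_type)


# The numbers reported in this issue, split into the assumed three-line layout.
cnt_text = """886565 19304581 0 20191146
19200087 104494 12410933
58758347 3
"""
summary = parse_cnt_header(cnt_text)
print(summary.n_tot, READ_TYPES[summary.read_type])  # 20191146 paired-end, with quality scores
```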

Rohit-Satyam commented 4 years ago

Hi @qoiopipq

I faced the same problem, and I didn't find the .cnt file very informative either. You can try the following options:

rsem-calculate-expression --output-genome-bam --strandedness reverse -p 32 --calc-pme --calc-ci --keep-intermediate-files --append-names --sort-bam-by-coordinate --paired-end --star --star-path

Use --keep-intermediate-files and check Log.final.out for the statistics. That file is generated by STAR itself, so you should get accurate stats from there.
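A minimal sketch for pulling numbers out of STAR's Log.final.out, assuming its usual "label | value" layout with labels such as "Number of input reads", "Uniquely mapped reads number" and "Number of reads mapped to multiple loci". Exact labels may differ between STAR versions, and parse_star_final_log and mapped_counts are hypothetical helper names.

```python
def parse_star_final_log(text: str) -> dict:
    """Collect 'label | value' pairs from STAR's Log.final.out into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # blank lines and section headers such as "UNIQUE READS:"
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def mapped_counts(stats: dict) -> tuple:
    """Return (input reads, uniquely mapped, multi-mapped) as integers.

    The labels below are the ones expected in Log.final.out (an assumption);
    adjust them if your STAR version words them differently.
    """
    total = int(stats["Number of input reads"])
    unique = int(stats["Uniquely mapped reads number"])
    multi = int(stats["Number of reads mapped to multiple loci"])
    return total, unique, multi
```

Usage would be along the lines of mapped_counts(parse_star_final_log(open(log_path).read())), with log_path pointing at the Log.final.out kept among the intermediate files. Values are kept as strings during parsing because some entries are percentages or timestamps rather than counts.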

qoiopipq commented 4 years ago

Hi @Rohit-Satyam

It worked. Thanks!

bli25 commented 4 years ago

@qoiopipq, the reason is that, for the transcriptome BAM, STAR only outputs reads that aligned to the genome, so the total number of reads is much smaller. If you use RSEM v1.3.3 instead, it will generate a sample_name.log file that records the alignment statistics you need.
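To compare like with like, one can count the records in the FASTQ input and set that against N_tot from the .cnt file. This sketch assumes standard four-line FASTQ records, and it also assumes that in paired-end mode RSEM counts each mate pair once, so the record count of a single mate file (not the sum of both files) is the matching figure. Under those assumptions, and per the explanation above, the difference would largely correspond to reads that never reached STAR's transcriptome BAM. Both function names are hypothetical.

```python
import gzip


def count_fastq_records(path: str) -> int:
    """Count records in a FASTQ file (plain text or gzip), assuming 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def reads_not_counted(fastq_records: int, n_tot: int) -> int:
    """Hypothetical helper: records in one mate file that are missing from RSEM's N_tot."""
    if n_tot > fastq_records:
        raise ValueError("N_tot exceeds the FASTQ record count; are the units the same?")
    return fastq_records - n_tot
```

For example, reads_not_counted(count_fastq_records(mate1_path), 20191146), where mate1_path is the first-mate FASTQ passed to rsem-calculate-expression. The explicit unit check is there because comparing pairs against individual reads is an easy way to get a misleading gap.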