deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

rsem-plot-model result issue #35

Closed user611 closed 7 years ago

user611 commented 7 years ago

Hi,

I used rsem-plot-model to produce some plots after mapping my RNA-Seq data. My library has 62,438,506 paired-end reads in total. The "Alignment statistics" pie chart shows ~36% unique and 35% multiple. However, the numbers from STAR (I kept the temp files generated during RSEM mapping) show 64% unique and 16% multiple, which is very different. On the other hand, in the sample.cnt file I found the following numbers: 12591831, 31048910, 0, 43640741, which I assumed correspond to multiple, unique, filtered, and total. If rsem-plot-model used these numbers for plotting, I don't understand how the percentages were calculated. If it used numbers from other files, which files did it use? Thanks!

bli25wisc commented 7 years ago

Hi @user611, the 4 numbers in the first line of the cnt file are the number of unalignable reads, the number of alignable reads, the number of filtered reads, and the total number of reads. You can find the cnt file description in cnt_file_description.txt.
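To make the meaning of those four numbers concrete, here is a minimal Python sketch that reads them in the order given above and turns them into fractions of the total. The function names and the assumption that the values are whitespace-separated are illustrative and are not part of RSEM itself; the numbers are the ones quoted in the question.

```python
def parse_cnt_first_line(line):
    """Parse the first line of a .cnt file, assuming four
    whitespace-separated integers in the order described above:
    unalignable, alignable, filtered, total."""
    n0, n1, n2, n_tot = (int(x) for x in line.split()[:4])
    return {"unalignable": n0, "alignable": n1, "filtered": n2, "total": n_tot}


def read_fractions(counts):
    """Express each category as a fraction of the total read count."""
    total = counts["total"]
    return {k: v / total for k, v in counts.items() if k != "total"}


# Values quoted in the question (commas removed).
counts = parse_cnt_first_line("12591831 31048910 0 43640741")
fractions = read_fractions(counts)

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

With these values the three categories add up to the total, and the alignable fraction comes out at roughly 71%, which is close to the sum of the ~36% unique and 35% multiple slices reported for the pie chart.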

For the pie chart, RSEM reports isoform-level statistics, whereas STAR reports genome-level statistics. Since a read that aligns uniquely to the genome can still map to multiple isoforms, I am not surprised to see that RSEM and STAR reported different numbers.
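The genome-level versus isoform-level distinction can be illustrated with a toy Python sketch. Everything here is hypothetical: the isoform names, the exon coordinates, and the simple containment test are made up for illustration and do not reflect how RSEM or STAR actually count alignments.

```python
# Toy annotation: two hypothetical isoforms of one gene that share
# their first exon. Intervals are half-open genomic coordinates.
ISOFORM_EXONS = {
    "geneA_iso1": [(100, 200), (300, 400)],
    "geneA_iso2": [(100, 200), (500, 600)],
}


def isoforms_containing(start, end, isoform_exons=ISOFORM_EXONS):
    """Return the isoforms that have an exon fully containing [start, end)."""
    return sorted(
        name
        for name, exons in isoform_exons.items()
        if any(s <= start and end <= e for s, e in exons)
    )


# Each read below has exactly one genomic location, so it is "unique"
# at the genome level.
shared_exon_read = isoforms_containing(120, 170)    # falls in the shared exon
specific_exon_read = isoforms_containing(320, 370)  # falls in an iso1-only exon

print(shared_exon_read)
print(specific_exon_read)
```

The read in the shared exon has a single genomic position but is compatible with both isoforms, so an isoform-level tally would count it as multi-mapping, while the read in the isoform-specific exon stays unique under both views.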

I'll make this point clear in future RSEM releases.

Thanks, Bo