deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

My calculations of the TPMs are different from RSEM outputs #55

Open Jerry001 opened 7 years ago

Jerry001 commented 7 years ago

Hello,

I used the gene-level effective_length and expected_count from the RSEM output to compute TPM, but my results differ from the TPM values that RSEM reports. Is there a way to resolve the discrepancy? Thank you.

freekvh commented 4 years ago

Hi,

I'm on RSEM 1.3.1 and STAR 2.6.1.b.

Sorry for reviving this old topic, but I am seeing the same issue. I need more decimal places for the TPM values, so I recompute TPM myself: dividing expected_count by effective_length gives reads_per_base. For each gene I then divide this value by the sum of all reads_per_base values and multiply by 1e6 to get TPM.
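To make the recomputation above concrete, here is a minimal Python sketch of it. The function name `recompute_tpm` and the example numbers are made up for illustration and are not RSEM code or real output. Treating an effective_length of 0 as a rate of 0 is my own assumption to avoid dividing by zero; RSEM may handle that case differently.

```python
def recompute_tpm(expected_counts, effective_lengths):
    """Recompute TPM from parallel lists of expected_count and effective_length."""
    # reads_per_base = expected_count / effective_length.
    # Assumption: entries with effective_length <= 0 get a rate of 0,
    # because the division is undefined there.
    rates = [
        count / length if length > 0 else 0.0
        for count, length in zip(expected_counts, effective_lengths)
    ]
    total = sum(rates)
    if total == 0:
        # Nothing is expressed; return zeros instead of dividing by zero.
        return [0.0] * len(rates)
    # Normalise each rate by the sum of all rates and scale to one million.
    return [rate / total * 1e6 for rate in rates]


# Made-up illustrative values, not taken from any real RSEM output.
counts = [100.0, 50.0, 7.43]
eff_lengths = [1000.0, 250.0, 0.0]
tpm = recompute_tpm(counts, eff_lengths)
```

Note that the third entry ends up with a TPM of 0 under this assumption, even though its expected_count is non-zero.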

For most genes I am within rounding error of the RSEM-computed TPM (or at least very close), but some genes deviate quite a lot (up to a factor of 28). From some quick checking, this happens to genes whose dominant transcript has an effective_length of 0. The most affected genes (from the Ensembl GTF, 38.95) are ENSG00000228430 and ENSG00000275084. I think these are genes with particularly short transcripts, always close to typical fragment lengths. For example, ENSG00000228430 has transcript ENST00000636401, which has a length of 95 and, in my case, an expected_count of 7.43 and an effective_length of 0.
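A rough sketch of how such deviating genes could be listed, assuming you already have the reported and the recomputed TPM per gene. The helper names, the `max_fold` threshold of 1.5, and the gene IDs and values below are all hypothetical, chosen only to illustrate the comparison.

```python
def fold_deviation(reported, recomputed):
    """Return the symmetric fold difference, or None if either value is not positive."""
    if reported <= 0 or recomputed <= 0:
        return None
    ratio = recomputed / reported
    return max(ratio, 1.0 / ratio)


def flag_deviating(rows, max_fold=1.5):
    """Return (gene_id, fold) pairs whose TPM values disagree by more than max_fold.

    rows is an iterable of (gene_id, reported_tpm, recomputed_tpm) tuples.
    max_fold is an arbitrary illustrative threshold.
    """
    flagged = []
    for gene_id, reported, recomputed in rows:
        fold = fold_deviation(reported, recomputed)
        if fold is None:
            # No ratio can be formed; flag only if the two values differ.
            if reported != recomputed:
                flagged.append((gene_id, None))
        elif fold > max_fold:
            flagged.append((gene_id, fold))
    return flagged


# Hypothetical gene IDs and values, for illustration only.
rows = [
    ("geneA", 10.0, 10.004),  # within rounding error
    ("geneB", 2.0, 56.0),     # differs by a large factor
    ("geneC", 0.0, 0.0),      # both zero, so nothing to flag
]
flagged = flag_deviating(rows)
```

The flagged gene IDs can then be checked against the isoform-level results to see whether their dominant transcript has an effective_length of 0.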