My calculations of the TPMs are different from RSEM outputs

Hi,

I'm on RSEM 1.3.1 and STAR 2.6.1.b.

Sorry for reopening this old topic, but I'm experiencing the same issue. I need more decimal places for the TPM values, so I recompute TPM myself: I divide expected_count by effective_length to get reads_per_base, then for each gene I divide that value by the sum of all reads_per_base values and multiply by 1e6.
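To make the recomputation concrete, here is a minimal Python sketch of the steps described above. The function name `recompute_tpm`, the input layout, and the example values are hypothetical and not part of RSEM. Treating entries with an effective_length of 0 as contributing 0 reads_per_base is an assumption made only to avoid a division by zero; it may not match what RSEM does internally.

```python
def recompute_tpm(rows):
    """Recompute TPM from (gene_id, expected_count, effective_length) tuples.

    Hypothetical helper: it mirrors the manual calculation in this post,
    not RSEM's own implementation.
    """
    reads_per_base = {}
    for gene_id, expected_count, effective_length in rows:
        if effective_length > 0:
            reads_per_base[gene_id] = expected_count / effective_length
        else:
            # Assumption: an effective_length of 0 would divide by zero,
            # so this sketch assigns 0 reads_per_base to that entry.
            reads_per_base[gene_id] = 0.0

    total = sum(reads_per_base.values())
    if total == 0:
        return {gene_id: 0.0 for gene_id in reads_per_base}

    # Normalise by the sum of reads_per_base and scale to one million.
    return {gene_id: rpb / total * 1e6 for gene_id, rpb in reads_per_base.items()}


# Made-up values for illustration only, not taken from a real RSEM run.
example = [
    ("geneA", 100.0, 1000.0),
    ("geneB", 50.0, 250.0),
    ("geneC", 7.43, 0.0),
]
tpm = recompute_tpm(example)
for gene_id, value in tpm.items():
    print(gene_id, f"{value:.6f}")
```

Because the values are kept as floats, they can be printed with as many decimal places as needed.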

For most genes I am within rounding error of the RSEM-computed TPM (or at least very close), but some genes deviate substantially (by up to a factor of 28). From some quick checking, this happens for genes whose dominant transcript has an effective_length of 0. The most affected genes are ENSG00000228430 and ENSG00000275084 (from the Ensembl GTF, 38.95). I think these are genes with particularly short transcripts, close to the typical fragment length. For example, ENSG00000228430 has transcript ENST00000636401, which has a length of 95 and, in my case, an expected_count of 7.43 and an effective_length of 0.
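A comparison like the one described above could be sketched as follows. The helper name `flag_deviations`, the `max_ratio` threshold, and the example numbers are all hypothetical; the inputs are assumed to be plain dictionaries mapping gene IDs to TPM values, one recomputed by hand and one as reported by RSEM.

```python
def flag_deviations(recomputed, reported, max_ratio=1.5):
    """Return genes whose recomputed and reported TPM differ by more than max_ratio.

    Hypothetical helper; max_ratio is an arbitrary threshold chosen for this sketch.
    """
    flagged = {}
    for gene_id, mine in recomputed.items():
        theirs = reported.get(gene_id)
        if theirs is None:
            continue  # gene missing from the reported values, nothing to compare
        if mine == 0 or theirs == 0:
            # A ratio is undefined when one side is 0; flag only if the other is not.
            if mine != theirs:
                flagged[gene_id] = float("inf")
            continue
        ratio = max(mine, theirs) / min(mine, theirs)
        if ratio > max_ratio:
            flagged[gene_id] = ratio
    return flagged


# Made-up values for illustration only.
recomputed = {"geneA": 10.004, "geneB": 28.0}
reported = {"geneA": 10.0, "geneB": 1.0}
flagged = flag_deviations(recomputed, reported)
print(flagged)
```

Using a fold-change ratio rather than an absolute difference keeps lowly expressed genes from being hidden by the rounding of the reported values.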
