Status: Closed (cb4github closed this issue 4 years ago)
Hi, it's not quite clear how this issue is related to RNA-SeQC. If you're asking about differences between RNA-SeQC gene-level expression and RSEM transcript-level estimates in GTEx, please contact the GTEx Portal.
In a separate email, it was explained that the RNA-SeQC TPM estimates are based on a simple normalization by gene length, whereas RSEM attempts to correct for additional biases in read coverage. These two differences in approach can result in relatively large discrepancies for some genes. Many thanks!
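To make the "simple normalization by gene length" concrete, here is a minimal sketch of the standard TPM formula. It is not taken from the RNA-SeQC or RSEM code; the function name `tpm` and all counts and lengths are made-up toy values. It only illustrates that the choice of length used per feature changes every TPM value, which is one way two pipelines can disagree on the same reads.

```python
def tpm(counts, lengths):
    """Standard TPM: per-feature rate = count / length, rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Toy read counts for three hypothetical features (not GTEx data).
counts = [100, 200, 300]

# Same counts, two different length definitions for the third feature.
tpm_a = tpm(counts, [1000, 2000, 1500])
tpm_b = tpm(counts, [1000, 2000, 3000])

print(tpm_a)
print(tpm_b)
```

Because TPM values always sum to one million, changing the length assigned to one feature shifts the values of all the others as well.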
Dear Folks,
I hope all is well, and thanks for all your efforts.
In the transcript expression file, GTEx_Analysis_2017-06-05_v8_RSEMv1.3.0_transcript_tpm.gct.gz, there is only one transcript, namely ENST00000367976.3, for ENSEMBL gene ENSG00000118523.5 (a.k.a. CTGF), and when I extract the TPM values for said transcript and tissue type 'Artery - Aorta', the resulting median TPM is 935.8 for n=432 (non-zero values).
Correspondingly, in the gene expression file, GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz, when I extract the TPM values for said gene and tissue type 'Artery - Aorta', the resulting median TPM is 2043 for n=432 (non-zero values).
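For reference, the extraction described above can be sketched roughly as follows. This is an assumption about the procedure, not the exact script used: the helper name `median_nonzero_tpm` is hypothetical, the GCT content is a tiny made-up example, and for the real files the text would come from the gzipped .gct file with the 'Artery - Aorta' sample IDs taken from the sample annotation table (not shown).

```python
import csv
import io
import statistics


def median_nonzero_tpm(gct_text, feature_id, sample_ids):
    """Return (median, n) over the non-zero TPM values of one row of a GCT table."""
    lines = io.StringIO(gct_text)
    next(lines)  # skip the version line, e.g. "#1.2"
    next(lines)  # skip the dimensions line
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    wanted = set(sample_ids)
    cols = [i for i, name in enumerate(header) if name in wanted]
    for row in reader:
        if row[0] == feature_id:
            nonzero = [float(row[i]) for i in cols if float(row[i]) > 0]
            return statistics.median(nonzero), len(nonzero)
    raise KeyError(feature_id)


# Made-up GCT content: two ID columns followed by three sample columns.
toy_gct = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "GENE_A\tA\t10.0\t0.0\t30.0\n"
    "GENE_B\tB\t1.0\t2.0\t3.0\n"
)

median_a, n_a = median_nonzero_tpm(toy_gct, "GENE_A", ["S1", "S2", "S3"])
median_b, n_b = median_nonzero_tpm(toy_gct, "GENE_B", ["S1", "S3"])
print(median_a, n_a)
print(median_b, n_b)
```

Running the same function on the gene row and on the transcript row with an identical sample list would rule out a difference in sample selection as the source of the gap.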
Also, please see the attached - and apparently quite similar - violin plots (grouped by donor's age bracket) for the gene and transcript TPM values, respectively.
I've looked at the code briefly, but I have yet to explain this difference (a factor of ~2.2) between the median TPM of 2043 for the gene CTGF and the median TPM of 935.8 for its single transcript, ENST00000367976.3.
Please advise, thanks.
Best, CB
Attachments: Rplot.CTGF.ArteryAorta.22_10_20.pdf, Rplot.ENST00000367976.ArteryAorta.12_10_20.pdf