KaiAragaki / tidyestimate

ESTIMATE tumor infiltration, the tidy way
GNU General Public License v2.0

Expression data input units #10

Closed dBenedek closed 2 years ago

dBenedek commented 2 years ago

Hello,

What is the ideal input expression unit? TPM or some other normalized counts?

Best, Benedek Danko

KaiAragaki commented 2 years ago

Good question.

The method that the ESTIMATE team implemented is very similar to ssGSEA.

You can find a brief description of how this ssGSEA implementation works in the details of ?estimate_score:

Enrichment scores for each sample are calculated using an implementation of single sample Gene Set Enrichment Analysis (ssGSEA). Briefly, expression is ranked on a per-sample basis, and the density and distribution of gene signature 'hits' is determined. An enrichment of hits at the top of the expression ranking confers a positive score, while an enrichment of hits at the bottom of the expression ranking confers a negative score.
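To make the description above concrete, here is a minimal Python sketch of a rank-based running-sum score. It is a simplified, unweighted illustration of the idea, not the code used by tidyestimate or ESTIMATE. The function name `toy_enrichment_score` and the gene names are hypothetical.

```python
def toy_enrichment_score(expression, signature):
    """Toy rank-based enrichment score for ONE sample.

    expression: dict mapping gene name -> expression value
    signature:  set of gene names (the 'hits')
    Simplified and unweighted; NOT the tidyestimate implementation.
    """
    # Rank genes within the sample, highest expression first.
    ranked = sorted(expression, key=expression.get, reverse=True)
    hits = [gene in signature for gene in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one non-hit gene")

    # Walk down the ranking: step up on a hit, step down on a miss,
    # and average the running sum over all positions.
    running = 0.0
    total = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        total += running
    return total / len(ranked)


# Hypothetical sample with six genes, already in descending order.
sample = {"g1": 6.0, "g2": 5.0, "g3": 4.0, "g4": 3.0, "g5": 2.0, "g6": 1.0}
top_score = toy_enrichment_score(sample, {"g1", "g2"})     # hits at the top
bottom_score = toy_enrichment_score(sample, {"g5", "g6"})  # hits at the bottom
print(top_score, bottom_score)
```

Only the order of the genes enters the calculation, so a signature whose genes sit at the top of the ranking gets a positive score and one whose genes sit at the bottom gets a negative score.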

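So it seems to me that any kind of 'scaling' normalization, like library size correction or log-transformation, won't have an effect on the score, because it leaves the ranking of genes within each sample unchanged. However, TPM likely will, since correcting for gene length can reorder genes within a sample (and it may be wise to normalize as such), as noted here: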

ssGSEA is performed based on a ranked gene list (within sample comparisons) therefore it makes sense to use something that accounts for gene length bias in this case such as TPM or FPKM instead of normalized counts.
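As a rough illustration of that point, the Python sketch below compares within-sample gene orderings under different normalizations. The counts, gene lengths, and gene names are made up for the example, and `rank_order` is a hypothetical helper, not part of any package.

```python
import math


def rank_order(values):
    """Return gene names sorted from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical raw counts and gene lengths (in kilobases) for one sample.
counts = {"GENE_A": 900, "GENE_B": 500, "GENE_C": 300}
lengths_kb = {"GENE_A": 9.0, "GENE_B": 1.0, "GENE_C": 2.0}

# Library size correction: counts per million (one constant per sample).
library_size = sum(counts.values())
cpm = {g: c / library_size * 1e6 for g, c in counts.items()}

# Log-transformation: monotone, so it cannot reorder genes.
log_cpm = {g: math.log2(v + 1) for g, v in cpm.items()}

# TPM: divide by gene length first, then rescale to one million.
rate = {g: counts[g] / lengths_kb[g] for g in counts}
rate_total = sum(rate.values())
tpm = {g: r / rate_total * 1e6 for g, r in rate.items()}

print(rank_order(counts))   # ordering from raw counts
print(rank_order(cpm))      # same ordering as raw counts
print(rank_order(log_cpm))  # same ordering as raw counts
print(rank_order(tpm))      # ordering changes after length correction
```

In this toy sample the long gene ranks first by raw counts but last by TPM, which is the kind of reordering that can shift a rank-based score.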

Hope this helps!