Can we use normalised counts as input?

Hello,

Thank you for your question. I'm not sure which "normalised" counts you are referring to, as there are multiple ways of normalizing them. But here is a general answer about which type of counts to use with EPIC:

EPIC (and the maths behind it) has been developed based on TPM normalization, so I would advise using TPM if possible (FPKM/RPKM would work as well). In particular, the part of EPIC that converts the predicted mRNA fractions into predicted cell fractions assumes TPM-normalized data. With this normalization, a value of 1 for a given gene is proportional to the number of copies of this mRNA in the sample, whereas with other normalizations it would also depend on the length of the given gene.
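To make the point about gene length concrete, here is a minimal sketch of the usual TPM computation from raw counts. It is only an illustration and not code from EPIC; the function name `counts_to_tpm` and the example numbers are made up, and gene lengths are assumed to be given in kilobases.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to TPM (hypothetical helper, not part of EPIC).

    counts: raw read counts, one per gene.
    lengths_kb: gene (or effective transcript) lengths in kilobases.
    """
    # Dividing by length removes the dependence on gene length.
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero, TPM is undefined")
    # Rescale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


# Made-up example: three genes whose counts differ only because of their length.
tpm = counts_to_tpm([100, 200, 300], [1.0, 2.0, 3.0])
print(tpm)
```

In this made-up example the three genes end up with the same TPM, because their raw counts differ only in proportion to their lengths.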

However, if it is sufficient to estimate the mRNA fractions (instead of cell fractions), then other normalizations should likely also work. But please note that if you want to use another normalization, you should ideally redefine the reference gene expression profiles, so that both the bulk samples and the reference profiles are based on the same normalization. If you don’t use the same normalization for bulk and reference, this would likely lead to biases in the estimated proportions.

Here’s a little example explaining the problem: let’s imagine that in the reference gene expression profile of B cells, geneA has a TPM of 1 and geneB also 1, but that, based on the same data, another normalization gives values of 1 for geneA and 10 for geneB. Now suppose you’d like to estimate the B cell fraction of a pure B cell sample, giving its values under this other normalization as input to EPIC while using the standard TPM values as reference. EPIC should in principle return that the sample is 100% B cells, but it wouldn’t really be able to know this. It might instead report only 50% B cells, because it couldn’t make the values for geneA and geneB both fit the reference profile well at the same time. Note that this “issue” is not specific to EPIC: any other deconvolution method based on gene expression reference profiles would have the same problem.
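The geneA/geneB example can be played through with a toy calculation. This is only a sketch under strong assumptions: it adds a hypothetical second cell type (a "T cell" profile with values 0 and 10, invented for illustration), solves the two-gene system exactly, and then rescales the result to sum to 1. EPIC itself uses a constrained optimization over many genes, so the numbers below are not what EPIC would return.

```python
def solve_two_cell_types(ref_a, ref_b, bulk):
    """Solve bulk = x * ref_a + y * ref_b for two genes (Cramer's rule).

    Toy stand-in for a deconvolution step, not EPIC's actual algorithm.
    """
    a1, a2 = ref_a
    b1, b2 = ref_b
    det = a1 * b2 - b1 * a2
    x = (bulk[0] * b2 - b1 * bulk[1]) / det
    y = (a1 * bulk[1] - a2 * bulk[0]) / det
    return x, y


def to_fractions(x, y):
    """Rescale two non-negative weights so that they sum to 1."""
    total = x + y
    return x / total, y / total


# Reference profiles in TPM for (geneA, geneB).
ref_b_cells = (1.0, 1.0)   # values taken from the example above
ref_t_cells = (0.0, 10.0)  # hypothetical second cell type, invented here

# Pure B cell sample, given in TPM: matches the reference normalization.
matched = to_fractions(*solve_two_cell_types(ref_b_cells, ref_t_cells, (1.0, 1.0)))

# Same pure B cell sample, given in the other normalization (geneB = 10).
mismatched = to_fractions(*solve_two_cell_types(ref_b_cells, ref_t_cells, (1.0, 10.0)))

print(matched)     # B cell fraction is 1.0
print(mismatched)  # B cell fraction drops to about 0.53
```

With matched normalizations the sample is recovered as pure B cells, while the mismatched input makes the same sample look like a mixture of roughly half B cells and half of the other cell type.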

So to summarize: EPIC should work to predict the proportions of mRNA based on another normalization (with no warranty, as I didn’t test it with all possible normalizations), but you’d better redefine the reference profiles. You could build them from the same data as those used for EPIC: the datasets are publicly available and referenced in our publication as well as in the R package help documents (e.g. ?EPIC::TRef ; ?EPIC::BRef ).

Best wishes,

Julien

GfellerLab / EPIC
