aet21 / EpiSCORE

Epigenetic cell-type deconvolution from Single-Cell Omic Reference profiles

Microglia estimate from bulk brain tissue #10

Open neuromik opened 1 year ago

neuromik commented 1 year ago

I have been using the package to estimate cell-type proportions in bulk brain tissue, and I consistently get 0s for microglia across multiple datasets. I also ran the estimation on sorted microglia, and the predicted microglia fraction averaged only 50%, where you would expect 90% or more. Are you aware of any limitations in the algorithm that could cause it to consistently underestimate microglia proportions, or do you have any suggestions for troubleshooting? Thank you!

aet21 commented 1 year ago

Hi Tatiana,

thanks for your interest and feedback, which is exactly what we want.

When we assessed the BrainReference in the snmC-Seq data, we did not see a particular problem with the microglia component. The DNAm reference matrix contains a total of 15 markers for microglia, and their weights are not lower than those for the OPC or astrocyte components, so this does not seem to be the reason for the low microglia fractions you are observing. If you apply a threshold on the weight (w>0.4), you are left with 10 markers for microglia, but only 8 for astrocytes and only 4 for OPCs. These are not big numbers, which highlights some of the challenges. That the microglia fraction should be small makes sense, but I agree that fractions that are consistently zero across many datasets indicate a skew/bias, which means that you need to interpret these fractions in a relative sense, not an absolute one. Indeed, in our own analyses we often interpret the fractions on a relative scale, which is better than nothing. Bear in mind that these fractions are often used later as covariates, and multivariate statistics derived from the models do not depend on the scale of the covariates, although non-linear distortions due to fractions being defined on a (0,1) scale could slightly affect statistical inferences.
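To illustrate the point about covariate scale, here is a minimal sketch in plain Python (not EpiSCORE code). It assumes the bias in the estimated fraction is purely multiplicative (the estimate is a constant multiple of the true fraction) and uses simulated data; the variable names, the factor of 0.5, and the effect sizes are all invented for the illustration. Under that assumption, the t-statistics from an ordinary least-squares fit are unchanged when the fraction covariate is rescaled.

```python
import math
import random

def invert(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols_t_stats(X, y):
    """t-statistics of OLS coefficients; X is a list of rows including the intercept."""
    n, p = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance
    return [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(p)]

# Simulated data (hypothetical): a phenotype of interest and a small cell-type fraction.
random.seed(1)
n = 200
pheno = [random.gauss(0, 1) for _ in range(n)]
frac = [random.uniform(0.0, 0.1) for _ in range(n)]
y = [0.5 * p + 2.0 * f + random.gauss(0, 1) for p, f in zip(pheno, frac)]

# Same model twice: once with the "true" fraction, once with a fraction that is
# underestimated by a constant factor of 0.5 (assumed multiplicative bias).
t_raw = ols_t_stats([[1.0, p, f] for p, f in zip(pheno, frac)], y)
t_scaled = ols_t_stats([[1.0, p, 0.5 * f] for p, f in zip(pheno, frac)], y)

print(t_raw)
print(t_scaled)
```

The two printed lists agree up to floating-point error. This only covers a linear rescaling; a fraction that is floored at zero is a non-linear distortion and is not covered by this argument.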

Another potential explanation: the Infinium DNAm assay can itself skew cell-type proportions. We have very strong evidence for this in other tissues. So whilst the true microglia proportion may be higher, the assay itself selects against these cell types, and so you measure a lower proportion. Also, claims of >90% purity are doubtful, as scRNA-Seq studies have shown that highly cell-type-specific markers generally do not exist. So maybe your samples are only 70% pure, which, in conjunction with a 20% error in EpiSCORE (because of small marker numbers), could easily explain your 50% proportions.

You can also try rerunning EpiSCORE with more iterations or with no threshold on the weights, to see if that makes a difference.
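Before rerunning, it can help to see how many markers each cell type keeps at a given weight threshold. The sketch below is plain Python, not the EpiSCORE API. It assumes a marker table with one row per marker gene giving the cell type it marks and its weight; the gene names, cell-type labels, and weights are invented toy values, not taken from the EpiSCORE brain reference.

```python
from collections import Counter

# Hypothetical toy marker table: (gene, cell type it marks, weight).
# All values are made up for illustration only.
toy_markers = [
    ("geneA", "Microglia", 0.82),
    ("geneB", "Microglia", 0.35),
    ("geneC", "Astro", 0.61),
    ("geneD", "Astro", 0.44),
    ("geneE", "OPC", 0.28),
    ("geneF", "OPC", 0.57),
]

def markers_per_cell_type(markers, min_weight=0.0):
    """Count markers per cell type whose weight is strictly above min_weight."""
    return Counter(ct for _, ct, w in markers if w > min_weight)

no_threshold = markers_per_cell_type(toy_markers)
thresholded = markers_per_cell_type(toy_markers, min_weight=0.4)

print(dict(no_threshold))
print(dict(thresholded))
```

Comparing the two counts shows which cell types lose the most markers to the threshold, and those are the components whose estimated fractions are most likely to change between the two runs.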

Or, you could try to rebuild a DNAm reference matrix: use a newer and better scRNA-Seq atlas to identify more marker genes, and then apply the EpiSCORE imputation procedure, which may result in a DNAm reference matrix with more markers. This is exactly why our EpiSCORE R-package provides all the R-functions needed to do this. Our tutorial also explains how to build new DNAm reference matrices.

In summary, I think the fractions you have obtained should be interpreted more as a relative fraction, not an absolute one, but there is probably also a case here to rederive a newer brain DNAm reference matrix that has more markers per cell-type.

hope this helps,

A.