RGLab / MAST

Tools and methods for analysis of single cell assay data in R

Units of logFC for a numeric dependent variable #140

Closed: combiz closed this issue 4 years ago.

combiz commented 4 years ago

Could you kindly advise on how to interpret the glmer logFC values for models with a numeric independent variable (covariate)? The units are difficult to infer.

amcdavid commented 4 years ago

It's like any other numeric covariate in a regression. A one-unit change in the independent variable x corresponds to a 2^(logFC)-fold change in expression, holding all other covariates in the model constant.
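A minimal Python sketch of this interpretation, assuming expression is modelled on the log2 scale and looking only at a single continuous (linear) component. It deliberately ignores the discrete component that MAST's hurdle model folds into logFC. The function names and coefficient values are hypothetical. The difference in the log2 linear predictor between x + 1 and x is exactly the coefficient, so the implied fold change 2**beta is the same wherever x starts.

```python
def predicted_log2_expr(intercept: float, beta: float, x: float) -> float:
    """Hypothetical linear predictor on the log2 scale: intercept + beta * x."""
    return intercept + beta * x


def fold_change_per_unit(intercept: float, beta: float, x: float) -> float:
    """Ratio of unlogged expression at x + 1 versus x, i.e. 2 ** (log2 difference)."""
    diff = predicted_log2_expr(intercept, beta, x + 1) - predicted_log2_expr(intercept, beta, x)
    return 2 ** diff


# Hypothetical coefficients; beta reuses the logFC value quoted in the thread.
beta = 0.89
for x in (-1.0, 0.0, 2.5):
    # The printed ratio is the same for every starting x.
    print(x, fold_change_per_unit(intercept=3.0, beta=beta, x=x))
```

Once other covariates enter only additively, they cancel out of this difference; the next part of the thread shows why that stops being true for MAST's combined logFC.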

combiz commented 4 years ago

Thanks, this is what we expected, until we saw that the magnitude of logFC for some genes was difficult to corroborate visually. For example, this gene was fit with glmer (~hist + (1 | individual) + cngeneson + pc_mito) and was identified as DE (padj 1.72e-122, logFC 0.89). In the plot, the points are the median for each sample and the blue line is a basic glm fit:

[figure: per-sample median expression with a basic glm fit (blue line)]

amcdavid commented 4 years ago

The paradoxical sign of the log fold change may be because it is extrapolating beyond the range of the pc_mito data. It's best to center pc_mito. The "holding everything else constant" part is important for the fold-change calculation. From the documentation (?logFC):

The log-fold change is defined as follows. For each gene, let u(x) be the expected value of the continuous component, given a covariate x and the estimated coefficients coefC, i.e., u(x) = crossprod(x, coefC). Likewise, let v(x) = 1/(1+exp(-crossprod(coefD, x))) be the expected value of the discrete component. The log fold change from contrast0 to contrast1 is defined as

u(contrast1)v(contrast1)-u(contrast0)v(contrast0).

If you don't center pc_mito, then by default the log fold change is evaluated with pc_mito = 0 (and cngeneson = 0), i.e. it is

u(Intercept, hist = 1, pc_mito = 0, cngeneson = 0)v(Intercept, hist = 1, pc_mito = 0, cngeneson = 0) - u(Intercept, hist = 0, pc_mito = 0, cngeneson = 0)v(Intercept, hist = 0, pc_mito = 0, cngeneson = 0)
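To see why the baseline matters even in a purely additive model, here is a minimal Python re-implementation of the formula quoted from ?logFC. It is a sketch, not MAST's own code. The design-vector order (Intercept, hist, pc_mito, cngeneson), the coefficients, and the pc_mito mean of 10 are all hypothetical. Because u(x)v(x) multiplies a linear term by a logistic term, the combined logFC changes with the value at which pc_mito is held, even though the continuous difference u(contrast1) - u(contrast0) does not.

```python
import math


def u(x, coef_c):
    """Continuous component: crossprod(x, coefC), i.e. a dot product."""
    return sum(xi * ci for xi, ci in zip(x, coef_c))


def v(x, coef_d):
    """Discrete component: 1 / (1 + exp(-crossprod(coefD, x)))."""
    eta = sum(di * xi for di, xi in zip(coef_d, x))
    return 1.0 / (1.0 + math.exp(-eta))


def log_fc(contrast0, contrast1, coef_c, coef_d):
    """u(contrast1)v(contrast1) - u(contrast0)v(contrast0), as quoted from ?logFC."""
    return u(contrast1, coef_c) * v(contrast1, coef_d) - u(contrast0, coef_c) * v(contrast0, coef_d)


def design(hist, pc_mito, cngeneson=0.0):
    """Hypothetical design vector in the order (Intercept, hist, pc_mito, cngeneson)."""
    return [1.0, hist, pc_mito, cngeneson]


# Hypothetical coefficients for one gene, for illustration only.
COEF_C = [2.0, 0.5, -0.1, 1.0]
COEF_D = [-1.0, 0.8, 0.15, 2.0]

# Default-style contrast: other covariates held at 0
# (an extrapolation if pc_mito = 0 lies outside the observed range).
at_zero = log_fc(design(0, 0.0), design(1, 0.0), COEF_C, COEF_D)

# Same hist contrast, holding pc_mito at a hypothetical observed mean of 10.
at_mean = log_fc(design(0, 10.0), design(1, 10.0), COEF_C, COEF_D)

print(f"logFC with pc_mito held at 0:  {at_zero:.4f}")
print(f"logFC with pc_mito held at 10: {at_mean:.4f}")
```

For an unpenalized fit, evaluating the uncentered model at the mean of pc_mito is equivalent to evaluating a model fit on centered pc_mito at 0, which is why centering changes the default answer.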

The random effect is probably playing a role here, too, since hist appears to be a function of individual.

combiz commented 4 years ago

Thanks for the pointers. We have indeed centred and scaled pc_mito, as per the vignette recommendation for glmer. Your point about the random effect for this model is interesting; according to the CCA, hist is indeed highly correlated with individual:

[figure: CCA results]

That is, apart from the control samples, which all share a hist (histology marker) value of 0, hist is a function of individual in the remaining samples. This is unlike our typical model with a categorical independent variable (e.g. ~diagnosis + (1 | individual) + cngeneson + pc_mito), and perhaps argues for dropping the random effect for individual in this case. This would also necessitate switching the method from glmer to glm (or bayesglm) for these models. Will give it a try!
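Separately from the CCA, a direct way to confirm this kind of dependence is to check whether each individual carries exactly one hist value. Below is a minimal Python sketch; the helper is_function_of and the toy annotations are hypothetical. If it returns True, hist varies only between individuals, so its effect has to be estimated from the same between-individual differences that the (1 | individual) random intercept also models.

```python
from collections import defaultdict


def is_function_of(groups, values):
    """True if every group maps to exactly one value, i.e. values are determined by groups."""
    seen = defaultdict(set)
    for g, val in zip(groups, values):
        seen[g].add(val)
    return all(len(vals) == 1 for vals in seen.values())


# Hypothetical per-cell annotations, for illustration only.
individual = ["ctrl1", "ctrl1", "ctrl2", "p1", "p1", "p2", "p2"]
hist = [0, 0, 0, 2.5, 2.5, 4.0, 4.0]

# True here: each individual has a single hist value.
print(is_function_of(individual, hist))
```

Unlike a correlation measure, this checks exact functional dependence, so a single individual with mixed hist values makes it return False.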

combiz commented 4 years ago

Very pleased to confirm that everything has fallen into place after dropping the (1 | individual) term and switching from glmer to glm. The logFC values now make sense and are consistently corroborated by the scatter/jitter plots. Thanks for your comments, Andrew.