RGLab / MAST

Tools and methods for analysis of single cell assay data in R

LFC interpretation #167

rajewski closed this issue 2 years ago

rajewski commented 2 years ago

Thanks for providing such a great tool. I have some questions about interpreting the output of summary() called on a ZlmFit object with the doLRT option. The output includes coefficients with their confidence intervals, as well as a Z-score.

When I look at the log2 fold change estimates for the hurdle component, they seem suspiciously low for my data. As a sanity check, I ran Seurat's FindMarkers(). I understand that the two tools run completely different tests, but the L2FC values from Seurat are orders of magnitude higher than those from zlm(). I am left wondering whether the zlm L2FC represents the log2 fold change in expression between the level specified with doLRT= and the intercept, or whether, like the LRT in DESeq2, these values represent the L2FC of the MLE between the full and reduced models and aren't really related to the biological hypothesis being tested.

If the latter is true, is there a good way to extract an L2FC of expression? The ability to specify more complex models in MAST is a huge draw, but my clients often want to say that a given gene is X-fold more highly expressed in Condition B than in a baseline Condition A, and I am struggling to give them an interpretable result like that from MAST.

amcdavid commented 2 years ago

There are a number of caveats when interpreting logFC, especially when you have covariates in the model. In particular, the covariates need to be set to "typical" values in the contrasts of interest for the estimate to resemble what FindAllMarkers would report, similar to deriving a total effect in a mediator model. The logFC help page (?logFC) explains what it is estimating; let me know if something is unclear and we can update the docs.
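To make the covariate caveat concrete, here is a minimal base-R sketch of the definition as I read it in ?logFC. For a design row x, u(x) = x'coefC is the continuous-component mean, v(x) = logistic(x'coefD) is the probability of expression, and the logFC from contrast0 to contrast1 is u(contrast1)v(contrast1) - u(contrast0)v(contrast0). The coefficients, the covariate name `cdr`, and the helpers `hurdle_mean()` and `hurdle_logfc()` are hypothetical. This is not MAST's implementation; on a real fit you would call logFC() or getLogFC() on the ZlmFit with your own contrasts.

```r
# Hypothetical sketch of the hurdle logFC definition described in ?logFC.
# Not MAST code: coefficients below are invented for illustration.

# Expected value of the hurdle model at design row x:
#   u(x) = x' coefC            (continuous part: mean log2 expression if expressed)
#   v(x) = logistic(x' coefD)  (discrete part: probability of expression)
hurdle_mean <- function(x, coefC, coefD) {
  u <- sum(x * coefC)
  v <- plogis(sum(x * coefD))
  u * v
}

# logFC from contrast0 to contrast1: difference of the two expected values
hurdle_logfc <- function(contrast0, contrast1, coefC, coefD) {
  hurdle_mean(contrast1, coefC, coefD) - hurdle_mean(contrast0, coefC, coefD)
}

# Made-up coefficients for one gene: intercept, conditionB, and an
# uncentered covariate cdr (hypothetical cellular detection rate).
coefC <- c("(Intercept)" = 1.0, conditionB = 0.5, cdr = 4.0)
coefD <- c("(Intercept)" = -3.0, conditionB = 1.0, cdr = 6.0)

# Baseline vs. condition B with the covariate left at 0 (an atypical cell)
a0 <- c(1, 0, 0)
a1 <- c(1, 1, 0)

# Same contrast with the covariate held at a typical value, e.g. its mean
cdr_bar <- 0.4
b0 <- c(1, 0, cdr_bar)
b1 <- c(1, 1, cdr_bar)

lfc_at_zero    <- hurdle_logfc(a0, a1, coefC, coefD)
lfc_at_typical <- hurdle_logfc(b0, b1, coefC, coefD)
print(round(c(at_zero = lfc_at_zero, at_typical = lfc_at_typical), 3))
# at_zero ~0.131, at_typical ~0.935
```

With identical coefficients, the estimate changes roughly sevenfold depending on where the covariate is held, which is one possible reason a zlm logFC can look much smaller than a marker-style fold change. Centering or scaling the covariate before fitting (for example, the MAST vignette scales its cngeneson covariate), or passing contrasts with typical covariate values to getLogFC(), keeps the baseline closer to a realistic cell.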