RGLab / MAST

Tools and methods for analysis of single cell assay data in R

Unexpected Large Confidence Interval of LogFC #142

Open JiaxinLi-lipluszn opened 3 years ago

JiaxinLi-lipluszn commented 3 years ago

Hi! I'm using MAST to identify sex-biased genes (Differentially Expressed Genes between males and females) in a dataset with both case and control samples.

library(MAST)
library(data.table)

zlmCond <- zlm(~ sex + cngeneson + diagnosis + region + Capbatch + Seqbatch + RNA.Integrity.Number + RNA.mitochondr..percent + RNA.ribosomal.percent + age,
               sca, method = "bayesglm", ebayes = FALSE, silent = TRUE)

summaryCond <- summary(zlmCond, doLRT = 'sexM')
summaryDt <- summaryCond$datatable

# Hurdle p-values merged with logFC estimates and their confidence bounds
fcH <- merge(summaryDt[contrast == 'sexM' & component == 'H', .(primerid, `Pr(>Chisq)`)],
             summaryDt[contrast == 'sexM' & component == 'logFC', .(primerid, coef, ci.hi, ci.lo)], by = 'primerid')

When I ran the analysis on the whole dataset, MAST behaved as expected: for marker genes on the sex chromosomes, such as XIST, the logFC has the correct sign. For example, XIST is a gene on the X chromosome that is only expressed in females, so MAST reports a negative logFC with a small FDR.
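The sign convention can be checked in base R. The sketch below uses a made-up `sex` vector (not the real data, which would come from `colData(sca)`) to show where the `sexM` contrast name comes from: with R's default treatment contrasts the first factor level (`F`) is the reference, so a negative `sexM` coefficient means lower expression in males.

```r
# Hypothetical toy labels standing in for the real sex column.
sex <- factor(c("F", "M", "F", "M", "M"))

# Default treatment coding: the first level is the reference.
ref_level <- levels(sex)[1]

# The design matrix gets one indicator column per non-reference level.
design <- model.matrix(~ sex)
coef_names <- colnames(design)

print(ref_level)
print(coef_names)
```

If the reference level were `M` instead, the contrast would be named `sexF` and the signs would flip, so it is worth confirming the level order in each subset.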

However, when I split the dataset in two (one with only control samples, the other with only case samples), strange things happened. I ran nearly the same analysis on each subset (I removed the diagnosis term from the model because each subset now has only one level of diagnosis), but the marker genes showed abnormal logFC values and confidence intervals.
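The per-group setup described above can be sketched in base R as follows. The formula is the one from the post; the `meta` data frame and its values are hypothetical stand-ins for `colData(sca)`, and the actual subsetting of `sca` and the `zlm()` refit are not shown. The `droplevels()` step is an assumption about good practice, not something confirmed in this thread: after subsetting, factors such as the batch columns can keep levels that no longer occur in the subset.

```r
# Full-data formula from the post (one-sided, as passed to zlm()).
full_formula <- ~ sex + cngeneson + diagnosis + region + Capbatch + Seqbatch +
  RNA.Integrity.Number + RNA.mitochondr..percent + RNA.ribosomal.percent + age

# Drop the diagnosis term for the within-group fits.
group_formula <- update(full_formula, ~ . - diagnosis)

# Toy metadata (hypothetical values) standing in for colData(sca).
meta <- data.frame(
  sex = factor(c("F", "M", "F", "M")),
  diagnosis = factor(c("case", "case", "control", "control")),
  Capbatch = factor(c("b1", "b1", "b2", "b3"))
)

# Keep only case cells and drop factor levels that no longer occur.
case_meta <- droplevels(meta[meta$diagnosis == "case", ])

print(group_formula)
print(sapply(case_meta, nlevels))
```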

For example, in the case group, XIST has a large positive logFC, which would mean it is identified as male-biased in that group. The confidence interval is also very wide, with ci.hi > 0 and ci.lo < 0. I'm sure I haven't mixed up the direction, because in the same analysis Y chromosome genes have positive logFC. It is also not a data quality problem: I made a scatter plot of the expression level in each cell, and XIST clearly shows the expected female-biased expression pattern in both the case group and the control group.

I've been stuck on this for a long time because I'm not sure how the confidence interval of the logFC is calculated. Any help would be appreciated!

gfinak commented 3 years ago

How do you propose we help you? You haven't posted any data, figures, or any other information about how many cells, how many samples, or the design of your experiment. Not sure what we can do for you unless you can share some data and maybe a reproducible example.
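As an illustration of the kind of summary that would make a report like this actionable, here is a minimal sketch on simulated metadata. All column names and values are hypothetical; with real data the table would presumably be built from `colData(sca)`.

```r
# Simulated cell-level metadata: 8 samples with 25 cells each (hypothetical).
meta <- data.frame(sample = rep(paste0("S", 1:8), each = 25))

# Sample-level attributes, also made up for illustration.
sample_info <- data.frame(
  sample = paste0("S", 1:8),
  sex = rep(c("F", "M"), times = 4),
  diagnosis = rep(c("case", "control"), each = 4)
)
meta <- merge(meta, sample_info, by = "sample")

# Number of cells in each sex x diagnosis combination.
cells_per_group <- table(meta$sex, meta$diagnosis)

# Number of distinct samples in each sex x diagnosis combination.
per_sample <- unique(meta[, c("sample", "sex", "diagnosis")])
samples_per_group <- table(per_sample$sex, per_sample$diagnosis)

print(cells_per_group)
print(samples_per_group)
```

Posting these two tables for the full dataset and for each subset shows how many cells and samples support each sex within the case and control groups.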

gfinak commented 3 years ago

@JiaxinLi-lipluszn do you have anything actionable we can follow up on? Did you solve your problem, answer your question? Do you have a reproducible example we can work with to answer your question?