MarioniLab / miloR

R package implementation of Milo for testing for differential abundance in KNN graphs
https://bioconductor.org/packages/release/bioc/html/miloR.html
GNU General Public License v3.0
316 stars 20 forks source link

Milo not identifying significant results #310

Closed RunyuXia closed 2 months ago

RunyuXia commented 4 months ago

Hi!

Thanks for developing this great tool! We get some interesting results but now we are running into some issues with a particular dataset. The purpose of the comparison is two compare the change in cell type abundance between two parity status (Nulliparious and uniparous). It seems that Milo is unable to pick up significant results despite us observing high log fold changes.

Parous - NP_lymphoid bee_DA_new Parous - NP_lymphoid hist_new Parous - NP_lymphoid p_distribution_new Parous - NP_lymphoid volcano_new

Below is the code I used to generate the analysis.

milo_func_new <- function(adata, contrast, celltype){

set.seed(123)
new_parity <- colData(adata)$Parity != 'NP'
colData(adata)$NewParity <- new_parity

milo <- Milo(adata)
names(assays(milo))=c("counts","logcounts", "raw")
milo <- buildGraph(milo, k = 40, d = 20, reduced.dim = "scvi", transposed = TRUE)
milo <- makeNhoods(milo, prop = 0.5, k = 40, d=20, refined = TRUE, reduced_dims = "scvi", refinement_scheme = "graph")

plotNhoodSizeHist(milo)

milo <- countCells(milo, meta.data = data.frame(colData(milo)), sample="Sample")
design_df <- data.frame(colData(milo))[,c("Sample", "Age", "Parity", "NewParity", "Batch")]
design_df <- distinct(design_df)
rownames(design_df) = design_df$Sample
design_df <- design_df[colnames(nhoodCounts(milo)), , drop=FALSE]

milo.res <- testNhoods(milo, design = ~ NewParity, design.df = design_df, reduced.dim='scvi', fdr.weighting="graph-overlap")

ggplot(milo.res, aes(logFC, -log10(SpatialFDR))) + 
geom_point() +
geom_hline(yintercept = 1)

milo.res$Diff <- sign(milo.res$logFC)
milo.res$Diff[milo.res$SpatialFDR > 0.1] <- 0

milo <- buildNhoodGraph(milo, overlap=5)
milo.res <- annotateNhoods(milo, milo.res, 'CellType_new')
milo.res$CellType <- ifelse(milo.res$CellType_new_fraction < 0.5, "Mixed", milo.res$CellType_new)
milo.res_new = milo.res[which(milo.res$CellType != "Mixed"),]

plotReducedDim(milo, dimred = "umap", text_by = "CellType_new", colour_by="Parity", point_size=1) + guides(fill="none") 

plotNhoodGraphDA(milo, layout="umap", milo_res=milo.res_new, alpha=0.1) 

if (!any(milo.res_new$Diff == 1)){
    plotDAbeeswarm1(milo.res_new, group.by = "CellType")

}

else{
    plotDAbeeswarm(milo.res_new, group.by = "CellType")

}

return (list(milo, milo.res))

}

Thank you so much for your help!

MikeDMorgan commented 4 months ago

Hi @RunyuXia A few observations:

RunyuXia commented 4 months ago

Hi Mike,

Thanks for so much your reply! I have 8 samples in this data set, 4 for each condition.

MikeDMorgan commented 4 months ago

OK, try using a smaller p for makeNhoods.

RunyuXia commented 3 months ago

Hi Mike,

I tried using p = 0.1 and a bunch of different k values but milo still does not seem to be picking up the change unless I use very small k values where the peak of neighbourhood sizes is below 20. Does the peak matter in this case?

MikeDMorgan commented 3 months ago

As per the guidelines in the original Milo paper - the recommendation is to set k such that you have a minimum of Sx5 counts per nhood, where S=total number of samples, so in your case this would be 8*5=40. If you subset your data to just one of the groups of cells, e.g. T cells and re-run the workflow (dim red, graph, etc), do you still see no sig diffs with 10% FDR?

RunyuXia commented 3 months ago

Hi Mike,

Thank you for your reply! I was able to get sig diffs with just one group of cells (T cell as you mentioned) and the peak neighborhood count is around the normal range. I was wondering why Milo only works for smaller datasets in this case?

MikeDMorgan commented 3 months ago

I think in this case the issue is not Milo, but the resolution of your data to identify subtle differences with only 8 samples.

There are 2 things to consider here:

1) Would you expect to have good resolution to identify cell state differences when you have such broad cell types - it may be that you just don't have the necessary level of resolution. Subsetting and redefining the HVGs, reduced dims, etc means that the differences within a cell type will be much more pronounced.

2) Are these data from human, mice or other? If the former, then it is an issue of between-sample heterogeneity, humans are just more variable than inbred lab animals. If it's a model organism (mouse), then you may need to consider the signal-to-noise ratio, and increase your number of replicates.

The fact that subsetting to just T cells revealed some DA nhoods indicates (1) may at least partially hold true. I can't speak for (2).

RunyuXia commented 3 months ago

thank you!!