nhood size distribution: some neighbourhoods have over 2000 cells

YiweiNiu commented 1 month ago

Hi,

Thanks for this cool tool! I have about 200k cells from 40 samples, and the data were integrated with Harmony. I'm using miloR to test for differential cell abundance (code below):

milo.obj <- buildGraph(milo.obj, k = 30, d = ncomps, reduced.dim = "Harmony")
milo.obj <- makeNhoods(milo.obj, prop=0.2, k=30, d=ncomps, reduced_dims = "Harmony",
                       refined=TRUE, refinement_scheme="graph")

Here is the distribution of nhood sizes. The mean and median are 183.425 and 120, respectively.
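For reference, a minimal sketch of how such size summaries can be computed. It uses a toy 0/1 membership matrix as a stand-in for the cells x nhoods matrix returned by nhoods(milo.obj); the toy dimensions and the names `membership` and `nhood.sizes` are illustrative assumptions, not part of the original code.

```r
# Toy stand-in (assumption) for nhoods(milo.obj): rows are cells, columns are
# nhoods, and an entry of 1 means the cell belongs to that nhood.
set.seed(1)
membership <- matrix(rbinom(200 * 10, size = 1, prob = 0.3), nrow = 200, ncol = 10)

# The size of a nhood is the number of cells assigned to it.
nhood.sizes <- colSums(membership)

# Mean, median and range in one call.
print(summary(nhood.sizes))

# With a real Milo object the same summary would start from the sparse
# membership matrix instead, e.g. Matrix::colSums(nhoods(milo.obj)).
```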

I was wondering whether this distribution is okay to proceed with. If not, could you please give me some suggestions?

Thanks so much!

MikeDMorgan commented 1 month ago

Please refer to the guidance in the vignettes and supplementary materials of the original Milo paper: https://www.bioconductor.org/packages/release/bioc/vignettes/miloR/inst/doc/milo_gastrulation.html & https://static-content.springer.com/esm/art%3A10.1038%2Fs41587-021-01033-z/MediaObjects/41587_2021_1033_MOESM1_ESM.pdf

YiweiNiu commented 1 month ago

Hi, thanks for the quick reply. Since I have a sample size of ~40, and the mean nhood size is ~180, I guess it's okay to go? Sorry, my concern is that there are many neighborhoods with too many cells and I didn't see that in your paper/tutorial.

MikeDMorgan commented 1 month ago

I would be more concerned with (a) nhoods with too few counts (you have a very large peak in your histogram at some unknown small number) and (b) nhoods with many zero counts in many samples. Assuming you are using Milo2.0, you can check the latter using the checkSeparation function against your model variables - this will highlight any variables that perfectly separate nhoods with <= threshold vs. > threshold counts, where threshold=0 separates exactly zero from non-zero counts.
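As a rough illustration of point (b), the share of samples with a zero count can be tabulated per nhood. This sketch uses a small made-up matrix in place of nhoodCounts(milo.obj) (rows are nhoods, columns are samples); the values and the names `nhood.counts`, `zero.frac` and `total.count` are assumptions for illustration only.

```r
# Toy stand-in (assumption) for nhoodCounts(milo.obj): nhoods x samples.
nhood.counts <- matrix(c(0, 0, 5, 7,
                         3, 2, 4, 6,
                         0, 0, 0, 9),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(paste0("nhood", 1:3), paste0("sample", 1:4)))

# Fraction of samples contributing no cells to each nhood.
zero.frac <- rowMeans(nhood.counts == 0)

# Total number of cells counted in each nhood across all samples.
total.count <- rowSums(nhood.counts)

# Nhoods with a low total or a high zero fraction are the ones to inspect.
print(data.frame(total.count, zero.frac))
```

Which cut-offs count as "too few" or "too many zeros" depends on the design, so none are hard-coded here.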

YiweiNiu commented 1 month ago

Hi, thank you very much for your informative reply.

(a) nhoods with too few counts (you have a very large peak in your histogram at some unknown small number)

I zoomed in on the nhood size distribution plot and it peaks at around 40. The nhood sizes range from 31 to 2340. Since I used k=30 in buildGraph() and makeNhoods(), I guess that's okay, no?

(b) nhoods with many zero counts in many samples

Here I have ~10 time points (with 3-4 samples per time point), and I also included "Sex" as a covariate in the test. I checked the "count distributions for each nhood according to a test variable of interest", but I need some help interpreting the results: I don't understand exactly what the output means, or why the number of "TRUE" values increases with a higher min.val.

> table(checkSeparation(milo.obj, design.df=milo.design, condition="time", min.val=1))

FALSE  TRUE
 8066  6401

> table(checkSeparation(milo.obj, design.df=milo.design, condition="time", min.val=5))

FALSE  TRUE
 1805 12662

> table(checkSeparation(milo.obj, design.df=milo.design, condition="sex", min.val=1))

FALSE  TRUE
14463     4

> table(checkSeparation(milo.obj, design.df=milo.design, condition="sex", min.val=5))

FALSE  TRUE
14424    43
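To make the question concrete, here is a conceptual sketch of one plausible reading of "separation" for a single nhood: after splitting the samples into counts <= threshold and > threshold, some level of the variable falls entirely on one side. The helper `is_separated` and the toy data are hypothetical and are not miloR's implementation of checkSeparation; they only show how raising the threshold can flip a nhood from FALSE to TRUE when a variable has many levels with few samples each.

```r
# Hypothetical helper (assumption, not miloR code): TRUE when at least one
# level of `condition` has all of its samples on the same side of `threshold`.
is_separated <- function(counts, condition, threshold) {
  above <- counts > threshold
  # If every sample is on the same side there is nothing to separate.
  if (all(above) || !any(above)) {
    return(FALSE)
  }
  # 2 x (number of levels) table; a zero cell means one level is one-sided.
  tab <- table(above, condition)
  any(tab == 0)
}

# Toy counts for one nhood: six samples, two per time point.
counts <- c(0, 3, 0, 5, 2, 0)
time <- factor(c("t1", "t1", "t2", "t2", "t3", "t3"))

# Zero vs. non-zero: every time point has samples on both sides.
print(is_separated(counts, time, threshold = 0))  # FALSE

# Higher threshold: both t3 samples now fall at or below it.
print(is_separated(counts, time, threshold = 2))  # TRUE
```

Under this reading, a variable with ~10 levels and 3-4 samples per level gives many more opportunities for a one-sided level than a two-level variable such as sex, which would be consistent with the pattern in the tables above.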

MarioniLab / miloR

nhood size distribution: some neighbourhoods have over 2000 cells #330