MarioniLab / miloR

R package implementation of Milo for testing for differential abundance in KNN graphs
https://bioconductor.org/packages/release/bioc/html/miloR.html
GNU General Public License v3.0

MiloR on compositional data #281

Closed Dario-Rocha closed 1 year ago

Dario-Rocha commented 1 year ago

Thank you for your work on the miloR package. Proper tools for analysing shifts in abundance in single-cell data are only recently starting to be developed, even though this has been done in other fields (such as microbiota research) for a while now.

Since cell abundances in single-cell data are compositional, and Euclidean statistical methods are not necessarily appropriate for modelling constrained data, I was wondering how this package accounts for that. As far as I understand, methods for this type of data include transforming the observed counts to log-ratios or using Dirichlet-based models, but I haven't found details about this in your publication, so I am curious how you addressed this issue.

Best regards

MikeDMorgan commented 1 year ago

Hi @Dario-Rocha, thanks for your question. Milo uses a negative binomial GLM (hence a log-linear model) to model the variation in kNN-graph neighbourhood abundances across experimental samples. To address potential compositional biases, we make use of several possible model offset options: log sum, relative log expression (RLE) or trimmed mean of M-values (TMM); the latter is the default and preferred option. More details are in the manuscript and supplementary materials: https://www.nature.com/articles/s41587-021-01033-z
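
To illustrate what an offset does in a log-linear count model, here is a minimal sketch in Python. It is not miloR code and does not reproduce Milo's negative binomial fit or the TMM offset: it assumes a simpler Poisson model with a single two-level condition, for which the fitted log fold change has a closed form, and it uses only the log sum offset (log of the total cells per sample). All sample names, counts and function names are hypothetical toy values.

```python
import math

# Hypothetical toy data: four samples, two per condition.
condition = ["A", "A", "B", "B"]
total_cells = [1000, 2000, 3000, 4000]  # total cells captured per sample

# Cells per neighbourhood in each sample (made-up numbers).
nhood_counts = {
    "nhood_1": [10, 20, 30, 40],    # always 1% of the sample: no real shift
    "nhood_2": [10, 20, 90, 120],   # 1% in A, 3% in B: a real shift
}


def poisson_log_fold_change(counts, offsets):
    """Log fold change (B vs A) from a Poisson log-linear model.

    Model (an assumption of this sketch): log(mu_i) = b0 + b1 * [sample i in B] + offset_i.
    With one two-level factor, the maximum likelihood estimate of b1 is
    the difference between groups of log(sum(counts) / sum(exp(offset))).
    """
    def group_log_rate(level):
        idx = [i for i, c in enumerate(condition) if c == level]
        observed = sum(counts[i] for i in idx)
        exposure = sum(math.exp(offsets[i]) for i in idx)
        return math.log(observed / exposure)

    return group_log_rate("B") - group_log_rate("A")


no_offset = [0.0] * len(total_cells)                # ignores sample size
log_sum_offset = [math.log(n) for n in total_cells]  # "log sum" style offset

for name, counts in nhood_counts.items():
    raw = poisson_log_fold_change(counts, no_offset)
    adjusted = poisson_log_fold_change(counts, log_sum_offset)
    print(f"{name}: logFC without offset = {raw:.3f}, with log sum offset = {adjusted:.3f}")
```

Without an offset, `nhood_1` looks more abundant in condition B simply because those samples contain more cells; with the offset, the model compares counts relative to each sample's total, so only `nhood_2` retains a non-zero log fold change. RLE and TMM replace the raw total with a more robust per-sample scaling factor, which is why they are better suited to compositional effects than the log sum.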