Closed Dario-Rocha closed 1 year ago
Hi @Dario-Rocha, thanks for your question. Milo uses a negative binomial GLM (hence a log-linear model) to model the variation in kNN-graph neighbourhood abundances across experimental samples. To address potential compositional biases, we provide several model offset options: log sum, relative log expression, or trimmed mean of M-values; the last of these is the default and preferred option. More details are in the manuscript and supplementary materials: https://www.nature.com/articles/s41587-021-01033-z
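To make the role of the offset concrete, here is a minimal sketch in Python (not miloR's R code, and not Milo's actual implementation) of how a per-sample offset enters the mean of a log-linear model. It assumes the simplest of the options above, a log-sum offset, taken here to be the log of the total number of cells per sample. The toy numbers and the names `cells_per_sample`, `design`, `beta`, and `expected_counts` are hypothetical and chosen for illustration only.

```python
import numpy as np

# Hypothetical total number of cells per sample (four samples).
cells_per_sample = np.array([1000.0, 2500.0, 800.0, 1800.0])

# Log-sum offset (assumed definition): log of each sample's total cell count.
log_offset = np.log(cells_per_sample)

# Hypothetical design matrix: intercept plus a condition indicator
# (samples 1-2 are controls, samples 3-4 are treated).
design = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 1.0],
])

# Hypothetical coefficients on the log scale: a baseline fraction of 1%
# of cells in this neighbourhood, doubled in the treated condition.
beta = np.array([np.log(0.01), np.log(2.0)])


def expected_counts(design, beta, log_offset):
    """Mean of a log-linear model: log(mu) = design @ beta + offset."""
    return np.exp(design @ beta + log_offset)


mu = expected_counts(design, beta, log_offset)

# Dividing by the sample totals recovers the modelled fraction per sample,
# so the coefficients describe relative rather than absolute abundance.
fractions = mu / cells_per_sample
```

Because the offset is added on the log scale with a fixed coefficient of one, the condition coefficient is a log fold change in the fraction of cells falling in the neighbourhood, which is why the choice of offset matters for compositional effects.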
Thank you for your work on the MiloR package. Proper tools for analysing shifts in abundance in single-cell data have only recently started to be developed, even though this has been done in other fields (such as microbiota research) for a while now.
Since cell abundance in single-cell data is compositional, and Euclidean statistical methods are not necessarily appropriate for modelling constrained data, I was wondering how this package accounts for that. As far as I understand, methods for this type of data include transforming the observed counts to log-ratios of counts or using Dirichlet-based models, but I haven't found details about this in your publication, so I am curious how you addressed this issue.
Best regards