Recommendation for using Milo for pooled CRISPR screens

MarioniLab / miloR

R package implementation of Milo for testing for differential abundance in KNN graphs

https://bioconductor.org/packages/release/bioc/html/miloR.html

GNU General Public License v3.0


Recommendation for using Milo for pooled CRISPR screens #259

Closed joschif closed 1 year ago

joschif commented 1 year ago

Heyhey!

I've been playing around with Milo for the analysis of a pooled single-cell CRISPR screen in the CROP-seq/Perturb-seq format. From what I understand from the tutorials, Milo requires that the experimental condition is the same for all cells within a sample. However, in pooled CRISPR screens each sample contains a mix of conditions, and ideally I would want to control for the sample of origin as a covariate. Do you have any recommendations on how to deal with something like this in Milo?

Cheers, Jonas

emdann commented 1 year ago

Hi @joschif, you can account for the sample of origin by including it as a term in the design formula.

e.g. adapting the example from the README

milo.design$SampleGroup <- stringr::str_remove(as.character(milo.design$Sample), 'A_|B_')
milo.res <- testNhoods(milo.obj, design=~SampleGroup+Condition, design.df=milo.design)
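To see what a formula like this encodes, the design matrix can be inspected with base R's model.matrix before running the test. The following is only a minimal sketch: the six sample names and the two-level Condition are made up for illustration (they are not the README data), and sub() stands in for stringr::str_remove so that the snippet needs no extra packages.

```r
# Toy design table: one row per sample (hypothetical names, not the README data).
toy.design <- data.frame(
  Sample    = c("A_R1", "A_R2", "A_R3", "B_R1", "B_R2", "B_R3"),
  Condition = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)
rownames(toy.design) <- toy.design$Sample

# Strip the condition prefix so that paired samples share a SampleGroup label.
toy.design$SampleGroup <- sub("^(A|B)_", "", toy.design$Sample)

# One row per sample; the columns are the intercept, the SampleGroup blocking
# terms, and the Condition effect of interest.
toy.mm <- model.matrix(~ SampleGroup + Condition, data = toy.design)
print(colnames(toy.mm))
print(dim(toy.mm))
```

The number of rows of this matrix is what has to agree with the number of sample columns in the neighbourhood count matrix.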

There are several detailed examples from R and Bioc blogs on how to tweak design formulas to encode a "paired" design in GLMs and for differential expression analysis. I can recommend this one and this one.

joschif commented 1 year ago

Hi @emdann, thanks a lot for the quick reply! This is roughly what I've been doing so far. My question was mostly about the step of counting cells in neighbourhoods, as explained in your tutorial here. If I use the actual sample indicator as the sample there, I later get the error:

Design matrix (23) and nhood counts (12) are not the same dimension

I assume this is because samples can have multiple conditions, resulting in a design matrix that has more rows than there are samples. My current workaround is to concatenate condition and sample into a "pseudo sample" column and use this as the sample in countCells. But I was wondering if there is a more proper way to deal with this that you would recommend. Cheers, Jonas
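The mismatch and the pseudo-sample workaround can be reproduced on toy data with base R alone. This is a sketch under stated assumptions: the cell metadata, the column names (Sample, Guide, PseudoSample) and the guide labels are hypothetical, and table() stands in for the per-neighbourhood counting that Milo performs.

```r
# Toy cell-level metadata (hypothetical): 3 samples, each a pool of cells
# carrying a control guide or one of two targeting guides, 10 cells per combination.
cell.meta <- expand.grid(
  Sample = c("S1", "S2", "S3"),
  Guide  = c("control", "gA", "gB"),
  Cell   = seq_len(10),
  stringsAsFactors = FALSE
)

# Pseudo-sample = sample of origin combined with the guide.
cell.meta$PseudoSample <- paste(cell.meta$Sample, cell.meta$Guide, sep = "_")

# A design table built from the unique (Sample, Guide) pairs has one row per
# pseudo-sample, which is more rows than there are real samples.
pseudo.design <- unique(cell.meta[, c("PseudoSample", "Sample", "Guide")])
rownames(pseudo.design) <- pseudo.design$PseudoSample

n.samples <- length(unique(cell.meta$Sample))  # count columns when counting by Sample
n.pseudo  <- nrow(pseudo.design)               # rows of the design table
cat("samples:", n.samples, " design rows:", n.pseudo, "\n")

# Counting by PseudoSample instead gives matching dimensions.
pseudo.counts <- table(cell.meta$PseudoSample)
pseudo.design <- pseudo.design[names(pseudo.counts), ]
stopifnot(length(pseudo.counts) == nrow(pseudo.design))
```

With a real Milo object, the equivalent step would presumably be to pass the pseudo-sample column to countCells (for example samples = "PseudoSample" together with the cell metadata as meta.data) and to give the design table row names that match those pseudo-sample labels.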

MikeDMorgan commented 1 year ago

Hi @joschif @emdann - perhaps this is something that we can write a new vignette around as it violates the 1 observation = 1 experimental sample mapping that we generally assume.

There are 2 approaches that could be taken here:

1. Set up a separate variable for each sgRNA relative to the control, so that the samples are consistent between your nhood count and design matrices (e.g., 12 samples), but where a sample might be included in multiple "condition" variables.
2. Create pseudo-samples as you suggest, such that each experimental sample is split into the cells carrying each of the target and control sgRNAs.
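For the pseudo-sample approach, one possible way to set up the design is sketched below. It is an illustration rather than a method confirmed in this thread: the sample and guide labels are hypothetical, and the choice to make the control guide the reference level is an assumption about how the comparison would be framed.

```r
# Hypothetical pseudo-sample design: 2 samples x 3 guides, one row per pseudo-sample.
split.design <- expand.grid(
  Sample = c("S1", "S2"),
  Guide  = c("control", "gA", "gB"),
  stringsAsFactors = FALSE
)
rownames(split.design) <- paste(split.design$Sample, split.design$Guide, sep = "_")

# Make the control guide the reference level, so that each Guide coefficient is a
# guide-versus-control comparison, with Sample acting as a blocking factor.
split.design$Guide  <- relevel(factor(split.design$Guide), ref = "control")
split.design$Sample <- factor(split.design$Sample)

split.mm <- model.matrix(~ Sample + Guide, data = split.design)
print(colnames(split.mm))
print(nrow(split.mm))
```

The design matrix then has one row per pseudo-sample, which matches the count columns as long as cells were counted per pseudo-sample. Each GuidegA or GuidegB column corresponds to one targeting guide compared with the control.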