CaibinSh / scAR-reproducibility

Scripts to reproduce the results of scAR manuscript 'Probabilistic modeling of ambient noise in single-cell omics data'
https://doi.org/10.1101/2022.01.14.476312

Collapse sgRNAs? #3

Closed cnk113 closed 2 years ago

cnk113 commented 2 years ago

Hello,

I'm trying out the CRISPR tutorial and was wondering whether you'd recommend collapsing all the counts for the sgRNAs before running scAR.

Best, Chang

CaibinSh commented 2 years ago

Hi @cnk113 ,

Do you mean collapsing the read counts? If so, then yes: we used UMI counts as input to scAR in all experiments.

Best, Caibin

cnk113 commented 2 years ago

I meant the sgRNA count matrix. Since each gene is targeted by multiple guides, I was thinking of collapsing the input sgRNA matrix by target gene, so GeneA-1, GeneA-2, and GeneA-3 would be collapsed to GeneA in the matrix.

CaibinSh commented 2 years ago

I see. It would be better to run scAR at the guide level, and then collapse all guides targeting the same gene after guide assignment.

The main reason is assignment accuracy. Suppose GeneA-1 has more ambient noise than GeneA-2 and GeneA-3. If we collapse them, scAR will not be able to distinguish ambient from native signal for GeneA-2 or GeneA-3.

The second reason relates to guide effectiveness. Some guides may not be efficient and you may want to exclude them in the final analysis. Collapsing guides will make it hard to do so.
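A minimal sketch of collapsing after assignment, in plain Python. The guide names follow the GeneA-1 pattern from this thread, but the `guide_assignment` table, the `ineffective_guides` set, and both helper functions are hypothetical and not part of scAR.

```python
import re
from collections import defaultdict

# Hypothetical guide-level assignments: cell barcode -> assigned guide.
guide_assignment = {
    "cell_1": "GeneA-1",
    "cell_2": "GeneA-2",
    "cell_3": "GeneB-1",
    "cell_4": "GeneA-3",
}

# Guides judged ineffective and dropped before collapsing (illustrative).
ineffective_guides = {"GeneA-3"}

def guide_to_gene(guide):
    """Strip a trailing '-<number>' suffix, e.g. 'GeneA-2' -> 'GeneA'."""
    return re.sub(r"-\d+$", "", guide)

def collapse_to_genes(assignment, excluded=frozenset()):
    """Group cells by target gene, skipping cells assigned to excluded guides."""
    cells_by_gene = defaultdict(list)
    for cell, guide in assignment.items():
        if guide in excluded:
            continue
        cells_by_gene[guide_to_gene(guide)].append(cell)
    return dict(cells_by_gene)

cells_by_gene = collapse_to_genes(guide_assignment, ineffective_guides)
print(cells_by_gene)
```

Because the exclusion happens at the guide level, an inefficient guide can be removed without discarding the other guides for the same gene.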

cnk113 commented 2 years ago

I see, that makes sense for the ambient removal. For the inference (assignment) portion, it looks like it's done per guide. Would your recommendation hold there as well? I'd assume so, since the training was done at the guide level.

Best, Chang

CaibinSh commented 2 years ago

Yes, it does. We recommend doing both at the guide level. Best, Caibin

cnk113 commented 2 years ago

Final question: since I have 1700 sgRNAs, should I also increase the latent dim to compensate?

Also, would your model hold if we decide to overload the sgRNAs such that we would expect (more than usual) multiple sgRNAs per cell? This is a separate question/assumption from my current data.

CaibinSh commented 2 years ago

> Final question: since I have 1700 sgRNAs, should I also increase the latent dim to compensate?

You can experiment with the NN_layer1, NN_layer2, and latent_space parameters; see the help docs of, e.g., scAR.model. We recommend trying the default settings first. If the result is unexpected, a grid search over these parameters might help.
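A grid search over the three parameters named above could be set up roughly as follows. This is only a sketch: the candidate values are illustrative and not recommendations, `expand_grid` is a hypothetical helper, and the training and scoring step is left as a placeholder because it depends on how you call scAR and which quality metric you choose.

```python
from itertools import product

# Illustrative candidate values only (assumptions, not recommended settings).
param_grid = {
    "NN_layer1": [150, 300],
    "NN_layer2": [100, 200],
    "latent_space": [15, 30],
}

def expand_grid(grid):
    """Return one dict per combination of parameter values (hypothetical helper)."""
    names = list(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]

configs = expand_grid(param_grid)

for config in configs:
    # Placeholder: train a model with these settings and record a metric
    # of your choice (for example, the fraction of cells with one guide).
    print(config)
```

Each dictionary holds one combination, so the loop body is where a training run and its evaluation would go.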

> Also, would your model hold if we decide to overload the sgRNAs such that we would expect (more than usual) multiple sgRNAs per cell? This is a separate question/assumption from my current data.

Yes, it holds. During inference, you can test a series of global cutoffs (see the help docs of the inference function for the cutoff parameter) to filter out the noise sgRNAs. The sgRNAs remaining in feature_assignment["sgRNAs"] should be the positive signal.
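To illustrate what testing a series of global cutoffs means, here is a toy sketch in plain Python. It does not call scAR: the `scores` table, the `assign` function, and the cutoff values are all hypothetical stand-ins for whatever per-guide evidence the inference step thresholds.

```python
# Hypothetical per-cell scores for each guide (higher = stronger evidence
# that the guide is native signal rather than ambient noise).
scores = {
    "cell_1": {"GeneA-1": 9.0, "GeneB-1": 0.5, "GeneC-1": 4.0},
    "cell_2": {"GeneA-1": 0.2, "GeneB-1": 6.5, "GeneC-1": 0.1},
    "cell_3": {"GeneA-1": 0.3, "GeneB-1": 0.4, "GeneC-1": 0.2},
}

def assign(scores, cutoff):
    """Keep, for every cell, the guides whose score exceeds the global cutoff."""
    return {
        cell: sorted(g for g, s in guide_scores.items() if s > cutoff)
        for cell, guide_scores in scores.items()
    }

def guides_per_cell_histogram(assignment):
    """Count how many cells end up with 0, 1, 2, ... assigned guides."""
    hist = {}
    for guides in assignment.values():
        hist[len(guides)] = hist.get(len(guides), 0) + 1
    return hist

# Sweep a few cutoffs and compare how many guides each cell retains.
for cutoff in (1.0, 3.0, 5.0):
    print(cutoff, guides_per_cell_histogram(assign(scores, cutoff)))
```

Comparing the number of guides per cell across cutoffs gives a rough sense of which threshold matches the infection rate you expect.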

We will integrate a more automatic approach for high-MOI infections (multiple sgRNAs per cell) in the coming weeks. We will take the MOI parameter into account, use a Poisson model to estimate the distribution of infections per cell (zero, one, two, etc.), and find the best global cutoff based on it. For now, you might need to test the cutoff parameter manually.
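As a rough illustration of the Poisson idea, and not the scAR implementation, the sketch below treats the MOI as the mean of a Poisson distribution and computes the expected fraction of cells carrying k sgRNAs, P(k) = exp(-MOI) * MOI^k / k!. The function names and the example MOI value are assumptions made for this example.

```python
import math

def poisson_pmf(k, moi):
    """Probability that a cell carries exactly k sgRNAs, assuming Poisson(moi)."""
    return math.exp(-moi) * moi**k / math.factorial(k)

def infection_distribution(moi, max_k=3):
    """Expected fractions of cells with 0..max_k sgRNAs, plus the remaining tail."""
    probs = {k: poisson_pmf(k, moi) for k in range(max_k + 1)}
    probs[f">{max_k}"] = 1.0 - sum(probs.values())
    return probs

# Example with a hypothetical MOI of 2 (high-MOI setting).
dist = infection_distribution(moi=2.0)
for k, p in dist.items():
    print(k, round(p, 3))
```

One could then compare these expected fractions with the observed number of assigned guides per cell at each candidate cutoff.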

Please do not hesitate to contact me if there are any further questions.

Best, Caibin

cnk113 commented 2 years ago

Thank you for the quick and detailed responses!