Number of tested neighbourhoods and other parameters

Thapeachydude commented 9 months ago

Hi,

I was wondering if you could lend some advice on how to best use the tool. I have a scRNA-Seq dataset from several patients and conditions. There is a considerable "patient-effect" leading them to separate when doing PCA --> UMAP, which makes it difficult to assess condition effects. The best this far is a pseudo-bulk and fitting a linear model (~ patientID + Treatment) with pairwise comparison of treatments to control.

Currently I'm trying to run milo DE with:


sce.milo <- sce.milo[, sce.milo$Treament %in% c("Control, "drugA")]

  de_stat <- de_test_neighbourhoods(sce.milo ,
                             sample_id = "patientID",
                             design = ~ 0 + Treatment,
                             contrasts = "TreatmentdrugA",
                             covariates = c("Treatment"),
                             output_type = "SCE", 
                             plot_summary_stat = TRUE,
                             layout = "UMAP", BPPARAM = mcparam, verbose = TRUE)

Questions: 1). If I have several treatments should I subset the milo object for pairwise comparisons? 2). Is is possible to treat sample_id as a block? 3). Can a cell be assigned to multiple neighbourhoods? --> Is there an easy way to get the neighbourhood(s) a cell was assigned to? 4). What does it mean at the end of de_test_neighbourhoods if a neighbourhood was tested TRUE or FALSE.

Happy about any suggestions!

Best, M

amissarova commented 8 months ago

Hi @Thapeachydude , few comments:

1) if you have a substantial patient effect that you do not correct in your embedding, it is really not advised to use milo or miloDE. milo-methods will rely on the joint, corrected embedding so that you aggregate transcriptionally similar cells from different batches (patients in your case) into same neighbourhoods. So first of all, i would ensure that you corrected your batch - there are plenty methods that focus on that: MNN, Liger, Harmony, scVI etc

2) >> . If I have several treatments should I subset the milo object for pairwise comparisons? -- well you can use a complex design matrix (same in edgeR), i cant really give an advice on how to perform the testing. If a question is that whether you need to subset sce for the right cells, then no - you can use all the cells for the neighbourhood construction and then specify the exact comparison (so which conditions) in design matrix. There is good edgeR manual, maybe it will help (check chapter 3 for how to set up your design matrix if you have many conditions): https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

p.s. if you are performing a simple linear testing between two conditions (w no interaction etc), i would personally change design to ~Treatment, coz then the first column would be intercept (but i dotn know which comparison you are making, so maybe ~0+Treatment is what you really want).

3) >> Is is possible to treat sample_id as a block? -- Im not 100% sure what you mean, but if you do have covariates that you want to use as block - you can definitely do that by adding them into design. The only thing that you can not use the same covariate for a block and for sample_id right, coz then you wont be able to perform testing (no replicates).

4) >> . Can a cell be assigned to multiple neighbourhoods? --> Is there an easy way to get the neighbourhood(s) a cell was assigned to? -- yes, you can run something like this: which(nhoods(sce)[cell_id,] == 1)

5) >> . What does it mean at the end of de_test_neighbourhoods if a neighbourhood was tested TRUE or FALSE. -- there are some combinations of genes and Nhoods for which testing for DE wasnt performed - main reasons that there are not enough cells from both conditions or gene is not expressed. So test_performed is the boolean indicating whether this combination of gene-Nhood was tested or no

Thapeachydude commented 8 months ago

Hi, thanks for the detailed reply (and sorry for the late response). I think the key issue is the considerable patient effect then. I'm working with a cancer data set of several patients, all was processed in a single batch. The healthy cells all group together, but the tumor cells separate very strongly by patient (known).

Pseudo-bulk + edgeR (~ patientID + Treatment) worked relatively well, but I'm struggling to find a more single-cell alternative (more fine-grained resolution). Correcting patient effects with harmony, fastWNN etc. is I think not warranted, given that 1). this isn't really a batch effect, 2). even if you do run it, all cells are quite brutally mushed together and even key biological features such as different diagnoses form one big blob.

I guess miloDE might not be applicable then. Nonetheless, thanks for the reply and the insights!

amissarova commented 7 months ago

I think its worth considering that a patient effect is a batch effect here, then integrating all data using this batch, and looking into results. But yes, you probably will need to be careful with the results you are interpreting then, and you might very well have overlap of batch and biological effect.

MarioniLab / miloDE

Number of tested neighbourhoods and other parameters #36