brianhie / scanorama

Panoramic stitching of single cell data
http://scanorama.csail.mit.edu
MIT License

Integration of CITEseq data: Use Scanorama with ADT surface marker? #80

Closed jkniffka closed 3 years ago

jkniffka commented 3 years ago

First of all, thank you for Scanorama! Pretty impressive package.

We work with CITE-seq and examine patient samples longitudinally over multiple time points. For each time point we generate an RNA count matrix as well as an ADT count matrix (antibody-derived tags, i.e. detection of 80 different surface markers). The ADT count matrix has the same structure as the RNA matrix (features x cell barcodes).

So far, we have integrated the RNA matrices of the different time points with Scanorama, and this worked very well. Now we would like to integrate the ADT matrices as well. Here we have two questions:

1. Can we run Scanorama on the ADT count matrices?
2. If so, should we run Scanorama on each modality separately, or could we combine the RNA and ADT count matrices for each time point and integrate them together?

Thanks in advance!

brianhie commented 3 years ago

Glad to hear you found Scanorama useful!

Since the ADT count matrix is defined over the same set of cells, you can definitely just use Scanorama on the combined RNA + ADT matrix, which may provide additional information to the algorithm as to whether or not to align certain cells, or keep them separate.

Integrating each modality separately is also a possibility, though in that case the algorithm would lose information during the integration process, particularly when run on just the ADT count matrix. For example, if the ADT information says two sets of cells are actually quite different, but their RNA values are the same, then integrating just RNA would lose that biological difference (and vice versa).
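As a rough illustration of the combined-matrix approach described above, here is a minimal sketch. It assumes each time point has an RNA and an ADT matrix stored as features x cells with identical cell order (as described in the question), and that Scanorama receives cells x features matrices plus one list of feature names per dataset via `scanorama.integrate`. The helper names `combine_modalities` and `integrate_timepoints` and the `rna:`/`adt:` prefixes are hypothetical, chosen only to keep gene and marker names from colliding.

```python
import numpy as np


def combine_modalities(rna, adt, rna_names, adt_names):
    """Stack RNA and ADT features for one time point.

    rna, adt: arrays of shape (features, cells) over the SAME cells,
    in the same column order. Returns a (cells, features) array and
    the matching list of prefixed feature names.
    """
    rna = np.asarray(rna, dtype=float)
    adt = np.asarray(adt, dtype=float)
    if rna.shape[1] != adt.shape[1]:
        raise ValueError("RNA and ADT matrices must cover the same cells")
    # Stack features, then transpose so rows are cells.
    combined = np.vstack([rna, adt]).T
    # Hypothetical prefixes so a gene and a marker with the same symbol
    # are not treated as one feature.
    names = [f"rna:{g}" for g in rna_names] + [f"adt:{a}" for a in adt_names]
    return combined, names


def integrate_timepoints(matrices, names_per_timepoint):
    """Pass one combined matrix per time point to Scanorama.

    Requires the scanorama package; it is imported here so that
    combine_modalities can be used and tested without it.
    """
    import scanorama

    integrated, features = scanorama.integrate(matrices, names_per_timepoint)
    return integrated, features
```

Calling `combine_modalities` once per time point and handing the resulting lists to `integrate_timepoints` would give one low-dimensional embedding per time point. Whether the two modalities should first be rescaled relative to each other is a separate question.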

jkniffka commented 3 years ago

Thanks for the quick reply.

We have prepared the ADT and RNA matrices with different normalization techniques: ADT with DSB (to correct for unbound antibodies still present in the solution despite the washing step after staining), and RNA with scran normalization by deconvolution to account for different sequencing depths.

If we merge the ADT and RNA matrices and use the merged table of each time point as input to Scanorama, should the normalization techniques be identical? (Of course, we can log-transform both matrices identically.)

Alternatively, our idea would be to integrate ADT and RNA separately and then combine the embeddings with weighted nearest neighbor analysis (Hao, Hao et al, bioRxiv 2020). What would be your recommendation?

brianhie commented 3 years ago

In this case, I'd normalize each modality separately with the appropriate technique before inputting into Scanorama. This strategy will weight each feature the same in the integration. You could also definitely try weighting certain modalities higher than others (e.g., the ADT matrix higher than RNA), especially if you believe one modality is more informative than another.
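One way to picture the weighting idea is the sketch below. It is an assumption about how such a weighting could be set up, not something Scanorama provides: each modality is standardized per feature, then each block is rescaled so that its total variance no longer depends on how many features it has. The function name `weight_modalities` and the `adt_weight` parameter are hypothetical. How these weights carry through Scanorama's own preprocessing is not verified here.

```python
import numpy as np


def weight_modalities(rna, adt, adt_weight=1.0):
    """Rescale two (cells x features) blocks before concatenation.

    Both inputs are assumed to be already normalized with their own
    modality-specific method. After rescaling, the RNA block has total
    variance 1 and the ADT block has total variance adt_weight
    (ignoring constant features), regardless of feature counts.
    """
    def standardize(x):
        x = np.asarray(x, dtype=float)
        sd = x.std(axis=0)
        sd[sd == 0] = 1.0  # leave constant features at zero after centering
        return (x - x.mean(axis=0)) / sd

    rna_z = standardize(rna)
    adt_z = standardize(adt)
    # Dividing by sqrt(n_features) makes each block's summed variance 1;
    # the ADT block is then multiplied up to the requested weight.
    rna_scaled = rna_z / np.sqrt(rna_z.shape[1])
    adt_scaled = adt_z * np.sqrt(adt_weight / adt_z.shape[1])
    return np.hstack([rna_scaled, adt_scaled])
```

With `adt_weight=1.0` the two modalities contribute equally to squared Euclidean distances between cells on average; values above 1 favor the ADT block.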

friedue commented 3 years ago

In my experience, ADT and scRNA-seq live on very different scales; I'd caution against treating them as equivalent when integrating. Aaron Lun recently implemented MNN-based integration using both information types in the mumosa package (https://github.com/LTLA/mumosa/blob/master/R/multiModalMNN.R), in case R is an option for you.

jkniffka commented 3 years ago

@friedue Thanks for the package recommendation and the comment. I'll take a look.

We just discussed @brianhie's answers and plan to integrate each modality across the different time points individually with Scanorama. In a second step, we would combine both modalities in a weighted fashion. For this we have the WNN mechanism in the new Seurat version (Seurat v4) in mind.

The point about the different scales is a good one. In addition, technical aspects such as dropout (loss of integrity of the cell during wet lab processing) also need to be considered. This variance is much higher with RNA; the measured ADT data are not as "error prone". Furthermore, our 80 surface markers would be relatively underweighted when merged at the same feature weight with the RNA matrix, as the latter contains many more genes.

However, we do not have much experience with this yet and are open to comments.

friedue commented 3 years ago

I think I'm missing a detail here -- why do you plan on using both scanorama and WNN?

brianhie commented 3 years ago

Closing since this does not seem to be a bug/issue with Scanorama, but feel free to reopen or comment if you have more questions.