MarioniLab / scran

Clone of the Bioconductor repository for the scran package.
https://bioconductor.org/packages/devel/bioc/html/scran.html

Marker detection in multimodal data #112

Open PeteHaitch opened 11 months ago

PeteHaitch commented 11 months ago

Somewhat thinking out loud here, but I'm interested in your ideas.

For multimodal data (e.g., GEX and ADT), we might be interested in using both modalities simultaneously to define markers. I've been doing this by rbind()-ing the logcounts() of each modality (after tidying up the rownames by prefixing the ADT feature names with ADT) and then running scoreMarkers() on the result, but this requires allocating another (potentially large) matrix.
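For concreteness, here is a minimal sketch of that rbind() approach. The toy data, the "ADT" altExp name, the "ADT_" prefix and the cluster labels are all made up for illustration; only logcounts(), altExp() and scoreMarkers() are real calls.

```r
library(SingleCellExperiment)
library(scran)

# Hypothetical toy data: GEX in the main experiment, ADT in an altExp.
set.seed(100)
gex <- matrix(rpois(200 * 60, lambda = 5), nrow = 200,
              dimnames = list(paste0("Gene", 1:200), paste0("Cell", 1:60)))
adt <- matrix(rpois(10 * 60, lambda = 50), nrow = 10,
              dimnames = list(paste0("Ab", 1:10), colnames(gex)))
sce <- SingleCellExperiment(assays = list(logcounts = log2(gex + 1)))
altExp(sce, "ADT") <- SingleCellExperiment(assays = list(logcounts = log2(adt + 1)))
clusters <- factor(rep(c("A", "B", "C"), each = 20))  # assumed grouping

# Prefix the ADT feature names so they cannot clash with gene names.
adt.lc <- logcounts(altExp(sce, "ADT"))
rownames(adt.lc) <- paste0("ADT_", rownames(adt.lc))

# This is the step that allocates the extra (potentially large) matrix.
combined <- rbind(logcounts(sce), adt.lc)

# scoreMarkers() accepts a matrix-like object, so ranks are computed jointly.
joint <- scoreMarkers(combined, groups = clusters)
```

Each element of joint is a DataFrame with one row per feature from either modality, so the rank.* columns are on a common scale.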

I guess I've got a few questions:

  1. I suppose that rbind() could be a delayed op, but I'm not sure when this would get realised by the scran machinery and so I'm unsure if this is worthwhile?
  2. Am I missing a better/simpler way of achieving this? Something using applySCE(sce, scoreMarkers()) gets very close, but the rank.* statistics are then computed separately for each modality and so won't be the same as if they were computed jointly on all modalities (the other statistics yield identical results whether computed separately or jointly on all modalities). Perhaps running scoreMarkers(full.stats = TRUE) and then re-computing the rank.* statistics with computeMinRank() applied to the full.* columns would work?
  3. What might a scoreMarkers()/findMarkers() interface for multimodal data look like?
  4. Would this be easy to achieve with the existing code or require some re-design?

LTLA commented 11 months ago

About to sleep, but the rbind approach seems reasonable if you want the ranks to be comparable. It does come with some performance loss, because the current scran falls back to block processing (though this would not be a problem if it were refactored to use libscran). Otherwise, option 2 is also fine, but it also requires some recomputation of the ranks.
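A rough sketch of what option 2 might look like, assuming the nested full.* columns have one column per other group in the same order for both modalities. The toy data, the "ADT_" prefix and the joint_min_rank() helper are hypothetical. I have not verified that the result matches the rank.* columns from the rbind() approach, so treat it as a starting point.

```r
library(SingleCellExperiment)
library(scran)

# Hypothetical toy data: GEX in the main experiment, ADT in an altExp.
set.seed(100)
gex <- matrix(rpois(200 * 60, lambda = 5), nrow = 200,
              dimnames = list(paste0("Gene", 1:200), paste0("Cell", 1:60)))
adt <- matrix(rpois(10 * 60, lambda = 50), nrow = 10,
              dimnames = list(paste0("Ab", 1:10), colnames(gex)))
sce <- SingleCellExperiment(assays = list(logcounts = log2(gex + 1)))
altExp(sce, "ADT") <- SingleCellExperiment(assays = list(logcounts = log2(adt + 1)))
clusters <- factor(rep(c("A", "B", "C"), each = 20))  # assumed grouping

# Score each modality separately, keeping the per-comparison effect sizes.
gex.res <- scoreMarkers(sce, groups = clusters, full.stats = TRUE)
adt.res <- scoreMarkers(altExp(sce, "ADT"), groups = clusters, full.stats = TRUE)

# Hypothetical helper: stack the pairwise effects of both modalities for one
# group and recompute the min-rank across the combined feature set.
joint_min_rank <- function(gex.df, adt.df, field) {
    g <- as.matrix(gex.df[[field]])
    a <- as.matrix(adt.df[[field]])
    rownames(g) <- rownames(gex.df)
    rownames(a) <- paste0("ADT_", rownames(adt.df))
    stats <- rbind(g, a)  # small: features x (number of groups - 1)
    ranks <- computeMinRank(stats)
    names(ranks) <- rownames(stats)
    ranks
}

# Joint min-rank of Cohen's d for cluster "A" across both modalities.
rank.A <- joint_min_rank(gex.res[["A"]], adt.res[["A"]], "full.logFC.cohen")
```

Only the effect-size matrices get combined here, not the expression matrices, so the extra allocation is small. The same helper would apply to full.AUC and full.logFC.detected.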