TreeSummarizedExperiment support

TuomasBorman commented 1 month ago

Hello!

I would like to open a discussion about adding support for the TreeSummarizedExperiment (TreeSE) class.

TreeSE extends the SingleCellExperiment (SCE) class by adding slots for row and column trees. These trees are especially relevant in the microbiome field, where relationships between species are represented as phylogenetic trees (the rowTree slot in TreeSE). You can find more information on microbiome data science and the TreeSE class here: https://microbiome.github.io/OMA/docs/devel/

In the microbiome field, large population cohorts are rather common. For instance, Ruuskanen et al. studied how the microbiome relates to fatty liver disease in a large Finnish cohort. They also studied geographical regions.

There might not be as many applications for images as in spatial transcriptomics, and coordinates (or location groups) can be stored in colData. However, I think supporting TreeSE as well might benefit both fields by giving microbiome researchers access to tools used in spatial transcriptomics and vice versa. This could create additional synergy, as it further extends the SummarizedExperiment ecosystem, ultimately reducing redundant effort and enhancing collaboration.

Because a TreeSE is an SCE, we can already coerce a TreeSE to a SpatialExperiment; however, we lose the TreeSE-specific slots.

library(TreeSummarizedExperiment)
library(ape)
library(SpatialExperiment)

assay_data <- rbind(rep(0, 4), matrix(1:20, nrow = 5))
colnames(assay_data) <- paste0("sample", 1:4)
rownames(assay_data) <- paste("entity", seq_len(6), sep = "")
row_data <- data.frame(Kingdom = "A",
                       Phylum = rep(c("B1", "B2"), c(2, 4)),
                       Class = rep(c("C1", "C2", "C3"), each = 2),
                       OTU = paste0("D", 1:6),
                       row.names = rownames(assay_data),
                       stringsAsFactors = FALSE)
set.seed(12)
row_tree <- rtree(5)
tip_lab <- row_tree$tip.label
row_lab <- tip_lab[c(1, 1:5)]
tse <- TreeSummarizedExperiment(assays = list(Count = assay_data),
                         rowData = row_data,
                         rowTree = row_tree,
                         rowNodeLab = row_lab
                         )
tse
as(tse, "SpatialExperiment")

-Tuomas

HelenaLC commented 1 month ago

Hey, thanks for bringing up TSE. This has actually come up in some discussions, however, (I think) it is not straightforward to implement. Specifically, both TSE and SPE inherit from SCE, so that we cannot inherit slots from one or the other or both, when we like. Instead, SPE would have to inherit from TSE, which inherits from SCE. That said, it's certainly possible, but would add another layer of dependency (& potential instability). Plus extra "cluttering" for those fine without the tree extras... So, I have no strong opinion here, just wanted to clarify the development side of things...

TuomasBorman commented 1 month ago

I see, this seems to be a more complicated matter. I don't have direct experience with analyzing spatial microbiome data, so I'm unsure about the necessity of combining SpatialExperiment with TreeSummarizedExperiment. I know this is an area people are working on, and it’s always preferable to enhance existing methods rather than create overlapping ones.

@antagomir might have more insights on this.

antagomir commented 1 month ago

It sounds like a potentially very interesting area for development, but it also seems like a major undertaking if those updates have to be implemented across the package ecosystem.

TSE adds row and col trees to SCE (plus a sequence slot which might be less essential here). In principle, one could just add the same (or similar) tree capacity as an extra feature to SPE directly without the need to inherit TSE. This would not be optimal in terms of SPE vs. TSE interoperability but it would allow development and testing of methods that use feature or sample trees in the spatial context.

HelenaLC commented 1 month ago

Just throwing this out there... Have you checked out SpatialFeatureExperiment? It extends the SPE by row/colGraphs. That is, graphs not trees; however, graphs are more appropriate in the context of ST data, I'd say. E.g., one can imagine spatial regions that contain subsets of cells, however, they needn't be hierarchically organized, but can have arbitrary relationships (e.g., nested/fully containing another, intersecting, disconnected etc.).

drighelli commented 1 month ago

Hi Everyone,

We already had a similar conversation on Slack in 2020, and we already have a similar solution.

SingleCellExperiment objects have colPairs and rowPairs for graph-like representations. I still don't know how these differ from the row/colGraphs in SpatialFeatureExperiment, but colPairs/rowPairs are already implemented in SCE objects and therefore in SPE objects.
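For readers unfamiliar with colPairs, the following is a minimal sketch of storing a column-wise graph in an SPE. The toy counts, the coordinates, and the pairing name "neighbors" are made up for illustration; the sketch assumes that the colPair() accessors from SingleCellExperiment and the SelfHits() constructor from S4Vectors behave on an SPE as they do on an SCE.

```r
library(SpatialExperiment)
library(S4Vectors)

# Toy data (illustrative only): 5 genes x 4 cells laid out along a line.
counts <- matrix(rpois(20, 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))
coords <- matrix(c(0, 1, 2, 3, 0, 0, 0, 0), ncol = 2,
                 dimnames = list(NULL, c("x", "y")))
spe <- SpatialExperiment(assays = list(counts = counts),
                         spatialCoords = coords)

# Encode "cell i is adjacent to cell i + 1" as a SelfHits object
# whose number of nodes equals the number of columns.
neighbors <- SelfHits(from = 1:3, to = 2:4, nnode = ncol(spe))

# Store and retrieve the pairing under a hypothetical name.
colPair(spe, "neighbors") <- neighbors
colPairNames(spe)
colPair(spe, "neighbors")
```

Such a pairing records arbitrary column-to-column relationships; it does not by itself carry the branch lengths or internal nodes of a phylogenetic tree.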

I hope this could be helpful.

Ciao, Dario

edit: as you already know, in several programming languages it is possible to inherit from multiple classes. The same can be done in R for S4 classes, but in the end I think everything could become very complicated when the classes inherit from the same original class (the SummarizedExperiment in this case), especially in R which is not a real Object Oriented language.
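The shared-ancestor situation described here can be tried out with toy S4 classes. This is only a sketch: ToyBase, ToySpatial, ToyTree, ToyBoth, and the describe() generic are hypothetical stand-ins for SCE, SPE, TreeSE, and a combined class, not the real definitions.

```r
library(methods)

# Hypothetical stand-ins: one common parent and two children that each add a slot.
setClass("ToyBase", slots = c(assays = "list"))
setClass("ToySpatial", contains = "ToyBase", slots = c(coords = "matrix"))
setClass("ToyTree", contains = "ToyBase", slots = c(tree = "list"))

# A class inheriting from both children ("diamond" inheritance).
setClass("ToyBoth", contains = c("ToySpatial", "ToyTree"))

x <- new("ToyBoth",
         assays = list(counts = matrix(1, 2, 3)),
         coords = matrix(0, 3, 2),
         tree = list())

# The slot inherited through both parents is present only once.
slot_names <- slotNames("ToyBoth")
n_assay_slots <- sum(slot_names == "assays")

# The object is an instance of both parent classes.
is(x, "ToySpatial")
is(x, "ToyTree")

# When both parents define a method for the same generic, dispatch on the
# combined class is ambiguous; R picks one of the two methods.
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "ToySpatial", function(object) "spatial")
setMethod("describe", "ToyTree", function(object) "tree")
chosen <- describe(x)
```

In this toy setup the common slot is not duplicated, but any generic with methods for both parents, such as show, needs an explicit method on the combined class to make its behaviour unambiguous.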

TuomasBorman commented 1 month ago

Thanks! I still feel these solutions are somewhat suboptimal and don't fully address the need. Ideally, the object should inherit from both TreeSE, since the entire microbiome ecosystem is built around it, and SpatialExperiment, to allow the use of spatial analysis tools. I'm not sure if there's an optimal solution, but if the inheritance issues could be resolved, there might be potential for a "TreeSpatialExperiment." That said, I'm not an expert in this area, so I'm unsure how necessary this feature is or how much effort should be invested in it.

HelenaLC commented 1 month ago

Just tried & this works...

> setClass("TSPE", contains=c(
+     "SpatialExperiment", 
+     "TreeSummarizedExperiment"))
> spe <- SpatialExperiment()
> tspe <- as(spe, "TSPE")
> # SPE & TSE accessors work...
> spatialCoords(tspe) 
<0 x 0 matrix>
> rowTree(tspe)
NULL

...i.e., one could define a class that inherits from both, e.g., defined in an independent package.

We could, in principle, also define such a class in SPE, granted it doesn't add any extra dependencies. A specialized show method is really all it'd take. (The other way around is probably suboptimal, since we've got magick on our end... then again, users would also be required to install it whether or not they need it...) - open to discuss.

TuomasBorman commented 1 month ago

Cool! I quickly checked, and it seems to work with the TreeSE demo dataset.

If that's all it takes, I believe creating a new class would be beneficial. A "real" class would bring microbiome and spatial tools closer together. It could easily be added to your existing package, but I'm not sure it is a good idea to add TreeSummarizedExperiment as a dependency, as it would rarely be used by most users (at least currently).

HelenaLC commented 1 month ago

I'll give it a try/some more thought... haven't seen it in action, but wondering if there's a way to cheat our way into defining a class using Suggests: only, e.g., not depending on TSE per se... let's see, not sure that's even possible.

LiNk-NY commented 1 month ago

I think it would be better to start with a real-world analysis use case before embarking on potentially creating a new class / merging classes. We could also consider discussing this in the Classes working group. IIUC, using the contains argument would mean that you'd have to duplicate the assay data in both the SpatialExperiment and the TreeSummarizedExperiment.

HelenaLC commented 1 month ago

Agreed! Would be great to have this discussed as it did come up before. - could you perhaps clarify that last point? I am not (directly) spotting any duplication

> spe <- SpatialExperiment(
+     list(foo=matrix(1,2,3)))
> tspe <- as(spe, "TSPE")
> names(attributes(tspe))
 [1] "int_elementMetadata" "int_colData"        
 [3] "int_metadata"        "rowRanges"          
 [5] "colData"             "assays"             
 [7] "NAMES"               "elementMetadata"    
 [9] "metadata"            "rowTree"            
[11] "colTree"             "rowLinks"           
[13] "colLinks"            "referenceSeq"       
[15] "class"              
> tspe@assays
An object of class "SimpleAssays"
Slot "data":
List of length 1
names(1): foo

LiNk-NY commented 1 month ago

Hi Helena, @HelenaLC

Presumably if you have a composed class, you'd need one of each class to create a complete instance of TSPE. In the example above, the coercion can happen but you won't be able to fully use the interface for the TreeSummarizedExperiment unless the data for that object is populated (I am guessing that one would use the same assay for each class).

Thanks Dario for creating the issue for the working group.

drighelli / SpatialExperiment

TreeSummarizedExperiment support #158