Bioconductor / Contributions

Contribute Packages to Bioconductor

ExperimentList #2563

Closed bhuvad closed 2 years ago

bhuvad commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @bhuvad

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: ExperimentList
Title: S4 Class for handling lists of Experiments
Version: 0.99.0
Authors@R: 
    person(given = "Dharmesh D.",
           family = "Bhuva",
           role = c("aut", "cre"),
           email = "bhuva.d@wehi.edu.au",
           comment = c(ORCID = "0000-0002-6398-9157"))
Description: The ExperimentList package defines S4 classes to handle data from 
  multiple experiments or studies by providing features of lists as well as those
  of a concatenated experiment. Individual experiments can be in the form of 
  SummarizedExperiment, Ranged SummarizedExperiment, SingleCellExperiment, or 
  SpatialExperiment objects. Annotations specific to each experiment are stored
  thus providing a unified interface to dealing with data from multiple studies.
  Specialised functions to access experiment data, and to apply functions across
  experiments are implemented. Existing functions implemented for each individual
  experiment (e.g., SingleCellExperiment::reducedDim()) can be readily applied
  across the entire list of experiments.
biocViews: DataRepresentation, GeneExpression, Infrastructure, SingleCell, Spatial
License: GPL (>= 3)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
Depends: 
    R (>= 4.1),
    S4Vectors,
    SingleCellExperiment,
    SpatialExperiment,
    SummarizedExperiment
Imports: 
    BiocGenerics,
    methods
Suggests: 
    rmarkdown,
    knitr,
    testthat (>= 3.0.0),
    BiocStyle,
    prettydoc,
    GenomicRanges,
    ExperimentHub,
    TENxVisiumData
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://davislaboratory.github.io/ExperimentList
BugReports: https://github.com/DavisLaboratory/ExperimentList/issues

lgatto commented 2 years ago

I discovered this package and was wondering what its relation is, if any, to the mature MultiAssayExperiment package, which serves a very similar purpose and already defines the class ExperimentList. This is very confusing.

bhuvad commented 2 years ago

Hi @lgatto,

This package is quite different from the ExperimentList provided by MultiAssayExperiment. MultiAssayExperiment provides an interface to host multiple experiments as a list, whereas the ExperimentList package I implemented concatenates these objects using cbind() and then provides features of both lists and a single Experiment object. This hybrid interface is particularly beneficial for single-cell or spatial transcriptomics data that come from different batches or different biological samples.

Unlike MultiAssayExperiment::ExperimentList(), the ExperimentList::ExperimentList() object places more restrictions on the objects it holds (rows must be matched) and allows for experiment-specific annotations (e.g., study-specific annotations). Since it provides a hybrid interface to the object, functions already implemented in SingleCellExperiment/SpatialExperiment can be applied collectively across different experiments, as well as independently to each experiment using the elapply() function I have implemented. Further details on the specifics of this package, including what I have discussed above, can be found in the vignettes and the package description. The schematic I have provided should help differentiate this object from the one provided in the MultiAssayExperiment package.

Additionally, the MultiAssayExperiment object is very different from the object in this package. It stores data from multiple assays belonging to the same set of samples, whereas this package stores data from different samples measured with the same type of assay.

Cheers, Dharmesh

PeteHaitch commented 2 years ago

Hi @bhuvad,

I was also curious about ExperimentList, but after reading the documentation I think you may need to revisit the motivation for this package. I didn't see a good example of things that could be done (or done more easily) with ExperimentList that couldn't already be done with SummarizedExperiment/SingleCellExperiment/SpatialExperiment. I'm not sure if these issues reflect a misunderstanding of the SummarizedExperiment-related packages and classes, but I think you want to revise the motivation to better illustrate where ExperimentList may be useful.

Firstly, the text in the vignette that motivates the ExperimentList package mischaracterises the SummarizedExperiment class and its derivatives. For example (emphasis mine),

> The SingleCellExperiment and SpatialExperiment objects are able to store even higher resolution single-cell and spatial transcriptomics measurements from a single biological sample respectively. Such data is not easily stored and manupilated within a single object. Concatenation of objects can partially resolve this data since it can be analysed in unison, however, prevents for sample-wise analysis.

Users regularly store and manipulate data from multiple samples in a SingleCellExperiment or SpatialExperiment object. It's not clear to me what prevents sample-wise analyses (I usually do this by subsetting the object to the relevant sample).

Secondly, the example in the vignette doesn't illustrate things that couldn't be achieved with the existing functionality of the SummarizedExperiment/SingleCellExperiment/SpatialExperiment packages. I think this is a missed opportunity to showcase ExperimentList. In particular, it seems that the example in the vignette could be handled using existing functions and packages (my quick testing to illustrate this is shown below):

suppressPackageStartupMessages(library(SingleCellExperiment))
suppressPackageStartupMessages(library(TENxVisiumData))
#> snapshotDate(): 2022-03-01

#download data
spe1 <- TENxVisiumData::HumanBreastCancerIDC()
#> see ?TENxVisiumData and browseVignettes('TENxVisiumData') for documentation
#> loading from cache
spe2 <- TENxVisiumData::HumanBreastCancerILC()
#> see ?TENxVisiumData and browseVignettes('TENxVisiumData') for documentation
#> loading from cache

#remove alt exps - these should be matched across exps (likewise for rownames)
altExps(spe1) <- NULL
altExps(spe2) <- NULL

#create some artificial experiment annotations
spe1$sex <- "Female"
spe1$age <- 65
spe2$sex <- "Female"
spe2$age <- 68

#create ExperimentList objects
# PH: Create a list of SPE objects and single SPE object.
lspe <- List("1" = spe1, "2" = spe2)
spe <- do.call(cbind, lspe)
# PH: Adding an 'experiment' variable. 
spe$experiment <- factor(rep(names(lspe), times = sapply(lspe, ncol)))

#subset the first five features and first three samples
spe[1:5, spe$sample_id %in% unique(spe$sample_id)[1:3]]
#> class: SpatialExperiment 
#> dim: 5 12110 
#> metadata(0):
#> assays(1): counts
#> rownames(5): ENSG00000243485 ENSG00000237613 ENSG00000186092
#>   ENSG00000238009 ENSG00000239945
#> rowData names(1): symbol
#> colnames(12110): AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 ...
#>   TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
#> colData names(4): sample_id sex age experiment
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor

#subset the first five features and all columns from the second experiment
spe[1:5, spe$experiment == "2"]
#> class: SpatialExperiment 
#> dim: 5 4325 
#> metadata(0):
#> assays(1): counts
#> rownames(5): ENSG00000243485 ENSG00000237613 ENSG00000186092
#>   ENSG00000238009 ENSG00000239945
#> rowData names(1): symbol
#> colnames(4325): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#>   TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
#> colData names(4): sample_id sex age experiment
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor

#number of experiments
length(lspe)
#> [1] 2
nlevels(spe$experiment)
#> [1] 2
#names of experiments
names(lspe)
#> [1] "1" "2"
levels(spe$experiment)
#> [1] "1" "2"

#get a list of individual experiments
# PH: Could just reuse `lspe` but here's how you might do it if you only had a 
#     single SPE.
lapply(levels(spe$experiment), function(e) spe[, spe$experiment == e])
#> [[1]]
#> class: SpatialExperiment 
#> dim: 36601 7785 
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#>   ENSG00000277196
#> rowData names(1): symbol
#> colnames(7785): AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 ...
#>   TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
#> colData names(4): sample_id sex age experiment
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#> 
#> [[2]]
#> class: SpatialExperiment 
#> dim: 36601 4325 
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#>   ENSG00000277196
#> rowData names(1): symbol
#> colnames(4325): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#>   TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
#> colData names(4): sample_id sex age experiment
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor

#get experiment annotations
# PH: Manipulate this as necessary to get experiment- or sample-specific 
#     annotations.
colData(spe)
#> DataFrame with 12110 rows and 4 columns
#>                                 sample_id         sex       age experiment
#>                               <character> <character> <numeric>   <factor>
#> AAACAAGTATCTCCCA-1  HumanBreastCancerIDC1      Female        65          1
#> AAACACCAATAACTGC-1  HumanBreastCancerIDC1      Female        65          1
#> AAACAGAGCGACTCCT-1  HumanBreastCancerIDC1      Female        65          1
#> AAACAGGGTCTATATT-1  HumanBreastCancerIDC1      Female        65          1
#> AAACAGTGTTCCTGGG-1  HumanBreastCancerIDC1      Female        65          1
#> ...                                   ...         ...       ...        ...
#> TTGTTGTGTGTCAAGA-1 HumanBreastCancerILC..      Female        68          2
#> TTGTTTCACATCCAGG-1 HumanBreastCancerILC..      Female        68          2
#> TTGTTTCATTAGTCTA-1 HumanBreastCancerILC..      Female        68          2
#> TTGTTTCCATACAACT-1 HumanBreastCancerILC..      Female        68          2
#> TTGTTTGTGTAAATTC-1 HumanBreastCancerILC..      Female        68          2

#apply function
# PH: Just use lapply(); no need for a new function.
lapply(lspe, dim)
#> $`1`
#> [1] 36601  7785
#> 
#> $`2`
#> [1] 36601  4325

#get the first 100 spots
lapply(lspe, function(spe) spe[, 1:100])
#> $`1`
#> class: SpatialExperiment 
#> dim: 36601 100 
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#>   ENSG00000277196
#> rowData names(1): symbol
#> colnames(100): AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 ...
#>   AACGCGACCTTGGGCG-1 AACGCGGTCTCCAGCC-1
#> colData names(3): sample_id sex age
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#> 
#> $`2`
#> class: SpatialExperiment 
#> dim: 36601 100 
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#>   ENSG00000277196
#> rowData names(1): symbol
#> colnames(100): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#>   AACCCGAGCAGAATCG-1 AACCCTACTGTCAATA-1
#> colData names(3): sample_id sex age
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
# PH: and also combine into a single SPE
do.call(cbind, lapply(lspe, function(spe) spe[, 1:100]))
#> class: SpatialExperiment 
#> dim: 36601 200 
#> metadata(0):
#> assays(1): counts
#> rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
#>   ENSG00000277196
#> rowData names(1): symbol
#> colnames(200): AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 ...
#>   AACCCGAGCAGAATCG-1 AACCCTACTGTCAATA-1
#> colData names(3): sample_id sex age
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor

#extract image data for each object separately
lapply(lspe, imgData)
#> $`1`
#> DataFrame with 2 rows and 4 columns
#>               sample_id    image_id   data scaleFactor
#>             <character> <character> <list>   <numeric>
#> 1 HumanBreastCancerIDC1      lowres   ####   0.0247525
#> 2 HumanBreastCancerIDC2      lowres   ####   0.0247525
#> 
#> $`2`
#> DataFrame with 1 row and 4 columns
#>                sample_id    image_id   data scaleFactor
#>              <character> <character> <list>   <numeric>
#> 1 HumanBreastCancerILC..      lowres   ####   0.0247525
#extract image data collectively
imgData(spe)
#> DataFrame with 3 rows and 4 columns
#>                sample_id    image_id   data scaleFactor
#>              <character> <character> <list>   <numeric>
#> 1  HumanBreastCancerIDC1      lowres   ####   0.0247525
#> 2  HumanBreastCancerIDC2      lowres   ####   0.0247525
#> 3 HumanBreastCancerILC..      lowres   ####   0.0247525

Created on 2022-03-16 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2021-10-27 r81107)
#>  os       macOS Big Sur/Monterey 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2022-03-16
#>  pandoc   2.16.2 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                * version  date (UTC) lib source
#>  AnnotationDbi            1.57.1   2021-11-05 [1] Bioconductor
#>  AnnotationHub          * 3.3.9    2022-02-28 [1] Bioconductor
#>  assertthat               0.2.1    2019-03-21 [1] CRAN (R 4.2.0)
#>  beachmat                 2.11.0   2021-10-26 [1] Bioconductor
#>  Biobase                * 2.55.0   2021-10-26 [1] Bioconductor
#>  BiocFileCache          * 2.3.4    2022-01-20 [1] Bioconductor
#>  BiocGenerics           * 0.41.2   2021-11-19 [1] Bioconductor
#>  BiocManager              1.30.16  2021-06-15 [1] CRAN (R 4.2.0)
#>  BiocParallel             1.29.17  2022-03-13 [1] Bioconductor
#>  BiocVersion              3.15.0   2021-10-26 [1] Bioconductor
#>  Biostrings               2.63.1   2022-01-05 [1] Bioconductor
#>  bit                      4.0.4    2020-08-04 [1] CRAN (R 4.2.0)
#>  bit64                    4.0.5    2020-08-30 [1] CRAN (R 4.2.0)
#>  bitops                   1.0-7    2021-04-24 [1] CRAN (R 4.2.0)
#>  blob                     1.2.2    2021-07-23 [1] CRAN (R 4.2.0)
#>  cachem                   1.0.6    2021-08-19 [1] CRAN (R 4.2.0)
#>  cli                      3.2.0    2022-02-14 [1] CRAN (R 4.2.0)
#>  crayon                   1.5.0    2022-02-14 [1] CRAN (R 4.2.0)
#>  curl                     4.3.2    2021-06-23 [1] CRAN (R 4.2.0)
#>  DBI                      1.1.2    2021-12-20 [1] CRAN (R 4.2.0)
#>  dbplyr                 * 2.1.1    2021-04-06 [1] CRAN (R 4.2.0)
#>  DelayedArray             0.21.2   2021-11-19 [1] Bioconductor
#>  DelayedMatrixStats       1.17.0   2021-10-26 [1] Bioconductor
#>  digest                   0.6.29   2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr                    1.0.8    2022-02-08 [1] CRAN (R 4.2.0)
#>  dqrng                    0.3.0    2021-05-01 [1] CRAN (R 4.2.0)
#>  DropletUtils             1.15.2   2021-11-19 [1] Bioconductor
#>  edgeR                    3.37.0   2021-10-26 [1] Bioconductor
#>  ellipsis                 0.3.2    2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate                 0.15     2022-02-18 [1] CRAN (R 4.2.0)
#>  ExperimentHub          * 2.3.5    2022-01-20 [1] Bioconductor
#>  fansi                    1.0.2    2022-01-14 [1] CRAN (R 4.2.0)
#>  fastmap                  1.1.0    2021-01-25 [1] CRAN (R 4.2.0)
#>  filelock                 1.0.2    2018-10-05 [1] CRAN (R 4.2.0)
#>  fs                       1.5.2    2021-12-08 [1] CRAN (R 4.2.0)
#>  generics                 0.1.2    2022-01-31 [1] CRAN (R 4.2.0)
#>  GenomeInfoDb           * 1.31.5   2022-03-14 [1] Bioconductor
#>  GenomeInfoDbData         1.2.7    2021-11-01 [1] Bioconductor
#>  GenomicRanges          * 1.47.6   2022-01-12 [1] Bioconductor
#>  glue                     1.6.2    2022-02-24 [1] CRAN (R 4.2.0)
#>  HDF5Array                1.23.2   2021-11-19 [1] Bioconductor
#>  highr                    0.9      2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools                0.5.2    2021-08-25 [1] CRAN (R 4.2.0)
#>  httpuv                   1.6.5    2022-01-05 [1] CRAN (R 4.2.0)
#>  httr                     1.4.2    2020-07-20 [1] CRAN (R 4.2.0)
#>  interactiveDisplayBase   1.33.0   2021-11-05 [1] Bioconductor
#>  IRanges                * 2.29.1   2021-11-19 [1] Bioconductor
#>  KEGGREST                 1.35.0   2021-11-05 [1] Bioconductor
#>  knitr                    1.37     2021-12-16 [1] CRAN (R 4.2.0)
#>  later                    1.3.0    2021-08-18 [1] CRAN (R 4.2.0)
#>  lattice                  0.20-45  2021-09-22 [1] CRAN (R 4.2.0)
#>  lifecycle                1.0.1    2021-09-24 [1] CRAN (R 4.2.0)
#>  limma                    3.51.5   2022-02-17 [1] Bioconductor
#>  locfit                   1.5-9.5  2022-03-03 [1] CRAN (R 4.2.0)
#>  magick                   2.7.3    2021-08-18 [1] CRAN (R 4.2.0)
#>  magrittr                 2.0.2    2022-01-26 [1] CRAN (R 4.2.0)
#>  Matrix                   1.4-0    2021-12-08 [1] CRAN (R 4.2.0)
#>  MatrixGenerics         * 1.7.0    2021-10-26 [1] Bioconductor
#>  matrixStats            * 0.61.0   2021-09-17 [1] CRAN (R 4.2.0)
#>  memoise                  2.0.1    2021-11-26 [1] CRAN (R 4.2.0)
#>  mime                     0.12     2021-09-28 [1] CRAN (R 4.2.0)
#>  pillar                   1.7.0    2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig                2.0.3    2019-09-22 [1] CRAN (R 4.2.0)
#>  png                      0.1-7    2013-12-03 [1] CRAN (R 4.2.0)
#>  promises                 1.2.0.1  2021-02-11 [1] CRAN (R 4.2.0)
#>  purrr                    0.3.4    2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache                  0.15.0   2021-04-30 [1] CRAN (R 4.2.0)
#>  R.methodsS3              1.8.1    2020-08-26 [1] CRAN (R 4.2.0)
#>  R.oo                     1.24.0   2020-08-26 [1] CRAN (R 4.2.0)
#>  R.utils                  2.11.0   2021-09-26 [1] CRAN (R 4.2.0)
#>  R6                       2.5.1    2021-08-19 [1] CRAN (R 4.2.0)
#>  rappdirs                 0.3.3    2021-01-31 [1] CRAN (R 4.2.0)
#>  Rcpp                     1.0.8.2  2022-03-11 [1] CRAN (R 4.2.0)
#>  RCurl                    1.98-1.6 2022-02-08 [1] CRAN (R 4.2.0)
#>  reprex                   2.0.1    2021-08-05 [1] CRAN (R 4.2.0)
#>  rhdf5                    2.39.6   2022-03-09 [1] Bioconductor
#>  rhdf5filters             1.7.0    2021-11-05 [1] Bioconductor
#>  Rhdf5lib                 1.17.3   2022-01-31 [1] Bioconductor
#>  rjson                    0.2.21   2022-01-09 [1] CRAN (R 4.2.0)
#>  rlang                    1.0.2    2022-03-04 [1] CRAN (R 4.2.0)
#>  rmarkdown                2.13     2022-03-10 [1] CRAN (R 4.2.0)
#>  RSQLite                  2.2.10   2022-02-17 [1] CRAN (R 4.2.0)
#>  rstudioapi               0.13     2020-11-12 [1] CRAN (R 4.2.0)
#>  S4Vectors              * 0.33.11  2022-03-14 [1] Bioconductor
#>  scuttle                  1.5.0    2021-10-27 [1] Bioconductor
#>  sessioninfo              1.2.2    2021-12-06 [1] CRAN (R 4.2.0)
#>  shiny                    1.7.1    2021-10-02 [1] CRAN (R 4.2.0)
#>  SingleCellExperiment   * 1.17.2   2021-11-19 [1] Bioconductor
#>  sparseMatrixStats        1.7.0    2021-10-26 [1] Bioconductor
#>  SpatialExperiment      * 1.5.4    2022-03-11 [1] Bioconductor
#>  stringi                  1.7.6    2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr                  1.4.0    2019-02-10 [1] CRAN (R 4.2.0)
#>  styler                   1.7.0    2022-03-13 [1] CRAN (R 4.2.0)
#>  SummarizedExperiment   * 1.25.3   2021-12-08 [1] Bioconductor
#>  TENxVisiumData         * 1.3.0    2021-11-21 [1] Bioconductor
#>  tibble                   3.1.6    2021-11-07 [1] CRAN (R 4.2.0)
#>  tidyselect               1.1.2    2022-02-21 [1] CRAN (R 4.2.0)
#>  utf8                     1.2.2    2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs                    0.3.8    2021-04-29 [1] CRAN (R 4.2.0)
#>  withr                    2.5.0    2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun                     0.30     2022-03-02 [1] CRAN (R 4.2.0)
#>  xtable                   1.8-4    2019-04-21 [1] CRAN (R 4.2.0)
#>  XVector                  0.35.0   2021-10-26 [1] Bioconductor
#>  yaml                     2.3.5    2022-02-21 [1] CRAN (R 4.2.0)
#>  zlibbioc                 1.41.0   2021-10-26 [1] Bioconductor
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

bhuvad commented 2 years ago

Hi Pete,

The examples you show above are the exact reasons I figured an object along with basic accessors would simplify the analysis process. Continuing the example of the 10x Visium datasets in the vignette, storing patient-specific annotations is possible by repeating the annotation (e.g., age) for each spot in the SpatialExperiment. However, this ends up creating redundant copies of data in the object. If we were looking at hundreds of annotation columns across tens of patients, we would end up using a lot more memory to store thousands of redundant data points. This is the reason I store these separately (in the experimentData slot) along with a map to each spot (in the experimentIndex slot).
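
To make the storage argument concrete, here is a minimal sketch in base R. It is only an illustration under stated assumptions: `experiment_data` and `experiment_index` are hypothetical stand-ins for the experimentData and experimentIndex slots mentioned above, not the package's actual implementation. The spot counts and ages are taken from the example earlier in this thread.

```r
# Number of spots per experiment, as reported in the reprex output above.
n_spots <- c(IDC = 7785, ILC = 4325)

# Per-spot storage: each patient-level annotation is repeated once per spot.
per_spot <- data.frame(
  experiment = rep(names(n_spots), times = n_spots),
  age        = rep(c(65, 68), times = n_spots)
)

# Indexed storage (hypothetical layout): one row per experiment,
# plus one integer per spot that points at the matching row.
experiment_data  <- data.frame(age = c(65, 68), row.names = names(n_spots))
experiment_index <- rep(seq_along(n_spots), times = n_spots)

# The per-spot view can be rebuilt on demand from the two pieces.
rebuilt_age <- experiment_data$age[experiment_index]
identical(rebuilt_age, per_spot$age)
```

The index itself still has one entry per spot, so the saving in this layout would only become noticeable with many annotation columns, which is the situation described above.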

The other situation where this is useful is when we try to subset the data manually to perform analysis on each patient. The approach you show is how I have been doing it in the past and I realised then that there is quite a bit of overhead in code when repeatedly doing so. We need to separate the patients every time we want to run an operation on them and then need to combine the results again. The elapply() function I wrote splits the data, runs an operation and combines the results back (where possible) while maintaining the experiment annotations as they are. The example you showed above can be replicated much more easily with ExperimentList.
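
The split, apply, and combine pattern described above can be sketched generically as follows. `split_apply_combine()` is a hypothetical helper written for this illustration, not the elapply() from the submitted package. It is demonstrated on a plain matrix so that it runs with base R alone, and it assumes only that the object supports `ncol()`, column subsetting with `[`, and `cbind()`.

```r
# Hypothetical helper (not the package's elapply()): split the columns of `x`
# by `group`, apply `FUN` to each piece, and cbind() the results back together.
split_apply_combine <- function(x, group, FUN, ...) {
  stopifnot(length(group) == ncol(x))
  # Column indices per group, keeping groups in order of first appearance.
  idx <- split(seq_len(ncol(x)), factor(group, levels = unique(group)))
  pieces <- lapply(idx, function(i) FUN(x[, i, drop = FALSE], ...))
  do.call(cbind, unname(pieces))
}

# Toy data: 3 features x 5 columns; columns 1-3 belong to experiment "A"
# and columns 4-5 to experiment "B".
counts <- matrix(1:15, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("spot", 1:5)))
experiment <- c("A", "A", "A", "B", "B")

# Keep the first two columns of each experiment, then recombine.
first_two <- split_apply_combine(counts, experiment,
                                 function(m) m[, 1:2, drop = FALSE])
colnames(first_two)
#> [1] "spot1" "spot2" "spot4" "spot5"
```

Since the reprex earlier in the thread already combines SpatialExperiment objects with `do.call(cbind, ...)`, a helper of this shape would be expected to work on those classes as well, although that is untested here and it does nothing to preserve separate experiment-level annotations.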

These two points justified the creation of a new data structure to host data from multiple experiments. This data structure builds on existing and mature data structures and is therefore able to pass on all their functionalities to users.

I do appreciate your feedback on the need to improve the vignette, specifically the motivation, and I will definitely do that! I will include bits of this response in the motivation section to improve it. I also greatly appreciate you taking your time to have a quick review of the package and for such detailed feedback. Thanks for that @PeteHaitch!

Cheers, Dharmesh

LiNk-NY commented 2 years ago

Hi Dharmesh, @bhuvad

I'm the developer of MultiAssayExperiment / ExperimentList and I agree with Laurent and Pete.

In addition to the points mentioned by Pete and Laurent, the implementation of the class includes a subset of other classes, which makes your package more difficult to maintain in the long run when new classes come about. We wrote ExperimentList with less restrictive requirements so that it would be more adaptable to new classes.

My second point is that the package will cause increased confusion not only for current ExperimentList users but also for users of the other classes that you extend to create SummarizedExperimentList, RangedSummarizedExperimentList, SingleCellExperimentList, and SpatialExperimentList. Using the same name as a class already established in Bioconductor is not standard practice and it will cause confusion among users. When we were developing ExperimentList, our prototype was named EList and we soon found out that it is best to avoid namespace collisions and thus we renamed our class to ExperimentList. If anything, our existing ExperimentList functionality could be moved out of MultiAssayExperiment and made into its own package (similar to the GenomicRanges to IRanges relationship) where it would be easier to extend for those that are interested in more specific functionality.

Part of the package development process is to avoid reinventing the wheel and instead to extend other classes where a specific functionality is sought, e.g., SpatialExperiment inheriting from SingleCellExperiment. When developing MultiAssayExperiment, we took input from the Bioconductor community with respect to the types of classes and functionality that would make analyses easier and more integrative.

Best regards, Marcel

bhuvad commented 2 years ago

Hi @LiNk-NY, @PeteHaitch and @lgatto,

I really appreciate you looking into my package and for taking the time to provide such detailed feedback! Based on your feedback, I realise that there may be better ways to tackle the problem I was trying to address. I realise the mistake I made regarding duplicate naming, and I do agree that that is not ideal (I have seen the issues the SpatialImage class from SpatialExperiment and Seurat create and I definitely do not want to be a source of that!). Considering your extensive feedback, I think the best move will be to withdraw this package from the submission system and reconsider whether there is a broader spectrum of problems that can be addressed using list-like objects.

One of the main functions I implemented in ExperimentList was the elapply() function that splits, modifies, and combines SummarizedExperiment/SingleCellExperiment/SpatialExperiment objects. Do you think it is worth transferring this implementation to another package to allow for this functionality? If so, what would you recommend as the best target package?

Once again, thanks for such detailed feedback and the active discussion regarding this package. This was my first time working on a data structure package and your pointers have been very useful in determining what not to do in the future 😊. I will be sure to discuss designs in the bioc-devel mailing list in the future to ensure I gather feedback/interest from the community before developing.

Cheers, Dharmesh

PeteHaitch commented 2 years ago

Hi @bhuvad,

Thank you for your consideration and appreciation of the feedback.

> One of the main functions I implemented in ExperimentList was the elapply() function that splits, modifies, and combines SummarizedExperiment/SingleCellExperiment/SpatialExperiment objects. Do you think it is worth transferring this implementation to another package to allow for this functionality? If so, what would you recommend as the best target package?

I had a vague idea that @ltla had implemented something like a split+apply+combine function for SummarizedExperiment objects, but I can't seem to find it now so perhaps I imagined it or it was abandoned. Hopefully Aaron can chime in.

The closest I can find off the top of my head is SingleCellExperiment::applySCE(), but that's focused on applying a function to the main and alternative experiments of a single SCE; although according to the docs: "The behaviour of the the function is equivalent to creating a list containing X as the first entry and altExps(X) in the subsequent entries, and then lapplying over this list with FUN and the specified arguments.", so there's some similarity.

FYI I also work at WEHI in case you'd ever like to brainstorm ideas :)

Cheers, Pete