Bioconductor / SummarizedExperiment

SummarizedExperiment container
https://bioconductor.org/packages/SummarizedExperiment

Should the show() method for SummarizedExperiment objects suggest saveHDF5SummarizedExperiment()? #59

Open hpages opened 2 years ago

hpages commented 2 years ago

Vince (@vjcitn) suggested:

One could imagine the show method for SummarizedExperiment checking whether there is evidence of HDF5 in the assays and, if it finds any, adding a line: 'To serialize, use saveHDF5SummarizedExperiment.' It may not be completely foolproof but it might help.

Motivated by https://community-bioc.slack.com/archives/C6KJHH0M9/p1633536980007800
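
For illustration only, here is a minimal sketch of the kind of content-based check and hint that the suggestion describes. It uses toy S4 classes so that it runs with base R alone: `ToyDiskMatrix`, `ToyExperiment` and `hasOnDiskAssay()` are hypothetical names invented for this sketch, not part of SummarizedExperiment or HDF5Array, and a real implementation would have to inspect the actual assay objects instead.

```r
library(methods)

# Hypothetical stand-ins, NOT the real Bioconductor classes.
setClass("ToyDiskMatrix", slots = c(path = "character"))   # pretend on-disk assay
setClass("ToyExperiment", slots = c(assays = "list"))      # pretend container

# Content-based check (hypothetical helper): TRUE if any assay is a ToyDiskMatrix.
hasOnDiskAssay <- function(x) {
    any(vapply(x@assays, function(a) is(a, "ToyDiskMatrix"), logical(1)))
}

# show() prints a one-line summary, plus the hint when on-disk data is detected.
setMethod("show", "ToyExperiment", function(object) {
    cat("ToyExperiment with", length(object@assays), "assay(s)\n")
    if (hasOnDiskAssay(object)) {
        cat("To serialize, use saveHDF5SummarizedExperiment.\n")
    }
})

in_mem  <- new("ToyExperiment",
               assays = list(counts = matrix(1:6, nrow = 2)))
on_disk <- new("ToyExperiment",
               assays = list(counts = new("ToyDiskMatrix", path = "counts.h5")))

in_mem    # summary line only
on_disk   # summary line followed by the hint
```

Both objects have the same class, so the hint can only be driven by looking inside the assays at display time.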

Another approach that was suggested is to add save() and saveRDS() generics to BiocGenerics and make them fail (advising use of the dedicated function) when handed an HDF5SummarizedExperiment derivative.

However, there are several complications with this:

  1. There's no HDF5SummarizedExperiment class. These objects are just SummarizedExperiment objects or derivatives, and dispatch cannot be used to distinguish between those that have on-disk data and those that have in-memory data.
  2. On-disk data could be present in any object, not just SummarizedExperiment objects. For example, a GRanges object could have a TileDBMatrix object in its metadata columns.
  3. save() cannot easily be turned into a generic.
  4. There are situations where it's ok to call save() or saveRDS() on an object with on-disk data.
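
On complication 3, one plausible reason (an assumption here; the comment does not spell it out) is that base R's save() takes any number of objects through `...` and identifies them by name rather than by value, so there is no single argument whose class a method could dispatch on. A small base-R illustration:

```r
x <- 1:3
y <- letters

f1 <- tempfile(fileext = ".rda")
f2 <- tempfile(fileext = ".rda")

# Objects are identified by name, taken from the unevaluated `...` ...
save(x, y, file = f1)
# ... or supplied as a character vector via `list=`.
save(list = c("x", "y"), file = f2)

# load() restores the objects under their saved names and returns those names.
e1 <- new.env()
e2 <- new.env()
names1 <- load(f1, envir = e1)
names2 <- load(f2, envir = e2)

print(names1)
print(names2)
```

By contrast, saveRDS() takes a single `object` argument.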

Feedback and suggestions are welcome.

PeteHaitch commented 2 years ago

FWIW, in BSseq we have a modified show() that alerts the user when some of the assays are HDF5-backed (it doesn't handle other on-disk backends), but it doesn't give specific advice about how to save/serialize the object.

suppressPackageStartupMessages(library(bsseq))
suppressWarnings(example("BSseq", "bsseq", echo = FALSE, verbose = FALSE))
#> Loading required package: DelayedArray
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#> 
#>     expand
#> 
#> Attaching package: 'DelayedArray'
#> The following objects are masked from 'package:base':
#> 
#>     aperm, apply, rowsum, scale, sweep
#> Loading required package: rhdf5
#> 
#> Attaching package: 'HDF5Array'
#> The following object is masked from 'package:rhdf5':
#> 
#>     h5ls
hdf5_BS1
#> An object of type 'BSseq' with
#>   3 methylation loci
#>   3 samples
#> has not been smoothed
#> Some assays are HDF5Array-backed

Created on 2021-10-08 by the reprex package (v2.0.1)