Bioconductor / LoomExperiment

A package to read, write, and manipulate loom files using LoomExperiments. Uses the loom file format from the Linnarson Lab. https://linnarssonlab.org/loompy/
https://www.bioconductor.org/packages/LoomExperiment
6 stars 5 forks source link

rowData doesn't survive round trip #3

Closed lazappi closed 5 years ago

lazappi commented 5 years ago

Hi

Thanks for developing this package! I've just discovered it and it seems like a great way to allow interoperability with some of the Python scRNA-seq packages.

I have been playing around to see if I export some data and read it back in and I can't get the rowData (and rownames) to survive the trip. Is this expected behaviour, a bug or am I just doing something wrong? Here is an example of what I have been trying:

library(LoomExperiment)

counts <- matrix(rpois(100, lambda = 10), ncol = 10, nrow = 10)
sce <- SingleCellExperiment(assays = list(counts = counts))
scle <- SingleCellLoomExperiment(sce)

rowData(scle)$row_info <- runif(10)
colData(scle)$col_info <- runif(10)

colnames(scle) <- paste0("C", 1:10)
rownames(scel) <- paste0("R", 1:10)

tempfile <- tempfile(fileext = ".loom")
export(scle, tempfile)
scle2 <- import(tempfile)

scle
# class: SingleCellLoomExperiment
# dim: 10 10
# metadata(0):
# assays(1): counts
# rownames(10): R1 R2 ... R9 R10
# rowData names(1): row_info
# colnames(10): C1 C2 ... C9 C10
# colData names(1): col_info
# reducedDimNames(0):
# spikeNames(0):
# rowGraphs(0): NULL
# colGraphs(0): NULL

scle2
# class: SingleCellLoomExperiment
# dim: 10 10
# metadata(2): CreatedWith LoomExperiment-class
# assays(1): matrix
# rownames: NULL
# rowData names(0):
# colnames(10): C1 C2 ... C9 C10
# colData names(1): col_info
# reducedDimNames(0):
# spikeNames(0):
# rowGraphs(0): NULL
# colGraphs(0): NULL

Any thoughts?

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

Matrix products: default
BLAS: /usr/local/installed/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /usr/local/installed/R/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] LoomExperiment_1.0.2        rtracklayer_1.42.1
 [3] rhdf5_2.26.2                SingleCellExperiment_1.4.1
 [5] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
 [7] BiocParallel_1.16.5         matrixStats_0.54.0
 [9] Biobase_2.42.0              GenomicRanges_1.34.0
[11] GenomeInfoDb_1.18.1         IRanges_2.16.0
[13] S4Vectors_0.20.1            BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0               compiler_3.5.0           BiocManager_1.30.4
 [4] git2r_0.24.0             workflowr_1.1.1          XVector_0.22.0
 [7] R.methodsS3_1.7.1        R.utils_2.7.0            bitops_1.0-6
[10] tools_3.5.0              zlibbioc_1.28.0          digest_0.6.18
[13] packrat_0.5.0-5          evaluate_0.12            lattice_0.20-35
[16] Matrix_1.2-14            yaml_2.2.0               xfun_0.4
[19] GenomeInfoDbData_1.2.0   stringr_1.3.1            knitr_1.21
[22] Biostrings_2.50.2        rprojroot_1.3-2          grid_3.5.0
[25] glue_1.3.0               HDF5Array_1.10.1         XML_3.98-1.16
[28] rmarkdown_1.11           Rhdf5lib_1.4.2           magrittr_1.5
[31] whisker_0.3-2            GenomicAlignments_1.18.1 Rsamtools_1.34.1
[34] backports_1.1.3          htmltools_0.3.6          stringi_1.2.4
[37] RCurl_1.95-4.11          R.oo_1.22.0
nh3 commented 5 years ago

Hi,

I am having exactly the same problem. The cause appears to be that export() write row data as GRangesList, which leads to relocation of hdf5 attributes as well as loss of information:

> l1_file <- system.file("extdata", "L1_DRG_20_example.loom", package = "LoomExperiment")
> scle <- import(l1_file, type="SingleCellLoomExperiment")
> export(scle, 'test1.loom')
$ h5ls -r R/library/LoomExperiment/extdata/L1_DRG_20_example.loom | tail
/matrix                  Dataset {20, 20}
/row_attrs               Group
/row_attrs/Accession     Dataset {20}
/row_attrs/Gene          Dataset {20}
/row_attrs/X_LogCV       Dataset {20}
/row_attrs/X_LogMean     Dataset {20}
/row_attrs/X_Selected    Dataset {20}
/row_attrs/X_Total       Dataset {20}
/row_attrs/X_Valid       Dataset {20}
/row_attrs/rownames      Dataset {20}

$ h5ls -r test1.loom | tail -n 12
/matrix                  Dataset {20, 20}
/row_attrs               Group
/row_attrs/GRangesList   Group
/row_attrs/GRangesList/end Dataset {20, 1}
/row_attrs/GRangesList/group_name Dataset {20, 1}
/row_attrs/GRangesList/lengths Dataset {20, 1}
/row_attrs/GRangesList/names Dataset {20, 1}
/row_attrs/GRangesList/rownames Dataset {20, 1}
/row_attrs/GRangesList/seqnames Dataset {20, 1}
/row_attrs/GRangesList/start Dataset {20, 1}
/row_attrs/GRangesList/strand Dataset {20, 1}
/row_attrs/GRangesList/width Dataset {20, 1}
nh3 commented 5 years ago

I think I might have found the cause here.

https://github.com/Bioconductor/LoomExperiment/blob/23899d0a13839fc0c689a6913fb401c46c154347/R/export-method.R#L226-L235

Condition in Line 226 will always evaluate to TRUE when the object is a SingleCellLoomExperiment, therefore exporting only rowRanges to /row_attrs.

As SingleCellExperiment is RangedSummarizedExperiment, it makes sense for SingleCellLoomExperiment to be RSE too, but I guess we might want to export rowData to /row_attrs as well? Though this would have implications for importing. Alternatively, if packages dealing with SCE don't write much in rowRanges, perhaps we can export only rowData to /row_attrs, whether it's a RSE or not.

dvantwisk commented 5 years ago

Thanks for the issue.

I've pushed changes that address this issue. What they do is prioritize gleaning rownames and rowData names from rowData information as opposed to rowRanges information when rowRanges are not present.

The changes have been pushed to devel and should propagate within a day or so.

nh3 commented 5 years ago

Thanks for the fix @dvantwisk! Is it possible to revert the dependency to R >=3.5.0? I think the changes are not dependent on the new R version, right? Doing so would free us from having to update all packages dependent on R version. I'll be happy to update the bioconda recipe for LoomExperiment when that's done. Many thanks!

dvantwisk commented 5 years ago

I've changed the version back to R >= 3.5 on devel. I'll close this issue for now but message me back if any other pop up.

nh3 commented 5 years ago

Hi @dvantwisk, thank you again for resolving all these issues! Would you be able to push changes related to #2 and #3 to release as well? This will make it possible to deploy them through bioconda, which only accepts "released" versions. I've tested all those changes (including those related to #4) and they all worked well.