Bioconductor / LoomExperiment

A package to read, write, and manipulate loom files using LoomExperiment objects. It uses the loom file format from the Linnarsson Lab: https://linnarssonlab.org/loompy/
https://www.bioconductor.org/packages/LoomExperiment

Are sparse matrices supported when exporting to a loom file? #2

Closed: nh3 closed this issue 5 years ago

nh3 commented 5 years ago

The example in the vignette seems to use only a dense matrix.

This is what I did:

```
> sce <- read10xCounts('test_data', col.names=TRUE)
> scle <- SingleCellLoomExperiment(sce)
> export(scle, 'test.loom')
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘.exportLoom’ for signature ‘"dgCMatrix"’
```

export() works on the same data once the assay is converted to a dense matrix:

```
> assay(scle) <- as.matrix(assay(scle))
> export(scle, 'test_dense.loom')
Warning message:
In h5createDataset(h5loc, name, dim, storage.mode = storage.mode(obj),  :
  You created a large dataset with compression and chunking. The chunk size is equal to the dataset dimensions. If you want to read subsets of the dataset, you should test smaller chunk sizes to improve read times. Turn off this warning with showWarnings=FALSE.
```

Since loom uses a dense representation on disk, requiring dense matrices is perhaps not unreasonable. However, loompy does accept sparse matrices and writes them to disk as dense. Having to do the sparse-to-dense conversion in memory is quite limiting for large datasets.
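To put a rough number on that in-memory cost, the R sketch below compares the size of a random sparse matrix with its dense copy, using only the Matrix package (shipped with R). The 2000 x 1000 dimensions and 5% density are arbitrary illustrative choices, not taken from the data in this issue; real single-cell count matrices are often larger and sparser, which widens the gap.

```r
library(Matrix)

set.seed(42)
# Illustrative sparse matrix: 2000 x 1000 with about 5% non-zero entries
sp <- rsparsematrix(2000, 1000, density = 0.05)

# Dense copy: 2000 * 1000 doubles at 8 bytes each = 16e6 bytes,
# regardless of how many entries are zero
dn <- as.matrix(sp)

sparse_mb <- as.numeric(object.size(sp)) / 1024^2
dense_mb  <- as.numeric(object.size(dn)) / 1024^2

# The dgCMatrix stores only the non-zero values plus their indices
cat(sprintf("sparse: %.1f MB, dense: %.1f MB\n", sparse_mb, dense_mb))
```

The dense size depends only on the dimensions, while the sparse size scales with the number of non-zero entries, so the ratio grows as the data gets sparser.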

dvantwisk commented 5 years ago

Thank you for the issue. We are looking into a way to efficiently write Matrix::SparseMatrix objects as dense matrices in the loom file.

dvantwisk commented 5 years ago

I've pushed a change that I hope addresses your issue. I've added support for dgCMatrix in export(), as well as a few other changes to make it easier to use. I couldn't use your example since I don't know where test_data comes from, so I've used an example from DropletUtils.

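```r
library(LoomExperiment)
library(DropletUtils)

# Run the read10xCounts() examples, which create sce10x2
# (a SingleCellExperiment read from 10x output)
example(read10xCounts)
scle <- SingleCellLoomExperiment(sce10x2)

# Round-trip through a temporary loom file
temp <- tempfile(fileext = ".loom")
export(scle, temp)
res <- import(temp)
```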

In res, the assay is returned as an HDF5Array::DelayedMatrix. In export(), writeHDF5Array() is used on the dgCMatrix. writeHDF5Array() uses block processing, i.e. it expands the sparse matrix one block at a time and writes each block to the HDF5 file before moving on to the next, so the dgCMatrix should be written efficiently without materializing the full dense matrix in memory. Let me know if this solution works.
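The block-processing idea can be illustrated without HDF5 at all. The R sketch below is a simplified stand-in, not what writeHDF5Array() does internally: the hypothetical helpers write_dense_blockwise() and read_dense() densify a dgCMatrix one block of columns at a time and append each block to a plain binary file in column-major order, so at most one dense block is held in memory during the write. The binary file is an assumption standing in for the HDF5 dataset, and block_ncol plays the role of the block size.

```r
library(Matrix)

# Hypothetical helper (not part of LoomExperiment or HDF5Array): write a
# sparse matrix to disk as dense values, one block of columns at a time.
# A plain binary file of doubles stands in for the HDF5 dataset.
write_dense_blockwise <- function(x, path, block_ncol = 100L) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  starts <- if (ncol(x) > 0L) seq(1L, ncol(x), by = block_ncol) else integer(0)
  for (s in starts) {
    e <- min(s + block_ncol - 1L, ncol(x))
    # Only this block of columns is expanded to a dense matrix
    block <- as.matrix(x[, s:e, drop = FALSE])
    # as.vector() is column-major, so consecutive column blocks are
    # appended in the same order as the full matrix
    writeBin(as.vector(block), con, size = 8L)
  }
  invisible(path)
}

# Hypothetical helper: read the whole dense matrix back for checking
read_dense <- function(path, nrow, ncol) {
  vals <- readBin(path, what = "double", n = nrow * ncol, size = 8L)
  matrix(vals, nrow = nrow, ncol = ncol)
}

set.seed(1)
m <- rsparsematrix(50, 230, density = 0.05)  # 230 columns: the last block is partial
tmp <- tempfile(fileext = ".bin")
write_dense_blockwise(m, tmp, block_ncol = 64L)
dense <- read_dense(tmp, nrow(m), ncol(m))
```

A real implementation would write each block into a preallocated HDF5 dataset (for example with rhdf5) instead of appending raw bytes; the point here is only that peak memory during the write scales with the block size rather than with the full matrix.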