Bioconductor / LoomExperiment

A package to read, write, and manipulate loom files using LoomExperiments. Uses the loom file format from the Linnarsson Lab. https://linnarssonlab.org/loompy/
https://www.bioconductor.org/packages/LoomExperiment

zero-length inputs cannot be mixed with those of non-zero length #14

Open ccshao opened 4 years ago

ccshao commented 4 years ago

Thanks for the effort. I have a few questions about usage. Below is my code for converting a count matrix from a Seurat object to a loom file.

library(LoomExperiment)
library(Seurat)

#- read the data 
sobj   <- qs::qread("sobj.q")
mat    <- as.matrix(sobj[["RNA"]]@counts)
t1     <- sample(17297, 5000)
t2     <- sample(13572, 2000)
submat <- mat[t1, t2]

subscle <- SingleCellLoomExperiment(assays = list(counts = submat))
export(subscle, "test.loom", rownames_attr = "Gene", colnames_attr = "CellID")

sparse matrix: The counts slot is a sparse matrix. Should it be converted to a dense matrix before being used as the input for SingleCellLoomExperiment?
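To put the dense conversion in perspective, here is a small sketch of what `as.matrix()` costs in memory. It assumes the Matrix package (shipped with R) and a simulated sparse matrix rather than the real data; the full dimensions 17297 x 13572 are inferred from the `sample()` calls above and are an assumption. It does not show whether SingleCellLoomExperiment accepts sparse input, which is the open question here.

```r
library(Matrix)

set.seed(1)
# Simulated stand-in for a sparse count matrix (5% non-zero entries).
sp <- rsparsematrix(nrow = 1000, ncol = 500, density = 0.05)
dense <- as.matrix(sp)

# The dense copy stores every cell, the sparse one only the non-zeros.
sparse_size <- object.size(sp)
dense_size <- object.size(dense)
print(sparse_size, units = "Kb")
print(dense_size, units = "Kb")

# Rough cost of densifying the full matrix as doubles (8 bytes per cell).
# The dimensions are assumed from sample(17297, 5000) and sample(13572, 2000).
full_dims <- c(genes = 17297, cells = 13572)
dense_bytes <- prod(full_dims) * 8
dense_gib <- dense_bytes / 1024^3  # roughly 1.75 GiB
```

Subsetting the sparse matrix first and converting only the subset keeps the dense copy small.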

chunk size: There is a message from export:

export(subscle, "test.loom", rownames_attr = "Gene", colnames_attr = "CellID")
You created a large dataset with compression and chunking.
The chunk size is equal to the dataset dimensions.
If you want to read subsets of the dataset, you should testsmaller chunk sizes to improve read times.
Warning message:
In value[[3L]](cond) :
  zero-length inputs cannot be mixed with those of non-zero length

Should I adjust the chunk size, and if so, how?
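As a rough illustration of what chunk size means at the HDF5 level, here is a sketch that uses rhdf5 directly. It does not go through `export()`, and nothing in this thread establishes whether `export()` exposes a chunk setting; the dataset name `demo`, the matrix dimensions, and the chunk shape are arbitrary choices for the example.

```r
library(rhdf5)

set.seed(1)
m <- matrix(rpois(500 * 200, lambda = 10), nrow = 500, ncol = 200)

path <- tempfile(fileext = ".h5")
h5createFile(path)

# Store the matrix in 50 x 20 chunks instead of one chunk
# covering the whole dataset.
h5createDataset(path, "demo", dims = dim(m), storage.mode = "integer",
                chunk = c(50, 20), level = 6)
h5write(m, file = path, name = "demo")

# Reading a small block only needs the chunks that overlap it.
block <- h5read(path, "demo", index = list(1:10, 1:5))
```

Smaller chunks favor reading subsets; a single chunk equal to the dataset dimensions means any read has to decompress the whole dataset.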

zero-length warning: Additionally, it said "zero-length inputs cannot be mixed with those of non-zero length". Is there something wrong with the export command?

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.2

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] LoomExperiment_1.4.0 rtracklayer_1.46.0
[3] rhdf5_2.30.1 SingleCellExperiment_1.8.0
[5] SummarizedExperiment_1.16.1 DelayedArray_0.12.2
[7] BiocParallel_1.20.1 matrixStats_0.55.0
[9] Biobase_2.46.0 GenomicRanges_1.38.0
[11] GenomeInfoDb_1.22.0 IRanges_2.20.2
[13] S4Vectors_0.24.3 BiocGenerics_0.32.0
[15] Seurat_3.1.1

loaded via a namespace (and not attached):
[1] TH.data_1.0-10 Rtsne_0.15 colorspace_1.4-1
[4] ggridges_0.5.2 XVector_0.26.0 leiden_0.3.3
[7] listenv_0.8.0 npsurv_0.4-0 ggrepel_0.8.2
[10] mvtnorm_1.1-0 codetools_0.2-16 splines_3.6.3
[13] R.methodsS3_1.8.0 mnormt_1.5-6 lsei_1.2-0
[16] TFisher_0.2.0 jsonlite_1.6.1 Rsamtools_2.2.3
[19] ica_1.0-2 cluster_2.1.0 png_0.1-7
[22] R.oo_1.23.0 uwot_0.1.5 HDF5Array_1.14.3
[25] sctransform_0.2.1 compiler_3.6.3 httr_1.4.1
[28] assertthat_0.2.1 Matrix_1.2-18 lazyeval_0.2.2
[31] htmltools_0.4.0 tools_3.6.3 rsvd_1.0.3
[34] igraph_1.2.4.2 GenomeInfoDbData_1.2.2 gtable_0.3.0
[37] glue_1.3.2 RANN_2.6.1 reshape2_1.4.3
[40] dplyr_0.8.5 rappdirs_0.3.1 Rcpp_1.0.3
[43] Biostrings_2.54.0 vctrs_0.2.4 multtest_2.42.0
[46] gdata_2.18.0 ape_5.3 nlme_3.1-144
[49] gbRd_0.4-11 lmtest_0.9-37 stringr_1.4.0
[52] globals_0.12.5 lifecycle_0.2.0 irlba_2.3.3
[55] gtools_3.8.1 XML_3.99-0.3 future_1.16.0
[58] zlibbioc_1.32.0 MASS_7.3-51.5 zoo_1.8-7
[61] scales_1.1.0 sandwich_2.5-1 RColorBrewer_1.1-2
[64] qs_0.21.2 reticulate_1.14 pbapply_1.4-2
[67] gridExtra_2.3 ggplot2_3.3.0 stringi_1.4.6
[70] mutoss_0.1-12 plotrix_3.7-7 caTools_1.18.0
[73] bibtex_0.4.2.2 Rdpack_0.11-1 SDMTools_1.1-221.2
[76] rlang_0.4.5 pkgconfig_2.0.3 bitops_1.0-6
[79] lattice_0.20-40 Rhdf5lib_1.8.0 ROCR_1.0-7
[82] purrr_0.3.3 GenomicAlignments_1.22.1 htmlwidgets_1.5.1
[85] cowplot_1.0.0 tidyselect_1.0.0 RcppAnnoy_0.0.16
[88] plyr_1.8.6 magrittr_1.5 R6_2.4.1
[91] gplots_3.0.3 multcomp_1.4-12 pillar_1.4.3
[94] sn_1.5-5 fitdistrplus_1.0-14 survival_3.1-8
[97] RCurl_1.98-1.1 tibble_2.1.3 future.apply_1.4.0
[100] tsne_0.1-3 crayon_1.3.4 KernSmooth_2.23-16
[103] RApiSerialize_0.1.0 plotly_4.9.2 grid_3.6.3
[106] data.table_1.12.8 metap_1.3 digest_0.6.25
[109] tidyr_1.0.2 numDeriv_2016.8-1.1 R.utils_2.9.2
[112] RcppParallel_5.0.0 munsell_0.5.0 viridisLite_0.3.0

dvantwisk commented 4 years ago

Thank you for the issue. For debugging purposes, it would be useful to have the sobj.q file so I can run the code. Can you provide this file?

ccshao commented 4 years ago

@dvantwisk Thanks for your interest. I installed R 4.0.2 and the latest LoomExperiment, and the issue persists. Here is the code I used; submat.rds, a genes-by-cells matrix, is attached.


library(LoomExperiment)
sobj   <- qs::qread("sobj.q")
mat    <- as.matrix(sobj[["RNA"]]@counts)
t1     <- sample(17297, 5000)
t2     <- sample(13572, 2000)
submat <- mat[t1, t2]
saveRDS(submat, "submat.rds")

subscle <- SingleCellLoomExperiment(assays = list(counts = submat))
export(subscle, "test.loom", rownames_attr = "Gene", colnames_attr = "CellID")

You created a large dataset with compression and chunking.
The chunk size is equal to the dataset dimensions.
If you want to read subsets of the dataset, you should testsmaller chunk sizes to improve read times.
Warning message:
In value[[3L]](cond) :
  zero-length inputs cannot be mixed with those of non-zero length

test.loom.zip

dvantwisk commented 4 years ago

I apologize for the slow pace on this.

The initial message is new and is thrown by the underlying rhdf5 code, so I will work on removing it, since it does not apply to LoomExperiment. I'm unable to replicate the warning you are getting. It is a rather difficult message to diagnose, as it originates in base R code and could be tripped at many points. My guess is that one of the inputs, either the CellID or the Gene attribute, contains a zero-length element that triggers the warning at some point in the code. If I could get access to the sobj.q and submat.rds files, I could work on this more effectively. I'm not sure whether you meant to provide them from the beginning.
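For context on where this message can originate: in base R, `mapply()` raises an error with this text when one argument has length zero and another does not. The sketch below reproduces the message using only base R. Whether `export()` actually reaches `mapply()` this way is not confirmed in this thread, and `demo_export_step` is a hypothetical stand-in, not a LoomExperiment function.

```r
# Hypothetical stand-in for an internal step that pairs names with values.
demo_export_step <- function(attr_names, attr_values) {
  tryCatch(
    mapply(function(n, v) paste(n, v), attr_names, attr_values),
    # Downgrade the error to a warning, as a tryCatch handler might do.
    error = function(cond) {
      warning(conditionMessage(cond))
      NULL
    }
  )
}

# A zero-length first input mixed with a length-2 second input.
msg <- tryCatch(
  demo_export_step(character(0), c("a", "b")),
  warning = function(w) conditionMessage(w)
)
print(msg)
```

When a handler like this calls `warning()`, R typically reports the handler call itself, which prints as `In value[[3L]](cond)`. That would be consistent with the prefix in the reported warning, though it does not identify which call inside the package is responsible.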

ccshao commented 4 years ago

@dvantwisk Sorry, I forgot to attach submat.rds. Actually, this warning appears at the export step even with the example code, at least on my side.

library(LoomExperiment)
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
scle <- SingleCellLoomExperiment(assays = list(counts = counts))
export(scle, "test2.loom", rownames_attr = "Gene", colnames_attr = "CellID")

Warning message:
In value[[3L]](cond) :
  zero-length inputs cannot be mixed with those of non-zero length

export(scle, "test3.loom")

Warning message:
In value[[3L]](cond) :
  zero-length inputs cannot be mixed with those of non-zero length
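One thing worth checking in this example, following the suggestion that a zero-length Gene or CellID input may be involved: a matrix built with `matrix()` carries no row or column names. The base R sketch below inspects the names and sets them before the experiment object would be built. The `gene`/`cell` labels are placeholders, and it is an untested assumption that naming the dimensions changes the outcome of `export()`.

```r
set.seed(42)
counts <- matrix(rpois(100, lambda = 10), ncol = 10, nrow = 10)

# A freshly built matrix has neither row names nor column names.
names_missing <- is.null(rownames(counts)) && is.null(colnames(counts))

# Assign placeholder gene and cell identifiers.
dimnames(counts) <- list(
  paste0("gene", seq_len(nrow(counts))),
  paste0("cell", seq_len(ncol(counts)))
)

# Check that no dimension is unnamed and no individual name is empty.
has_empty <- any(lengths(dimnames(counts)) == 0L) ||
  any(!nzchar(unlist(dimnames(counts))))
```

The same check can be run on a matrix taken from a Seurat object to rule out empty gene or cell identifiers before export.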

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] LoomExperiment_1.6.0 rtracklayer_1.47.0
[3] rhdf5_2.32.2 SingleCellExperiment_1.10.1
[5] SummarizedExperiment_1.18.2 DelayedArray_0.14.0
[7] matrixStats_0.56.0 Biobase_2.48.0
[9] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2
[11] IRanges_2.22.2 S4Vectors_0.26.1
[13] BiocGenerics_0.34.0

loaded via a namespace (and not attached):
[1] magrittr_1.5 XVector_0.28.0 GenomicAlignments_1.24.0
[4] zlibbioc_1.34.0 BiocParallel_1.22.0 lattice_0.20-41
[7] stringr_1.4.0 tools_4.0.2 grid_4.0.2
[10] HDF5Array_1.16.1 crayon_1.3.4 Matrix_1.2-18
[13] GenomeInfoDbData_1.2.3 Rhdf5lib_1.10.1 bitops_1.0-6
[16] RCurl_1.98-1.2 stringi_1.4.6 compiler_4.0.2
[19] Biostrings_2.56.0 Rsamtools_2.4.0 XML_3.99-0.4