Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

The problem "the supplied seed must support extract_array()" #117

Closed. tomyputw closed this issue 1 month ago

tomyputw commented 1 month ago

I don't actually know why I ran into this problem when testing fastMNN. I would like to find out why this error happens. The code is as follows.

zilionis <- ZilionisLungData()
zilionis <- zilionis[, colSums(assay(zilionis)) != 0]
dim(zilionis)
bpp <- BiocParallel::MulticoreParam(20)
zilionis <- addPerCellQC(zilionis, BPPARAM=bpp, subsets=list(Mito=which(rownames(zilionis)=="^MT-")))
zilionis <- logNormCounts(zilionis, size_factors = zilionis$sum)
set.seed(1010010101)
dec.zilionis <- modelGeneVarByPoisson(zilionis, block=zilionis$Library, BPPARAM=bpp)
top.zilionis <- getTopHVGs(dec.zilionis, n=5000)
library(BiocNeighbors)
library(batchelor)
set.seed(1010001)
merged.zilionis <- fastMNN(zilionis, batch = zilionis$Library, subset.row = top.zilionis,
  BSPARAM=BiocSingular::RandomParam(deferred = TRUE),
  BNPARAM=AnnoyParam(),
  BPPARAM=bpp)

And this is the traceback.

traceback()
19: h(simpleError(msg, call))
18: .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "invalid class \"ScaledMatrix\" object: \n the supplied seed must support extract_array()", base::quote(validObject(.Object)))
17: stop(msg, ": ", errors, domain = NA)
16: validObject(.Object)
15: initialize(value, ...)
14: initialize(value, ...)
13: new(...)
12: new2(Class, seed = seed)
11: new_DelayedArray(seed, Class = "ScaledMatrix")
10: DelayedArray(ScaledMatrixSeed(x, center = center, scale = scale))
9: DelayedArray(ScaledMatrixSeed(x, center = center, scale = scale))
8: ScaledMatrix(t(x), center = centers)
7: t(ScaledMatrix(t(x), center = centers))
6: .process_single_matrix_for_pca(mat, batch = batch, weights = weights, subset.row = keep, deferred = bsdeferred(BSPARAM))
5: FUN(subset.row)
4: .multi_pca_single(x, batch = batch, d = d, weights = weights, get.variance = get.variance, subset.row = subset.row, get.all.genes = correct.all, BSPARAM = BSPARAM, BPPARAM = BPPARAM)
3: (function (x, batch, restrict = NULL, ..., subset.row = NULL, cos.norm = TRUE, d = 50, weights = NULL, get.variance = FALSE, correct.all = FALSE, BSPARAM = ExactParam(), BPPARAM = SerialParam())
   {
       batch <- factor(batch)
       .check_valid_batch(x, batch)
       if (cos.norm) {
           l2 <- cosineNorm(x, mode = "l2norm", subset.row = subset.row, BPPARAM = BPPARAM)
           x <- .apply_cosine_norm(x, l2)
       }
       mat <- .multi_pca_single(x, batch = batch, d = d, weights = weights, get.variance = get.variance, subset.row = subset.row, get.all.genes = correct.all, BSPARAM = BSPARAM, BPPARAM = BPPARAM)
       divided <- divideIntoBatches(mat[[1]], batch = batch, restrict = restrict, byrow = TRUE)
       output <- .fast_mnn(batches = divided$batches, restrict = divided$restricted, ..., BPPARAM = BPPARAM)
       d.reo <- divided$reorder
       output <- output[d.reo, , drop = FALSE]
       ...
2: do.call(.fast_mnn_single, c(list(x = batches[[1]], batch = batch, restrict = restrict[[1]]), common.args))
1: fastMNN(zilionis, batch = zilionis$Library, subset.row = top.zilionis, BPPARAM = bpp)

hpages commented 1 month ago

Thanks for the report.

Please:

  1. Provide a self-contained example. This means that the code must work in a fresh R session. In particular it must include any library(package) commands that are needed. Right now I get:
    > zilionis <- ZilionisLungData()
    Error in ZilionisLungData() : could not find function "ZilionisLungData"
  2. Show your sessionInfo().
  3. Make sure that all your packages are up-to-date. You can use BiocManager::valid() for that.
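
The information requested in items 2 and 3 could be gathered roughly as follows. This is only a sketch: the `pkgs` vector is an assumed list of packages that seem relevant to the failing call, not an official checklist, and the `BiocManager::valid()` step is restricted to interactive sessions because it queries online repositories.

```r
# Record the R session details to paste into the issue.
si <- sessionInfo()
print(si)

# Assumed list of packages relevant to the failing call; adjust as needed.
pkgs <- c("DelayedArray", "ScaledMatrix", "S4Arrays", "batchelor")

# Keep only the packages that are actually installed, then report versions.
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
versions <- vapply(
    installed,
    function(p) as.character(packageVersion(p)),
    character(1)
)
print(versions)

# Check that installed packages are consistent with the Bioconductor
# release in use (requires BiocManager and network access).
if (interactive() && requireNamespace("BiocManager", quietly = TRUE)) {
    print(BiocManager::valid())
}
```

Pasting the printed output into the issue alongside a script that starts with the needed library() calls would cover all three requests.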

LTLA commented 1 month ago

An abbreviated version of this works fine for me:

library(scRNAseq)
zeisel <- ZeiselBrainData()
zeisel <- zeisel[, colSums(assay(zeisel)) != 0]
dim(zeisel)

library(scran)
bpp <- BiocParallel::MulticoreParam(2)
zeisel <- addPerCellQC(
    zeisel, 
    BPPARAM=bpp,
    subsets=list(Mito=grep("^mt-", rownames(zeisel))) # note, your subsets don't make sense.
)
zeisel <- logNormCounts(zeisel, size_factors = zeisel$sum)

set.seed(1010010101)
dec.zeisel <- modelGeneVarByPoisson(zeisel,
block=zeisel$Library, BPPARAM=bpp)
top.zeisel <- getTopHVGs(dec.zeisel, n=5000)

library(BiocNeighbors)
library(batchelor)
set.seed(1010001)
merged.zeisel <- fastMNN(zeisel, batch = zeisel$tissue, subset.row = top.zeisel,
  BSPARAM=BiocSingular::RandomParam(deferred = TRUE),
  BNPARAM=AnnoyParam(),
  BPPARAM=bpp)

The Zilionis dataset has the same matrix type so there shouldn't be much difference, aside from computational time.
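
On the note in the code above about the subsets argument: `==` compares each row name with the literal string "^MT-", so `which()` would typically select nothing, whereas `grep()` treats "^MT-" as a regular expression anchored at the start of the name. A minimal sketch of the difference, using a made-up `genes` vector (hypothetical names, not taken from either dataset):

```r
# Hypothetical gene names, for illustration only.
genes <- c("MT-CO1", "MT-ND1", "ACTB", "GAPDH")

# Exact comparison: no gene is literally named "^MT-", so nothing matches.
exact <- which(genes == "^MT-")
print(length(exact))

# Pattern matching: selects the names that start with "MT-".
pattern <- grep("^MT-", genes)
print(genes[pattern])
```

With an empty index vector, the Mito subset passed to addPerCellQC() contains no genes, so the mitochondrial QC metrics would carry no information.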

Session information

```
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.6.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] batchelor_1.20.0            BiocNeighbors_1.22.0
 [3] scran_1.32.0                scuttle_1.14.0
 [5] scRNAseq_2.18.0             SingleCellExperiment_1.26.0
 [7] SummarizedExperiment_1.34.0 Biobase_2.64.0
 [9] GenomicRanges_1.56.1        GenomeInfoDb_1.40.1
[11] IRanges_2.38.1              S4Vectors_0.42.1
[13] BiocGenerics_0.50.0         MatrixGenerics_1.16.0
[15] matrixStats_1.3.0

loaded via a namespace (and not attached):
 [1] DBI_1.2.3                 bitops_1.0-7
 [3] httr2_1.0.1               rlang_1.1.4
 [5] magrittr_2.0.3            gypsum_1.0.1
 [7] compiler_4.4.1            RSQLite_2.3.7
 [9] GenomicFeatures_1.56.0    DelayedMatrixStats_1.26.0
[11] png_0.1-8                 vctrs_0.6.5
[13] ProtGenerics_1.36.0       pkgconfig_2.0.3
[15] crayon_1.5.3              fastmap_1.2.0
[17] dbplyr_2.5.0              XVector_0.44.0
[19] utf8_1.2.4                Rsamtools_2.20.0
[21] UCSC.utils_1.0.0          bit_4.0.5
[23] bluster_1.14.0            zlibbioc_1.50.0
[25] cachem_1.1.0              beachmat_2.20.0
[27] jsonlite_1.8.8            blob_1.2.4
[29] rhdf5filters_1.16.0       DelayedArray_0.30.1
[31] Rhdf5lib_1.26.0           BiocParallel_1.38.0
[33] irlba_2.3.5.1             parallel_4.4.1
[35] cluster_2.1.6             R6_2.5.1
[37] limma_3.60.3              rtracklayer_1.64.0
[39] Rcpp_1.0.12               igraph_2.0.3
[41] Matrix_1.7-0              tidyselect_1.2.1
[43] abind_1.4-5               yaml_2.3.9
[45] codetools_0.2-20          curl_5.2.1
[47] lattice_0.22-6            alabaster.sce_1.4.0
[49] tibble_3.2.1              KEGGREST_1.44.1
[51] BiocFileCache_2.12.0      alabaster.schemas_1.4.0
[53] ExperimentHub_2.12.0      Biostrings_2.72.1
[55] pillar_1.9.0              BiocManager_1.30.23
[57] filelock_1.0.3            generics_0.1.3
[59] RCurl_1.98-1.14           BiocVersion_3.19.1
[61] ensembldb_2.28.0          sparseMatrixStats_1.16.0
[63] alabaster.base_1.5.3      glue_1.7.0
[65] alabaster.ranges_1.4.2    metapod_1.12.0
[67] alabaster.matrix_1.4.2    lazyeval_0.2.2
[69] tools_4.4.1               AnnotationHub_3.12.0
[71] BiocIO_1.14.0             ScaledMatrix_1.12.0
[73] locfit_1.5-9.10           GenomicAlignments_1.40.0
[75] XML_3.99-0.17             rhdf5_2.48.0
[77] grid_4.4.1                edgeR_4.2.0
[79] AnnotationDbi_1.66.0      GenomeInfoDbData_1.2.12
[81] BiocSingular_1.20.0       HDF5Array_1.32.0
[83] restfulr_0.0.15           cli_3.6.3
[85] rsvd_1.0.5                rappdirs_0.3.3
[87] fansi_1.0.6               S4Arrays_1.4.1
[89] dplyr_1.1.4               ResidualMatrix_1.14.1
[91] AnnotationFilter_1.28.0   alabaster.se_1.4.1
[93] dqrng_0.4.1               SparseArray_1.4.8
[95] rjson_0.2.21              memoise_2.0.1
[97] lifecycle_1.0.4           httr_1.4.7
[99] statmod_1.5.0             bit64_4.0.5
```

tomyputw commented 1 month ago

@LTLA Thank you very much for your reply. I retried on Linux and it works well.