dynverse / anndata

Annotated multivariate observation data in R
https://anndata.dynverse.org

Error when accessing `X` after loading as HDF5-backed #1

Closed milanmlft closed 1 week ago

milanmlft commented 3 years ago

Hi Robrecht, I encountered this problem when trying to load one of the example datasets using read_h5ad(..., backed = "r"):

library(anndata)

example <- "https://github.com/rcannood/anndata/raw/master/example_formats/pbmc_1k_protein_v3_processed.h5ad"
download.file(example, destfile = "example.h5ad", quiet = TRUE)
ad <- read_h5ad("example.h5ad", backed = "r")
ad
#> AnnData object with n_obs × n_vars = 713 × 33538 backed at 'example.h5ad'
#>     var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#>     uns: 'hvgParameters', 'normalizationParameters', 'pca', 'pcaParameters'
#>     obsm: 'X_pca'
#>     varm: 'PCs'
ad$X
#> Error in dimnames(out) <- dimnames(self): 'dimnames' applied to non-array

Created on 2021-02-08 by the reprex package (v1.0.0)
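
For context, the error text is the one base R raises whenever `dimnames<-` is applied to an object that has no `dim` attribute. A minimal sketch in plain R, where an ordinary vector stands in for whatever unconverted object `ad$X` produced internally (the stand-in and the dimnames are assumptions for illustration, not taken from the package source):

```r
# A bare vector has no `dim` attribute, so it is not an array.
out <- c(1, 2, 3, 4)

# Hypothetical dimnames, standing in for dimnames(self) in the error message.
dn <- list(c("cell1", "cell2"), c("geneA", "geneB"))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    dimnames(out) <- dn
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Once the object is a real 2 x 2 matrix, the same assignment succeeds.
dim(out) <- c(2, 2)
dimnames(out) <- dn
print(out)
```

This suggests that the value behind `ad$X` was not converted to an R matrix before the dimnames were attached.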

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.3 Patched (2021-01-27 r79891)
#>  os       macOS Catalina 10.15.7
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Brussels
#>  date     2021-02-08
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source
#>  anndata     * 0.7.5.1 2021-02-02 [1] CRAN (R 4.0.2)
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.3)
#>  cli           2.2.0   2020-11-20 [1] CRAN (R 4.0.2)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.2)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#>  jsonlite      1.7.2   2020-12-09 [1] standard (@1.7.2)
#>  knitr         1.31    2021-01-27 [1] CRAN (R 4.0.3)
#>  lattice       0.20-41 2020-04-02 [2] CRAN (R 4.0.3)
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.2)
#>  Matrix        1.3-2   2021-01-06 [2] CRAN (R 4.0.3)
#>  pillar        1.4.7   2020-11-20 [1] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
#>  R.cache       0.14.0  2019-12-06 [1] CRAN (R 4.0.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.0.2)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.0.2)
#>  R.utils       2.10.1  2020-08-26 [1] CRAN (R 4.0.2)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
#>  Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.2)
#>  reprex        1.0.0   2021-01-27 [1] CRAN (R 4.0.3)
#>  reticulate    1.18    2020-10-25 [1] CRAN (R 4.0.2)
#>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.6     2020-12-14 [1] standard (@2.6)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  styler        1.3.2   2020-02-23 [1] CRAN (R 4.0.0)
#>  tibble        3.0.5   2021-01-15 [1] CRAN (R 4.0.2)
#>  vctrs         0.3.6   2020-12-17 [1] CRAN (R 4.0.3)
#>  withr         2.4.1   2021-01-26 [1] CRAN (R 4.0.3)
#>  xfun          0.20    2021-01-06 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Users/milan/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
```

Using backed = NULL does work: X then loads normally as a sparse matrix.
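
Until backed access works, one way to script this workaround is to try the backed mode first and fall back to an in-memory load when reading `X` fails. The sketch below is only an illustration: `read_x_with_fallback()` and its `reader` argument are hypothetical names, and the only package call assumed is `read_h5ad()` with the `backed` values already used above.

```r
# Hypothetical helper: try HDF5-backed access first, then fall back to memory.
# `reader` defaults to anndata::read_h5ad but can be swapped for testing;
# the default is only evaluated when no reader is supplied.
read_x_with_fallback <- function(path, reader = anndata::read_h5ad) {
  tryCatch(
    {
      ad <- reader(path, backed = "r")
      # Accessing X is the step that raised the error in the report above.
      list(ad = ad, X = ad$X, backed = TRUE)
    },
    error = function(e) {
      message(
        "Backed access failed (", conditionMessage(e),
        "); loading into memory instead."
      )
      ad <- reader(path, backed = NULL)
      list(ad = ad, X = ad$X, backed = FALSE)
    }
  )
}
```

With anndata installed, `res <- read_x_with_fallback("example.h5ad")` would return the object, its `X`, and a flag recording which mode succeeded. The in-memory path needs enough RAM for the whole matrix.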

rcannood commented 3 years ago

Thanks for bringing this up. I need to implement a py_to_r() function for SparseDataset objects. I'll try to fix this by Friday.
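
As a rough idea of what such a conversion could look like, here is a sketch of an S3 method for reticulate's `py_to_r()` generic. It is not the fix that was eventually merged. The Python class path in the method name and the `to_memory()` call are assumptions about the anndata Python package at the time, and the sketch also assumes the installed reticulate can convert the resulting scipy sparse matrix.

```r
# Sketch only: an S3 method for reticulate's py_to_r() generic.
# Assumptions (not confirmed by this thread):
#   * the Python class path is anndata._core.sparse_dataset.SparseDataset
#   * the object exposes to_memory(), returning a scipy.sparse matrix
#   * reticulate knows how to convert that scipy matrix to an R object
py_to_r.anndata._core.sparse_dataset.SparseDataset <- function(x) {
  # Read the on-disk sparse data fully into memory on the Python side.
  in_memory <- x$to_memory()

  # If reticulate already converted the result, return it unchanged;
  # otherwise ask reticulate to convert the remaining Python object.
  if (inherits(in_memory, "python.builtin.object")) {
    reticulate::py_to_r(in_memory)
  } else {
    in_memory
  }
}
```

A method like this reads the whole matrix into memory, so it would remove the error but not deliver the memory savings of a truly backed matrix.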

milanmlft commented 3 years ago

FYI, zellkonverter has a similar issue: theislab/zellkonverter#37. I think they're working on an H5ADMatrix class in https://github.com/Bioconductor/HDF5Array to support this. Maybe that's also useful for anndata?

rcannood commented 3 years ago

Thanks for the tip. I can't use HDF5Array because I'm accessing everything through reticulate & anndata. I'm tracking progress on this feature on branch feature/backed_matrices :)

nicolasstransky commented 3 years ago

Hello @rcannood, were you able to find a workaround? Sparse matrices are fairly commonplace in anndata objects, so this capability would be much appreciated! Thanks

rcannood commented 3 years ago

Hey @nicolasstransky!

Sparse matrices themselves are not the issue. They just can't be loaded as an HDF5-backed matrix yet, so for now you'll have to load them into memory.

I intend to work on this feature, but haven't had the time for it yet.

bschilder commented 3 years ago

Also experiencing this issue.

canergen commented 2 years ago

Hey @rcannood, my older workaround with backed=FALSE no longer works: even with backed=FALSE, the data frame is not loaded correctly. I suspect a version mismatch, so the output of sessionInfo() is below. The issue is reproducible for me using the pbmc3k data.

scanpy.datasets.pbmc3k()

R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS/LAPACK: libopenblasp-r0.3.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] anndata_0.7.5.3 VISION_3.0.0 reticulate_1.22

loaded via a namespace (and not attached):
[1] mclust_5.4.10 Rcpp_1.0.9 rsvd_1.0.5
[4] ape_5.6-2 lattice_0.20-45 png_0.1-7
[7] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2
[10] wordspace_0.2-7 mime_0.12 IRdisplay_1.1
[13] R6_2.5.1 repr_1.1.4 phytools_1.0-3
[16] sparsesvd_0.2 evaluate_0.15 coda_0.19-4
[19] pillar_1.8.0 rlang_1.0.3 uuid_1.1-0
[22] loe_1.1 irlba_2.3.5 vegan_2.6-2
[25] phangorn_2.9.0 Matrix_1.4-1 combinat_0.0-8
[28] splines_4.1.3 webutils_1.1 Rtsne_0.16
[31] igraph_1.3.4 compiler_4.1.3 numDeriv_2016.8-1.1
[34] pkgconfig_2.0.3 base64enc_0.1-3 mnormt_2.1.0
[37] mgcv_1.8-40 htmltools_0.5.3 expm_0.999-6
[40] RANN_2.6.1 quadprog_1.5-8 logging_0.10-108
[43] codetools_0.2-18 matrixStats_0.62.0 permute_0.9-7
[46] fansi_1.0.3 crayon_1.5.1 later_1.3.0
[49] MASS_7.3-58 grid_4.1.3 nlme_3.1-158
[52] jsonlite_1.8.0 lifecycle_1.0.1 magrittr_2.0.3
[55] cli_3.3.0 stringi_1.7.8 plumber_1.2.0
[58] swagger_3.33.1 promises_1.2.0.1 scatterplot3d_0.3-41
[61] vctrs_0.4.1 generics_0.1.3 fastmatch_1.1-3
[64] IRkernel_1.3 pbmcapply_1.5.1 fastICA_1.2-3
[67] iotools_0.3-2 tools_4.1.3 glue_1.6.2
[70] maps_3.4.0 plotrix_3.8-2 parallel_4.1.3
[73] fastmap_1.1.0 cluster_2.1.3 pbdZMQ_0.3-7
[76] clusterGeneration_1.3.7

rcannood commented 1 week ago

Fixed by #34 !