chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

NMF census model appears to not be available on the R side #1299

Open hlmtran opened 1 week ago

hlmtran commented 1 week ago

Describe the bug

I am going through the R vignettes on using Census embeddings, but the cell embeddings are not recognized. The call below emits: Warning: The following cell embedding does not exist: nmf.

To Reproduce

library("cellxgene.census")
# library("tiledbsoma")
Sys.setenv(AWS_DEFAULT_REGION = "us-west-2")
census <- open_soma(census_version = "2023-12-15")

seurat_obj <- get_seurat(
  census,
  organism = "homo_sapiens",
  obs_value_filter = "tissue_general == 'central nervous system'",
  obsm_layers = c("nmf")
)

Expected behavior

This is my first time using the package, so I don't know exactly what the Seurat object should contain. The Python API, however, retrieves the embeddings without problems: depending on the input parameters, the returned AnnData object contains obsm and varm, holding the cell and feature embeddings respectively.

Environment

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: AlmaLinux 9.4 (Seafoam Ocelot)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.21.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=C.utf8       LC_NUMERIC=C          LC_TIME=C.utf8        LC_COLLATE=C.utf8     LC_MONETARY=C.utf8    LC_MESSAGES=C.utf8   
 [7] LC_PAPER=C.utf8       LC_NAME=C             LC_ADDRESS=C          LC_TELEPHONE=C        LC_MEASUREMENT=C.utf8 LC_IDENTIFICATION=C  

time zone: UTC
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Seurat_5.1.0            SeuratObject_5.0.2      sp_2.1-4                RcppSpdlog_0.0.18       cellxgene.census_1.16.1

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22            splines_4.4.0               later_1.3.2                 aws.s3_0.3.21               urltools_1.7.3             
  [6] tibble_3.2.1                triebeard_0.4.1             polyclip_1.10-7             fastDummies_1.7.4           lifecycle_1.0.4            
 [11] globals_0.16.3              lattice_0.22-6              MASS_7.3-61                 magrittr_2.0.3              limma_3.60.6               
 [16] plotly_4.10.4               rmarkdown_2.28              yaml_2.3.10                 httpuv_1.6.15               tiledbsoma_1.14.3          
 [21] sctransform_0.4.1           spam_2.11-0                 spatstat.sparse_3.1-0       reticulate_1.39.0           cowplot_1.1.3              
 [26] pbapply_1.7-2               DBI_1.2.3                   RColorBrewer_1.1-3          abind_1.4-8                 zlibbioc_1.50.0            
 [31] Rtsne_0.17                  GenomicRanges_1.56.1        purrr_1.0.2                 BiocGenerics_0.50.0         msigdbr_7.5.1              
 [36] GenomeInfoDbData_1.2.12     IRanges_2.38.1              S4Vectors_0.42.1            ggrepel_0.9.6               irlba_2.3.5.1              
 [41] listenv_0.9.1               spatstat.utils_3.1-0        goftest_1.2-3               RSpectra_0.16-2             spatstat.random_3.3-2      
 [46] fitdistrplus_1.2-1          parallelly_1.38.0           leiden_0.4.3.1              codetools_0.2-20            DelayedArray_0.30.1        
 [51] xml2_1.3.6                  tidyselect_1.2.1            UCSC.utils_1.0.0            farver_2.1.2                matrixStats_1.4.1          
 [56] stats4_4.4.0                base64enc_0.1-3             spatstat.explore_3.3-2      jsonlite_1.8.9              progressr_0.14.0           
 [61] ggridges_0.5.6              survival_3.7-0              tools_4.4.0                 tiledb_0.30.2               ica_1.0-3                  
 [66] Rcpp_1.0.13                 glue_1.8.0                  spdl_0.0.5                  gridExtra_2.3               SparseArray_1.4.8          
 [71] xfun_0.48                   MatrixGenerics_1.16.0       GenomeInfoDb_1.40.1         dplyr_1.1.4                 fastmap_1.2.0              
 [76] fansi_1.0.6                 RcppML_0.5.6                digest_0.6.37               R6_2.5.1                    mime_0.12                  
 [81] colorspace_2.1-1            scattermore_1.2             tensor_1.5                  RcppCCTZ_0.2.12             spatstat.data_3.1-2        
 [86] RSQLite_2.3.7               utf8_1.2.4                  tidyr_1.3.1                 generics_0.1.3              data.table_1.16.0          
 [91] httr_1.4.7                  htmlwidgets_1.6.4           S4Arrays_1.4.1              uwot_0.2.2                  pkgconfig_2.0.3            
 [96] gtable_0.3.5                blob_1.2.4                  lmtest_0.9-40               SingleCellExperiment_1.26.0 XVector_0.44.0             
[101] htmltools_0.5.8.1           dotCall64_1.2               fgsea_1.31.3                scales_1.3.0                Biobase_2.64.0             
[106] png_0.1-8                   spatstat.univar_3.0-1       knitr_1.48                  rstudioapi_0.16.0           reshape2_1.4.4             
[111] nlme_3.1-166                curl_5.2.3                  zoo_1.8-12                  cachem_1.1.0                stringr_1.5.1              
[116] KernSmooth_2.23-24          parallel_4.4.0              miniUI_0.1.1.1              arrow_17.0.0.1              AnnotationDbi_1.66.0       
[121] nanotime_0.3.10             pillar_1.9.0                grid_4.4.0                  vctrs_0.6.5                 nanoarrow_0.5.0.1          
[126] RANN_2.6.2                  promises_1.3.0              xtable_1.8-4                cluster_2.1.6               evaluate_1.0.0             
[131] cli_3.6.3                   singlet_0.99.6              compiler_4.4.0              rlang_1.1.4                 crayon_1.5.3               
[136] future.apply_1.11.2         aws.signature_0.6.0         fs_1.6.4                    plyr_1.8.9                  stringi_1.8.4              
[141] viridisLite_0.4.2           deldir_2.0-4                BiocParallel_1.38.0         assertthat_0.2.1            babelgene_22.9             
[146] munsell_0.5.1               Biostrings_2.72.1           lazyeval_0.2.2              spatstat.geom_3.3-3         Matrix_1.7-0               
[151] RcppHNSW_0.6.0              patchwork_1.3.0             bit64_4.5.2                 future_1.34.0               ggplot2_3.5.1              
[156] KEGGREST_1.44.1             statmod_1.5.0               shiny_1.9.1                 SummarizedExperiment_1.34.0 ROCR_1.0-11                
[161] igraph_2.0.3                memoise_2.0.1               fastmatch_1.1-4             bit_4.5.0          

Additional context

The Python API also seems to have a var_embeddings parameter, for which there appears to be no equivalent in R (something like var_layers). Some guidance would be greatly appreciated, as I also need the feature embeddings for this model. It would also be extremely useful to be able to check which Census models are available and how they are named; I am going off the Python code and assuming that the name nmf is the same in R.

MaximilianLombardo commented 4 days ago

Thank you for reporting this issue and providing the detailed reproduction steps. The NMF embeddings you’re attempting to access are not currently supported in the R API for CELLxGENE Census. At present, the only cell embeddings available via the R API are scvi and geneformer. NMF embeddings have not been implemented for R, and we do not currently have plans to support them.
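
Until that changes, one way to avoid the warning is to check the requested names before calling get_seurat(). The following is a minimal sketch, assuming only what is stated above (scvi and geneformer are the embeddings exposed through the R API). The names supported_r_embeddings and split_embedding_request are hypothetical and not part of cellxgene.census, and the list is hard-coded rather than queried from the Census.

```r
# Embedding names listed above as available through the R API.
# Assumption: hard-coded from this thread, not looked up in the Census itself.
supported_r_embeddings <- c("scvi", "geneformer")

# Hypothetical helper: split the requested obsm layer names into those found
# in `supported` and those that are not.
split_embedding_request <- function(requested, supported = supported_r_embeddings) {
  requested <- unique(as.character(requested))
  list(
    supported = intersect(requested, supported),
    unsupported = setdiff(requested, supported)
  )
}

req <- split_embedding_request(c("nmf", "scvi"))

# Report anything that would trigger the "does not exist" warning.
if (length(req$unsupported) > 0) {
  message(
    "Not available via the R API: ",
    paste(req$unsupported, collapse = ", ")
  )
}
```

req$supported can then be passed as obsm_layers in the get_seurat() call from the report, in place of c("nmf").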