cellgeni / sceasy

A package to help convert different single-cell data formats to each other
GNU General Public License v3.0

Issue converting Seurat obj to h5ad Anndata #26

Closed akhst7 closed 3 years ago

akhst7 commented 3 years ago

I get the following error after running this command: convertFormat(pbmc10k, from = "seurat", to = "anndata", outFile = "~/Desktop/pbmc.h5ad")

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  AttributeError: 'module' object has no attribute '__import__'
In addition: Warning message:
In .regularise_df(obj@meta.data, drop_single_values = drop_single_values) :

The traceback is:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  AttributeError: 'module' object has no attribute '__import__' 
13. stop(structure(list(message = "AttributeError: 'module' object has no attribute '__import__'", 
    call = py_call_impl(callable, dots$args, dots$keywords), 
    cppstack = structure(list(file = "", line = -1L, stack = c("1   reticulate.so                       0x000000013b8f565e _ZN4Rcpp9exceptionC2EPKcb + 222", 
    "2   reticulate.so                       0x000000013b8fd735 _ZN4Rcpp4stopERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE + 53",  ... 
12. initialize at loader.py#13
11. loader$initialize(py_module_loaded) 
10. py_inject_hooks() 
9. ensure_python_initialized() 
8. reticulate::import_builtins(convert = FALSE) 
7. .rs.reticulate.onPythonInitialized() 
6. callback() 
5. initialize_python(required_module, use_environment) 
4. ensure_python_initialized(required_module = module) 
3. reticulate::import("anndata", convert = FALSE) 
2. func(obj, outFile = outFile, main_layer = main_layer, ...) 
1. convertFormat(pbmc10k, from = "seurat", to = "anndata", outFile = "~/Desktop/pbmc.h5ad") 

I am not sure what "AttributeError: 'module' object has no attribute '__import__'" means. The issue may lie in the Seurat object in question, but I am not sure how to approach it. Any help would be appreciated.

> pbmc10k
An object of class Seurat 
130469 features across 10194 samples within 4 assays 
Active assay: RNA (36601 features, 2000 variable features)
 3 other assays present: unspliced, spliced, SCT
 2 dimensional reductions calculated: pca, umap

> DefaultAssay(pbmc10k)
[1] "RNA"
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SeuratDisk_0.0.0.9017       SeuratObject_4.0.0          Seurat_4.0.0               
 [4] sceasy_0.0.6                reticulate_1.18             remotes_2.2.0              
 [7] BiocManager_1.30.10         slingshot_1.9.1             princurve_2.1.6            
[10] TFBSTools_1.28.0            monocle3_0.2.3.0            SingleCellExperiment_1.12.0
[13] SummarizedExperiment_1.20.0 GenomicRanges_1.42.0        GenomeInfoDb_1.26.2        
[16] IRanges_2.24.1              S4Vectors_0.28.1            MatrixGenerics_1.2.1       
[19] matrixStats_0.58.0          Biobase_2.50.0              BiocGenerics_0.36.0        

loaded via a namespace (and not attached):
  [1] utf8_1.1.4                  R.utils_2.10.1              tidyselect_1.1.0           
  [4] poweRlaw_0.70.6             RSQLite_2.2.3               AnnotationDbi_1.52.0       
  [7] htmlwidgets_1.5.3           grid_4.0.3                  BiocParallel_1.24.1        
 [10] Rtsne_0.15                  munsell_0.5.0               codetools_0.2-18           
 [13] ica_1.0-2                   future_1.21.0               miniUI_0.1.1.1             
 [16] withr_2.4.1                 colorspace_2.0-0            rstudioapi_0.13            
 [19] ROCR_1.0-11                 tensor_1.5                  listenv_0.8.0              
 [22] GenomeInfoDbData_1.2.4      polyclip_1.10-0             bit64_4.0.5                
 [25] rprojroot_2.0.2             parallelly_1.23.0           vctrs_0.3.6                
 [28] generics_0.1.0              xfun_0.21                   R6_2.5.0                   
 [31] hdf5r_1.3.3                 bitops_1.0-6                spatstat.utils_2.0-0       
 [34] cachem_1.0.4                DelayedArray_0.16.1         assertthat_0.2.1           
 [37] promises_1.2.0.1            scales_1.1.1                gtable_0.3.0               
 [40] globals_0.14.0              processx_3.4.5              goftest_1.2-2              
 [43] seqLogo_1.56.0              rlang_0.4.10                splines_4.0.3              
 [46] rtracklayer_1.50.0          lazyeval_0.2.2              reshape2_1.4.4             
 [49] abind_1.4-5                 httpuv_1.5.5                tools_4.0.3                
 [52] ggplot2_3.3.3               ellipsis_0.3.1              RColorBrewer_1.1-2         
 [55] ggridges_0.5.3              Rcpp_1.0.6                  plyr_1.8.6                 
 [58] zlibbioc_1.36.0             purrr_0.3.4                 RCurl_1.98-1.2             
 [61] ps_1.5.0                    prettyunits_1.1.1           rpart_4.1-15               
 [64] deldir_0.2-10               pbapply_1.4-3               viridis_0.5.1              
 [67] cowplot_1.1.1               zoo_1.8-8                   ggrepel_0.9.1              
 [70] cluster_2.1.1               tinytex_0.29                magrittr_2.0.1             
 [73] data.table_1.14.0           scattermore_0.7             lmtest_0.9-38              
 [76] RANN_2.6.1                  fitdistrplus_1.1-3          hms_1.0.0                  
 [79] patchwork_1.1.1             mime_0.10                   xtable_1.8-4               
 [82] XML_3.99-0.5                gridExtra_2.3               compiler_4.0.3             
 [85] tibble_3.1.0                KernSmooth_2.23-18          crayon_1.4.1               
 [88] R.oo_1.24.0                 htmltools_0.5.1.1           mgcv_1.8-34                
 [91] later_1.1.0.1               tidyr_1.1.2                 DBI_1.1.1                  
 [94] MASS_7.3-53.1               rappdirs_0.3.3              Matrix_1.3-2               
 [97] readr_1.4.0                 cli_2.3.1                   R.methodsS3_1.8.1          
[100] igraph_1.2.6                pkgconfig_2.0.3             GenomicAlignments_1.26.0   
[103] TFMPvalue_0.0.8             plotly_4.9.3                annotate_1.68.0            
[106] DirichletMultinomial_1.32.0 XVector_0.30.0              stringr_1.4.0              
[109] callr_3.5.1                 digest_0.6.27               sctransform_0.3.2          
[112] RcppAnnoy_0.0.18            pracma_2.3.3                CNEr_1.26.0                
[115] spatstat.data_2.0-0         Biostrings_2.58.0           leiden_0.3.7               
[118] uwot_0.1.10                 curl_4.3                    shiny_1.6.0                
[121] Rsamtools_2.6.0             gtools_3.8.2                lifecycle_1.0.0            
[124] nlme_3.1-152                jsonlite_1.7.2              viridisLite_0.3.0          
[127] BSgenome_1.58.0             fansi_0.4.2                 pillar_1.5.0               
[130] lattice_0.20-41             KEGGREST_1.30.1             fastmap_1.1.0              
[133] httr_1.4.2                  pkgbuild_1.2.0              survival_3.2-7             
[136] GO.db_3.12.1                glue_1.4.2                  spatstat_1.64-1            
[139] png_0.1-7                   bit_4.0.4                   stringi_1.5.3              
[142] blob_1.2.1                  caTools_1.18.1              memoise_2.0.0              
[145] dplyr_1.0.4                 irlba_2.3.3                 future.apply_1.7.0         
[148] ape_5.4-1                  
>
pip show loompy
Name: loompy
Version: 3.0.6
Summary: Work with Loom files for single-cell RNA-seq data
Home-page: https://github.com/linnarsson-lab/loompy
Author: Linnarsson Lab
Author-email: sten.linnarsson@ki.se
License: BSD
Location: /usr/local/lib/python3.8/site-packages
Requires: numba, numpy, click, h5py, setuptools, numpy-groupies, scipy
Required-by: scvelo
pip show anndata
Name: anndata
Version: 0.7.5
Summary: Annotated Data.
Home-page: http://github.com/theislab/anndata
Author: Philipp Angerer, Alex Wolf, Isaac Virshup, Sergei Rybakov
Author-email: phil.angerer@gmail.com, f.alex.wolf@gmx.de
License: BSD-3-Clause
Location: /usr/local/lib/python3.8/site-packages
Requires: natsort, scipy, pandas, numpy, packaging, h5py
Required-by: scvelo, scanpy
nh3 commented 3 years ago

The error occurred while importing anndata. My guess is that reticulate was not calling the right Python installation. You could try loading reticulate manually before loading sceasy and specifying the Python version using one of the methods listed here: https://rstudio.github.io/reticulate/articles/versions.html#providing-hints.
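
To check the guess above, it can help to run a short diagnostic with the Python you expect reticulate to use, either from a shell or from R via `reticulate::py_run_file()`. The sketch below assumes Python 3 and uses only the standard library. The function name `describe_python` is made up for illustration and is not part of sceasy or reticulate.

```python
import importlib.util
import sys


def describe_python(modules=("anndata", "loompy")):
    """Report the running interpreter and whether each module can be found."""
    report = {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
    }
    for name in modules:
        # find_spec locates a top-level module without importing it;
        # it returns None when the module is not installed for this interpreter.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

If the reported executable is not the one under which `pip show anndata` lists the package, that would be consistent with reticulate binding to a different installation.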

akhst7 commented 3 years ago

@nh3, it worked, but I have a new issue.

convertFormat(pbmc10k, from = "seurat", to="anndata", outFile = "pbmc10k.h5ad")
... storing 'Phase' as categorical
... storing 'hpca.fine' as categorical
... storing 'hpca.main' as categorical
... storing 'monaco.main' as categorical
... storing 'monaco.fine' as categorical
AnnData object with n_obs × n_vars = 10194 × 36601
    obs: 'nCount_RNA', 'nFeature_RNA', 'nCount_spliced', 'nFeature_spliced', 'nCount_unspliced', 'nFeature_unspliced', 'percent.mt', 'nCount_SCT', 'nFeature_SCT', 'SCT_snn_res.0.8', 'seurat_clusters', 'S.Score', 'G2M.Score', 'Phase', 'old.ident', 'hpca.fine', 'hpca.main', 'monaco.main', 'monaco.fine'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
    obsm: 'X_pca', 'X_umap'
Warning message:
In .regularise_df(obj@meta.data, drop_single_values = drop_single_values) :
  Dropping single category variables:orig.ident
GouQiao commented 2 years ago

Hi, did you figure this out? I also got: In .regularise_df(obj@meta.data, drop_single_values = drop_single_values) : Dropping single category variables:IR_VJ_1_d_call, IR_VJ_2_d_call

nh3 commented 2 years ago

It's just a warning, not an error. The resulting object is still valid, except that it lacks those two columns in obs.

The warning appears because each of those two variables contains only a single value. They are dropped because some versions of anndata complain about single-value categorical variables. If you want to keep them, try passing drop_single_values=FALSE to convertFormat(), or add them back manually afterwards, like this:

adata.obs["IR_VJ_1_d_call"] = <your_value>
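
When several columns were dropped, the same assignment can be wrapped in a small helper. This is a minimal sketch: `restore_constant_obs` is a hypothetical name, and it assumes `adata.obs` behaves like a pandas DataFrame, so that assigning a scalar fills every row.

```python
def restore_constant_obs(adata, constants):
    """Add columns that hold one value for every cell back into adata.obs.

    `constants` maps a column name to the single value it should contain.
    """
    for column, value in constants.items():
        # With a pandas DataFrame (which AnnData.obs is), assigning a scalar
        # broadcasts it to all rows.
        adata.obs[column] = value
    return adata


# Hypothetical usage, after loading the converted file with anndata:
#   adata = anndata.read_h5ad("pbmc10k.h5ad")
#   restore_constant_obs(adata, {"IR_VJ_1_d_call": "<your_value>"})
```

The values themselves have to come from the original Seurat metadata, since the conversion did not write them to the h5ad file.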