cellgeni / sceasy

A package to help convert different single-cell data formats to each other
GNU General Public License v3.0

Evaluation error: Exception: Data must be 1-dimensional #3

mariafiruleva closed this issue 4 years ago

mariafiruleva commented 4 years ago

Dear sceasy team,

I am trying to convert a Seurat object to h5ad and get this error in every case except for an integrated object.

Here, data can be any Seurat object produced with the standard workflow.

library(Seurat)
load('data.RData')
library(reticulate)
use_condaenv('anaconda3', required = T)
loompy <- reticulate::import('loompy')

adata <- sceasy:::seurat2anndata(data, outFile="data.h5ad")
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  Evaluation error: Exception: Data must be 1-dimensional

Detailed traceback: 
  File "/nfs/home/mfiruleva/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 985, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/nfs/home/mfiruleva/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 348, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/nfs/home/mfiruleva/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 459, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/nfs/home/mfiruleva/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 7359, in _arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/nfs/home/mfiruleva/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 7669, in _homogenize
    raise_cast_failure=False)
  File "/nfs/home/mfiruleva/.local/lib/python3.7/site-packages/pandas/core/series
In addition: Warning message:
In .regularise_df(obj@meta.data) :
  Dropping single category variables:orig.ident

Session info:

R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /scratch/opt/R/3.6.0/lib/R/lib/libRblas.so
LAPACK: /nfs/home/mfiruleva/anaconda3/lib/libmkl_rt.so

locale:
 [1] LC_CTYPE=ru_RU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=ru_RU.UTF-8        LC_COLLATE=ru_RU.UTF-8    
 [5] LC_MONETARY=ru_RU.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=ru_RU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reticulate_1.12 Seurat_3.1.0   

loaded via a namespace (and not attached):
 [1] nlme_3.1-139           tsne_0.1-3             bitops_1.0-6          
 [4] RcppAnnoy_0.0.12       RColorBrewer_1.1-2     httr_1.4.1            
 [7] sctransform_0.2.0.9000 tools_3.6.0            backports_1.1.4       
[10] R6_2.4.0               irlba_2.3.3            KernSmooth_2.23-15    
[13] uwot_0.1.3             lazyeval_0.2.2         colorspace_1.4-1      
[16] npsurv_0.4-0           tidyselect_0.2.5       gridExtra_2.3         
[19] compiler_3.6.0         plotly_4.9.0           caTools_1.17.1.2      
[22] scales_1.0.0           lmtest_0.9-37          ggridges_0.5.1        
[25] pbapply_1.4-2          stringr_1.4.0          digest_0.6.20         
[28] R.utils_2.9.0          pkgconfig_2.0.2        htmltools_0.4.0       
[31] bibtex_0.4.2           htmlwidgets_1.5.1      rlang_0.4.0           
[34] zoo_1.8-6              jsonlite_1.6           ica_1.0-2             
[37] gtools_3.8.1           dplyr_0.8.3            R.oo_1.22.0           
[40] magrittr_1.5           sceasy_0.0.1           Matrix_1.2-17         
[43] Rcpp_1.0.2             munsell_0.5.0          ape_5.3               
[46] lifecycle_0.1.0        R.methodsS3_1.7.1      stringi_1.4.3         
[49] gbRd_0.4-11            MASS_7.3-51.4          gplots_3.0.1.1        
[52] Rtsne_0.15             plyr_1.8.4             grid_3.6.0            
[55] parallel_3.6.0         gdata_2.18.0           listenv_0.7.0         
[58] ggrepel_0.8.1          crayon_1.3.4           lattice_0.20-38       
[61] cowplot_1.0.0          splines_3.6.0          SDMTools_1.1-221.1    
[64] zeallot_0.1.0          pillar_1.4.2           igraph_1.2.4.1        
[67] future.apply_1.3.0     reshape2_1.4.3         codetools_0.2-16      
[70] leiden_0.3.1           glue_1.3.1             lsei_1.2-0            
[73] metap_1.1              data.table_1.12.2      RcppParallel_4.4.3    
[76] vctrs_0.2.0            png_0.1-7              Rdpack_0.11-0         
[79] gtable_0.3.0           RANN_2.6.1             purrr_0.3.2           
[82] tidyr_1.0.0            future_1.14.0          assertthat_0.2.1      
[85] ggplot2_3.2.1          rsvd_1.0.2             survival_2.44-1.1     
[88] viridisLite_0.3.0      tibble_2.1.3           cluster_2.0.8         
[91] globals_0.12.4         fitdistrplus_1.0-14    ROCR_1.0-7

nh3 commented 4 years ago

Is the error message complete in the following line?

File "/nfs/home/mfiruleva/.local/lib/python3.7/site-packages/pandas/core/series

Could you also provide version information for the following python packages: anndata, h5py, pandas?

mariafiruleva commented 4 years ago

nh3, thanks for the quick reply.

  1. Yes.
  2. (base) mfiruleva@sphinx:~$ conda list | grep 'pandas\|loompy\|anndata\|h5py'
    anndata                   0.6.19                     py_0    bioconda
    h5py                      2.9.0            py37h7918eee_0  
    loompy                    2.0.16                     py_0    bioconda
    pandas                    0.24.2           py37he6710b0_0

nh3 commented 4 years ago

Thanks @mariafiruleva, could you provide a small Seurat object from your data that would help us reproduce the issue? With the "pbmc_small" object provided by Seurat, the conversion to anndata completes successfully.

mariafiruleva commented 4 years ago

@nh3, oh yes, I can convert the pbmc_small object too. I hope you can help with my dataset.

RData.

Source of data: link.

Code:

library(dplyr)
library(functools)
library(ggplot2)
library(gridExtra)
library(Matrix)
library(sctransform)
library(Seurat)

## FUNCTIONS

add_metadata <- function(data) {
  mito.genes <-
    grep(pattern = "^Mt\\.|^MT\\.|^mt\\.|^Mt-|^MT-|^mt-",
         x = rownames(x = GetAssayData(object = data)),
         value = TRUE)
  percent.mito <-
    Matrix::colSums(GetAssayData(object = data, slot = "counts")[mito.genes, ]) /
    Matrix::colSums(GetAssayData(object = data, slot = "counts"))
  data[['percent.mito']] <- percent.mito
  data[['percent.mito_log10']] <- log10(data[['percent.mito']])
  data[['nCount_RNA_log10']] <- log10(data[['nCount_RNA']])
  data[['nFeature_RNA_log10']] <- log10(data[['nFeature_RNA']])
  data[['nCount_RNA_log2']] <- log2(data[['nCount_RNA']])
  data[['nFeature_RNA_log2']] <- log2(data[['nFeature_RNA']])
  data[['scaled_mito']] <- scale(percent.mito)
  data[['scaled_nCount_RNA']] <- scale(data[['nCount_RNA_log10']])
  data
}

get_conf_interval <- function(dataset, parameter) {
  left <- mean(dataset[[parameter]][[1]]) - qnorm(0.975)
  right <- mean(dataset[[parameter]][[1]]) + qnorm(0.975)
  return(c(left, right))
}

## GATHERING DATA TOGETHER

fdata <- get(load("/scratch/mfiruleva/data/var/www/html/SRA/SRA.final/SRA592147_SRS2384613.sparse.RData"))
fdata@Dimnames[[1]] <-
  make.names(gsub(".ENS.*", "", fdata@Dimnames[[1]]), unique = T)
whole <- CreateSeuratObject(
  counts = fdata,
  min.cells = 2,
  min.features = 200,
  project = "SRA592147"
)
whole <- add_metadata(whole)

## FILTER MT CONTENT

mt_dist <- as.data.frame(whole[['scaled_mito']][[1]])
colnames(mt_dist) <- 'scaled_mito'
whole <-
  subset(
    x = whole,
    subset = scaled_mito < get_conf_interval(whole, 'scaled_mito')[2]
  )

## NORMALIZATION
whole <-
  SCTransform(
    whole,
    ncells=min(100000, ncol(whole)),
    vars.to.regress = c("percent.mito"),
    verbose = T,
    conserve.memory = T
  )

gc()

## PCA

whole <- RunPCA(object = whole, features = VariableFeatures(object = whole), npcs=50)

## TSNE

whole <-
  RunTSNE(whole, dims = 1:20, tsne.method = "FIt-SNE",
          fast_tsne_path = "/nfs/home/kzajcev/FIt-SNE/bin/fast_tsne", nthreads = 4, max_iter = 2000)

## UMAP

whole <- RunUMAP(whole, dims = 1:20)

## CLUSTERING

whole <- FindNeighbors(object = whole, dims = 1:20)
whole <- FindClusters(object = whole, resolution = 0.6)

## SAVING
file_out <- paste0("SRA592147", '.RData')
save(list = c('whole', 'whole.markers'), file = file_out)

After that, I run:

library(Seurat)
library(sceasy)

load('SRA592147.RData')
library(reticulate)
use_condaenv('anaconda3', required = T)
loompy <- reticulate::import('loompy')
adata <- sceasy:::seurat2anndata(whole, outFile="data.h5ad")

And get the error.

UPD: I see, the cause of the problem is this line:

data[['scaled_mito']] <- scale(percent.mito)

It fails because scale() returns a one-column matrix rather than a plain vector, so the metadata column is no longer 1-dimensional.
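
To be precise about what base R's scale() hands back: it is a numeric matrix (a single column here), and that extra dimension is presumably what pandas rejects. A minimal base-R sketch of the behaviour and one possible fix, using made-up values as a stand-in for the real percent.mito vector (the cell names are hypothetical):

```r
# Stand-in for the per-cell mitochondrial fraction (made-up values,
# hypothetical cell names).
percent.mito <- c(cell1 = 0.01, cell2 = 0.05, cell3 = 0.02, cell4 = 0.10)

scaled <- scale(percent.mito)
is.matrix(scaled)  # TRUE
dim(scaled)        # 4 1

# Taking the first column drops the dim attribute but keeps the cell names,
# giving the plain named vector a metadata column is expected to hold.
scaled_vec <- scale(percent.mito)[, 1]
is.null(dim(scaled_vec))  # TRUE
names(scaled_vec)         # "cell1" "cell2" "cell3" "cell4"
```

Under that assumption, the assignment in add_metadata() could be written as data[['scaled_mito']] <- scale(percent.mito)[, 1], and the same would apply to the scaled_nCount_RNA line; this has not been run against the dataset from this thread.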

Thanks for the amazing tool!

And sorry for the issue. :(
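
For anyone who hits the same error: a quick way to spot the offending column before calling the converter is to look for metadata columns that carry a dim attribute. A minimal base-R sketch; non_vector_columns is a hypothetical helper (not part of sceasy or Seurat), and meta is a toy stand-in for whole@meta.data:

```r
# Hypothetical helper: names of data-frame columns that carry a dim
# attribute (matrices or nested data frames) instead of being plain vectors.
non_vector_columns <- function(df) {
  has_dim <- vapply(df, function(col) !is.null(dim(col)), logical(1))
  names(df)[has_dim]
}

# Toy metadata table standing in for whole@meta.data (made-up values).
meta <- data.frame(nCount_RNA = c(1200, 3400, 560))
meta$percent.mito <- c(0.01, 0.05, 0.02)
meta$scaled_mito <- scale(meta$percent.mito)  # stored as a one-column matrix

non_vector_columns(meta)  # "scaled_mito"
```

With a real object the call would presumably be non_vector_columns(whole@meta.data); any column it reports can be flattened to a vector before converting.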