Automated, probabilistic assignment of cell types in scRNA-seq data
Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: Tried to convert 'shape' to a tensor and failed. #94

When trying to run cellassign on the example dataset,

> library(cellassign)
> data(example_sce)
> data(example_marker_mat)
> s <- SingleCellExperiment::sizeFactors(example_sce)
> fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),], 
+                   marker_gene_info = example_marker_mat, 
+                   s = s, 
+                   learning_rate = 1e-2, 
+                   shrinkage = TRUE,
+                   verbose = FALSE)

R raised the following error:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
ValueError: Tried to convert 'shape' to a tensor and failed. 
Error: Cannot convert a partially known TensorShape to a Tensor: (1, ?)

And this is my session info:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/
LAPACK: /usr/lib/x86_64-linux-gnu/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=zh_CN.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 

Anyone encountered the same issue?

I think I've found a solution.
When installing the tensorflow package for R, the suggested command is install.packages("tensorflow"). By default this installs the latest release of the R tensorflow package (v2.7.0 as of 2021-12-04).
Instead of using install.packages("tensorflow"), I used devtools to build from GitHub and explicitly specified the version I wanted.

For example, I chose TensorFlow 2.4.0 (with CUDA 11.0 and cuDNN 8.0.4). In R I did:


Then in shell:

$ pip install tensorflow==2.4.0
$ pip install tensorflow-probability==0.12.0

Now cellassign works.

> library(tensorflow)
> library(cellassign)
2021-12-05 13:23:30.455080: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
> tensorflow::tf_config()
TensorFlow v2.4.0 (~/miniconda3/envs/cellassign/lib/python3.7/site-packages/tensorflow)
Python v3.7 (~/miniconda3/envs/cellassign/bin/python)
> data(example_sce)
> data(example_marker_mat)
> s <- SingleCellExperiment::sizeFactors(example_sce)
> fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),],
+                   marker_gene_info = example_marker_mat,
+                   s = s,
+                   learning_rate = 1e-2,
+                   shrinkage = TRUE,
+                   verbose = FALSE)
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/
LAPACK: /usr/lib/x86_64-linux-gnu/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=zh_CN.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 

@kieranrcampbell @Irrationone Maybe the README and Documentation should be updated?

Thanks a lot. My conda "bioinfo" environment already had "tensorflow==2.4.0" and "tensorflow-probability==0.12.0" installed.
Today I ran the code you provided on my RStudio Server. The example data ran successfully as usual, and the resulting "fit" looks fine, although the log reported some messages that I can't interpret. But when I ran my own Seurat data (converted to a SingleCellExperiment object), things went differently. My code and its output follow.


> A cellassign fit for 500 cells, 10 genes, 2 cell types with 0 covariates
>             To access cell types, call celltypes(x)
>             To access cell type probabilities, call cellprobs(x)

However, when I ran my own data through the cellassign workflow, the RStudio Server session was terminated.

> An object of class Seurat 
> 26069 features across 131310 samples within 2 assays 
> Active assay: RNA (25029 features, 5000 variable features)
>  1 other assay present: Net
>  3 dimensional reductions calculated: NetPCA, NetTSNE, NetUMAP

DefaultAssay(sce_seurat) <- 'RNA' 

# Convert the Seurat object into a SingleCellExperiment object
sce <- as.SingleCellExperiment(sce_seurat)  # sce_seurat is a Seurat object


> An object of class "SimpleAssays"
> Slot "data":
> List of length 3
> names(3): counts logcounts scaledata

# sce_seurat has already been clustered.

> Levels: 0 1 10 2 3 4 5 6 7 8 9

sce <- scran::computeSumFactors(sce, clusters = sce$Net_snn_res.0.1)

## Find markers shared with the data set

shared <- intersect(rownames(celltype), rownames(sce))
s <- SingleCellExperiment::sizeFactors(sce)

# celltype is a marker matrix obtained from CellMarker

>        Monocyte Natural killer cell B cell
> 11            0                   0      0
> A4GALT        0                   0      0
> ABC           0                   0      0
shared <- intersect(rownames(celltype), rownames(sce))  # the "11" row is removed by this step
fit <- cellassign(exprs_obj = sce[shared, ], 
                  marker_gene_info = celltype[shared, ],  
                  s = s,
                  learning_rate = 1e-2,
                  shrinkage = TRUE,
                  verbose = FALSE)
#Then the error happened

> 2021-12-07 14:32:30.381110: I tensorflow/compiler/jit/] Not creating XLA devices, tf_xla_enable_xla_devices not set
> 2021-12-07 14:32:31.578288: W tensorflow/core/framework/] Allocation of 185724864000 exceeds  10% of free system memory.
> 2021-12-07 14:32:35.428487: W tensorflow/core/framework/] Allocation of 185724864000 exceeds 10% of free system memory.
> 2021-12-07 14:32:36.231957: W tensorflow/core/framework/] Allocation of 185724864000 exceeds 10% of free system memory.

Then the R session was abnormally terminated by an unknown crash. I tried three times, and the result was the same each time. Could you help me with this problem?

Hi @GGGGGHua, this issue has been posted before (#66), but that thread does not seem to have reached a final solution.

As stated in that thread, I suspect this is due to the large number of cells in your object (131310 cells). As a heuristic solution, you can try down-sampling your data to a manageable size, for example using geosketch, and then run cellassign separately on each sub-sample.
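
For a rough sense of scale, the figures in the warning can be checked with plain arithmetic. This is only a sketch: the byte count and the cell count are copied from the post above, while the linear-scaling assumption and the helper name cells_for_budget are mine and not something the cellassign documentation states.

```r
# Figures copied from the post above
alloc_bytes <- 185724864000  # bytes requested in each TensorFlow warning
n_cells     <- 131310        # cells (samples) in the Seurat object

alloc_gib      <- alloc_bytes / 1024^3   # roughly 173 GiB for a single tensor
bytes_per_cell <- alloc_bytes / n_cells  # divides evenly: 1414400 bytes per cell

# Assumption (not from the cellassign docs): the tensor grows linearly with
# the number of cells, so a memory budget translates into a cell budget.
cells_for_budget <- function(budget_gib, per_cell = bytes_per_cell) {
  floor(budget_gib * 1024^3 / per_cell)
}

round(alloc_gib, 1)   # 173
bytes_per_cell        # 1414400
cells_for_budget(16)  # 12146
```

The allocation divides evenly by the number of cells, which is consistent with a tensor that has one slice per cell. If that holds, a 16 GiB budget would cover about 12,000 cells per run with the same marker matrix. The warning appears three times, so more than one tensor of this size may be needed and the real headroom is likely smaller.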

Thanks a lot for your timely reply. I had suspected that the very large number of cells caused the R session to terminate.
The data is an integrated single-cell Seurat object. If I run cellassign separately on each sub-sample, the integration would be pointless. Later I will try submitting the job on a Slurm system. If it still doesn't work, perhaps I should give up on cellassign.

Have you tried running cellassign before integration?

If you're working on a large integrated object, then I'd assume most algorithms would face resource issues such as memory shortage. Maybe adopting a "split-and-conquer" strategy would better suit your data, right?

No, I haven't. Thanks for your suggestion; I will consider it. Maybe splitting the data into several cell sub-populations first is a better solution.

You're welcome
Be cautious with how you split the data. If you split your data into subsets based on clusters, you may not get a full representation of the original cell population. I'm not sure whether cellassign would achieve its best performance in that scenario.

The safest way is to sub-sample your data somewhat randomly from each cluster, so that each sub-sample preserves the biological complexity of the original data. This is exactly what geosketch does, which is why I recommended it.
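
To make the "sample from each cluster" idea concrete, here is a minimal base-R sketch. It is plain stratified random splitting, not geosketch's geometric sketching, and the helper name stratified_chunks and the toy labels are made up for illustration. It deals the cells of every cluster across k disjoint chunks, so each chunk keeps roughly the original cluster proportions and every cell ends up in exactly one chunk.

```r
# Hypothetical helper: split cell indices into `k` disjoint chunks so that
# every cluster is spread as evenly as possible across all chunks.
stratified_chunks <- function(clusters, k, seed = 1) {
  set.seed(seed)
  chunk <- integer(length(clusters))
  for (idx in split(seq_along(clusters), clusters)) {
    # Shuffle the cells of this cluster, then deal them out round-robin
    shuffled <- idx[sample.int(length(idx))]
    chunk[shuffled] <- rep_len(seq_len(k), length(shuffled))
  }
  split(seq_along(clusters), chunk)
}

# Toy labels standing in for the real cluster column (sce$Net_snn_res.0.1)
clusters <- factor(rep(c("0", "1", "2"), times = c(60, 30, 9)))
chunks   <- stratified_chunks(clusters, k = 3)

lengths(chunks)                                 # 33 cells in each chunk
sapply(chunks, function(i) table(clusters[i]))  # 20 / 10 / 3 per cluster in every chunk
```

With the objects from the post above, each chunk could then be run on its own, for example with sce[shared, chunks[[1]]] and s[chunks[[1]]]. Whether the per-chunk fits are comparable with each other is something I have not verified, so treat this as a starting point only.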

Good luck!