Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: Tried to convert 'shape' to a tensor and failed.

RuiyuRayWang commented 2 years ago

I tried to run cellassign on the example dataset:

> library(cellassign)
> data(example_sce)
> data(example_marker_mat)
> s <- SingleCellExperiment::sizeFactors(example_sce)
> fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),], 
+                   marker_gene_info = example_marker_mat, 
+                   s = s, 
+                   learning_rate = 1e-2, 
+                   shrinkage = TRUE,
+                   verbose = FALSE)

R raised the following error:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
ValueError: Tried to convert 'shape' to a tensor and failed. 
Error: Cannot convert a partially known TensorShape to a Tensor: (1, ?)

And this is the full traceback:

9. stop(structure(list(message = "ValueError: Tried to convert 'shape' to a tensor and failed. Error: Cannot convert a partially known TensorShape to a Tensor: (1, ?)", 
call = py_call_impl(callable, dots$args, dots$keywords), 
cppstack = structure(list(file = "", line = -1L, stack = c("/home/luolab/R/x86_64-pc-linux-gnu-library/4.0/reticulate/libs/reticulate.so(Rcpp::exception::exception(char const*, bool)+0x78) [0x7f0389bc3798]", 
"/home/luolab/R/x86_64-pc-linux-gnu-library/4.0/reticulate/libs/reticulate.so(Rcpp::stop(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x27) [0x7f0389bc3807]", ...

8. _apply_op_helper at op_def_library.py#486

7. reshape at gen_array_ops.py#7443

6. reshape at array_ops.py#193

5. tf$reshape(tf$reduce_logsumexp(p_y_on_c_unorm, 0L), shape(1, -1))

4. inference_tensorflow(Y = Y, rho = rho, s = s, X = X, G = G, C = C, 
    N = N, P = P, B = B, shrinkage = shrinkage, verbose = verbose, 
    n_batches = n_batches, rel_tol_adam = rel_tol_adam, rel_tol_em = rel_tol_em, 
    max_iter_adam = max_iter_adam, max_iter_em = max_iter_em, ...

3. FUN(X[[i]], ...)

2. lapply(seq_len(num_runs), function(i) {
    res <- inference_tensorflow(Y = Y, rho = rho, s = s, X = X, 
    G = G, C = C, N = N, P = P, B = B, shrinkage = shrinkage, 
    verbose = verbose, n_batches = n_batches, rel_tol_adam = rel_tol_adam, ...

1. cellassign(exprs_obj = example_sce[rownames(example_marker_mat), ], 
    marker_gene_info = example_marker_mat, s = s, learning_rate = 0.01, 
    shrinkage = TRUE, verbose = FALSE)

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=zh_CN.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cellassign_0.99.21 tensorflow_2.7.0  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7                  XVector_0.30.0              GenomeInfoDb_1.26.7         compiler_4.0.3             
 [5] zlibbioc_1.36.0             bitops_1.0-7                MatrixGenerics_1.2.1        prettyunits_1.1.1          
 [9] base64enc_0.1-3             remotes_2.4.1               tools_4.0.3                 testthat_3.1.0             
[13] SingleCellExperiment_1.12.0 pkgbuild_1.2.0              pkgload_1.2.3               jsonlite_1.7.2             
[17] memoise_2.0.1               lifecycle_1.0.1             lattice_0.20-41             png_0.1-7                  
[21] rlang_0.4.12                Matrix_1.3-4                DelayedArray_0.16.3         cli_3.1.0                  
[25] parallel_4.0.3              fastmap_1.1.0               GenomeInfoDbData_1.2.4      withr_2.4.2                
[29] IRanges_2.24.1              S4Vectors_0.28.1            desc_1.4.0                  fs_1.5.0                   
[33] devtools_2.4.3              stats4_4.0.3                rprojroot_2.0.2             grid_4.0.3                 
[37] Biobase_2.50.0              reticulate_1.22             glue_1.5.1                  R6_2.5.1                   
[41] processx_3.5.2              sessioninfo_1.2.1           callr_3.7.0                 purrr_0.3.4                
[45] magrittr_2.0.1              whisker_0.4                 GenomicRanges_1.42.0        matrixStats_0.61.0         
[49] BiocGenerics_0.36.1         ps_1.6.0                    tfruns_1.5.0                ellipsis_0.3.2             
[53] usethis_2.1.3               SummarizedExperiment_1.20.0 RCurl_1.98-1.5              cachem_1.0.6               
[57] crayon_1.4.2

Has anyone encountered the same issue?

RuiyuRayWang commented 2 years ago

I think I've found a solution.
When installing the tensorflow package for R, the suggested command is install.packages("tensorflow"). By default, this installs the latest release of the R tensorflow package (v2.7.0 as of 2021-12-04).
Instead of using install.packages("tensorflow"), I used devtools to install from GitHub and explicitly specified the version I wanted.

For example, I chose TensorFlow 2.4.0 (with CUDA 11.0 and cuDNN 8.0.4). In R, I ran:

devtools::install_github("rstudio/tensorflow@v2.4.0")

Then, in the shell:

$ pip install tensorflow==2.4.0
$ pip install tensorflow-probability==0.12.0
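To double-check that the pinned pair is what the Python environment actually contains, a small script like the following can be run with the same interpreter reticulate uses (the one `tensorflow::tf_config()` reports). This is a sketch under assumptions: it reads package metadata with `importlib.metadata` (Python 3.8+; on Python 3.7, as in this thread, the `importlib-metadata` backport provides the same functions), and the helper names `installed_versions` and `pin_mismatches` are hypothetical.

```python
"""Sketch: check the Python env used by reticulate against the pinned TF/TFP versions."""
from importlib.metadata import PackageNotFoundError, version

# Versions reported to work with cellassign in this thread.
# Keys are distribution names as recorded in the installed metadata
# (wheel metadata escapes "-" to "_", so pip's "tensorflow-probability"
# is looked up as "tensorflow_probability").
PINNED = {"tensorflow": "2.4.0", "tensorflow_probability": "0.12.0"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pin_mismatches(installed, pinned=PINNED):
    """Return {name: (installed, expected)} for packages not at their pinned version."""
    return {
        name: (installed.get(name), expected)
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }


if __name__ == "__main__":
    problems = pin_mismatches(installed_versions(PINNED))
    if problems:
        for name, (got, want) in problems.items():
            print(f"{name}: installed {got}, expected {want}")
    else:
        print("TensorFlow and TensorFlow Probability match the pinned versions.")
```

An empty result only means the versions match the pair reported here; it does not by itself guarantee GPU libraries such as cuDNN are found.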

Now cellassign works.

> library(tensorflow)
> library(cellassign)
2021-12-05 13:23:30.455080: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> tensorflow::tf_config()
TensorFlow v2.4.0 (~/miniconda3/envs/cellassign/lib/python3.7/site-packages/tensorflow)
Python v3.7 (~/miniconda3/envs/cellassign/bin/python)
> data(example_sce)
> data(example_marker_mat)
> s <- SingleCellExperiment::sizeFactors(example_sce)
> fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),],
+                   marker_gene_info = example_marker_mat,
+                   s = s,
+                   learning_rate = 1e-2,
+                   shrinkage = TRUE,
+                   verbose = FALSE)
2021-12-05 13:23:44.570707: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-05 13:23:44.571896: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-12-05 13:23:44.612319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:02:00.0 name: Quadro M4000 computeCapability: 5.2
coreClock: 0.7725GHz coreCount: 13 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 179.11GiB/s
2021-12-05 13:23:44.612373: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-12-05 13:23:44.615303: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-12-05 13:23:44.615372: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-12-05 13:23:44.616631: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-12-05 13:23:44.616962: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-12-05 13:23:44.619893: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-12-05 13:23:44.620633: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-12-05 13:23:44.620834: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/3.6.3/lib/R/lib:/lib:/usr/local/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/java-11-openjdk-amd64/lib/server
2021-12-05 13:23:44.620852: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-12-05 13:23:44.716409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-12-05 13:23:44.716463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-12-05 13:23:44.716487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-12-05 13:23:44.731095: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-12-05 13:23:44.738373: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2993160000 Hz

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=zh_CN.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] splatter_1.10.1             SingleCellExperiment_1.8.0  SummarizedExperiment_1.16.1 DelayedArray_0.12.3        
 [5] BiocParallel_1.20.1         matrixStats_0.61.0          Biobase_2.46.0              GenomicRanges_1.38.0       
 [9] GenomeInfoDb_1.22.1         IRanges_2.20.2              S4Vectors_0.24.4            BiocGenerics_0.32.0        
[13] cellassign_0.99.21          tensorflow_2.4.0           

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.4         Rcpp_1.0.7             here_1.0.1             lattice_0.20-38        prettyunits_1.1.1     
 [6] png_0.1-7              ps_1.6.0               utf8_1.2.2             rprojroot_2.0.2        R6_2.5.1              
[11] backports_1.4.0        ggplot2_3.3.5          pillar_1.6.4           tfruns_1.5.0           zlibbioc_1.32.0       
[16] rlang_0.4.12           whisker_0.4            callr_3.7.0            Matrix_1.2-18          checkmate_2.0.0       
[21] reticulate_1.22        desc_1.4.0             devtools_2.4.3         RCurl_1.98-1.5         munsell_0.5.0         
[26] compiler_3.6.3         pkgconfig_2.0.3        base64enc_0.1-3        pkgbuild_1.2.1         tibble_3.1.6          
[31] GenomeInfoDbData_1.2.2 fansi_0.5.0            crayon_1.4.2           withr_2.4.3            bitops_1.0-7          
[36] rappdirs_0.3.3         grid_3.6.3             jsonlite_1.7.2         gtable_0.3.0           lifecycle_1.0.1       
[41] magrittr_2.0.1         scales_1.1.1           cli_3.1.0              cachem_1.0.6           XVector_0.26.0        
[46] fs_1.5.1               remotes_2.4.2          testthat_3.1.1         vctrs_0.3.8            ellipsis_0.3.2        
[51] tools_3.6.3            glue_1.5.1             purrr_0.3.4            processx_3.5.2         pkgload_1.2.4         
[56] fastmap_1.1.0          colorspace_2.0-2       BiocManager_1.30.16    sessioninfo_1.2.1      memoise_2.0.1         
[61] usethis_2.1.3

@kieranrcampbell @Irrationone Maybe the README and documentation should be updated to mention this?

GuohuaZhu commented 2 years ago

Thanks a lot! With "tensorflow==2.4.0" and "tensorflow-probability==0.12.0" installed in my conda "bioinfo" environment, I ran the code you provided on my RStudio Server today. The example data ran successfully, and the resulting "fit" looks fine, although the log reported some errors I don't understand. However, when I ran it on my own Seurat data (converted to a SingleCellExperiment object), things went differently. Here is my code:

library(cellassign)

> WARNING: No metadata found in /home/users/miniconda3/envs/bioinfo/lib/python3.7/site-packages
> WARNING: No metadata found in /home/users/miniconda3/envs/bioinfo/lib/python3.7/site-packages
> WARNING: No metadata found in /home/users/miniconda3/envs/bioinfo/lib/python3.7/site-packages
> WARNING: No metadata found in /home/users/miniconda3/envs/bioinfo/lib/python3.7/site-packages
> WARNING: No metadata found in /home/users/miniconda3/envs/bioinfo/lib/python3.7/site-packages
> WARNING: No metadata found in /home/users/miniconda3/envs/bioinfo/lib/python3.7/site-packages
> 2021-12-07 13:19:27.538524: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/users/miniconda3/envs/bioinfo/lib/R/lib:/.singularity.d/libs:::/.singularity.d/libs
> 2021-12-07 13:19:27.538585: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

library(tensorflow)

tensorflow::tf_config()

> TensorFlow v2.4.0 (~/miniconda3/envs/bioinfo/lib/python3.7/site-packages/tensorflow)
> Python v3.7 (~/miniconda3/envs/bioinfo/bin/python)

data(example_sce)
data(example_marker_mat)
s <- SingleCellExperiment::sizeFactors(example_sce)
fit <- cellassign(exprs_obj = example_sce[rownames(example_marker_mat),],
                  marker_gene_info = example_marker_mat,
                  s = s,
                  learning_rate = 1e-2,
                  shrinkage = TRUE,
                  verbose = FALSE)

> 2021-12-07 13:25:10.115757: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
> 2021-12-07 13:25:10.123553: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
> 2021-12-07 13:25:10.126575: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/users/miniconda3/envs/bioinfo/lib/R/lib:/.singularity.d/libs:::/.singularity.d/libs
> 2021-12-07 13:25:10.126598: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
> 2021-12-07 13:25:10.126619: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cpu20): /proc/driver/nvidia/version does not exist
> 2021-12-07 13:25:10.141464: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
> 2021-12-07 13:25:10.159179: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2400000000 Hz

print(fit)

> A cellassign fit for 500 cells, 10 genes, 2 cell types with 0 covariates
>             To access cell types, call celltypes(x)
>             To access cell type probabilities, call cellprobs(x)

However, things went wrong when I ran my own data through the cellassign workflow:

sce_seurat
> An object of class Seurat 
> 26069 features across 131310 samples within 2 assays 
> Active assay: RNA (25029 features, 5000 variable features)
>  1 other assay present: Net
>  3 dimensional reductions calculated: NetPCA, NetTSNE, NetUMAP

DefaultAssay(sce_seurat) <- 'RNA' 

# Convert the Seurat object into a SingleCellExperiment object
sce <- as.SingleCellExperiment(sce_seurat) # sce_seurat is a Seurat object

sce@assays

> An object of class "SimpleAssays"
> Slot "data":
> List of length 3
> names(3): counts logcounts scaledata

# sce_seurat had already been clustered.

sce$Net_snn_res.0.1
> Levels: 0 1 10 2 3 4 5 6 7 8 9

sce <- scran::computeSumFactors(sce, clusters =sce$Net_snn_res.0.1)

## Find markers shared with the data set

shared <- intersect(rownames(celltype), rownames(sce))
s <- SingleCellExperiment::sizeFactors(sce)

# celltype is a marker matrix obtained from CellMarker

head(celltype)[1:3,1:3]
>        Monocyte Natural killer cell B cell
> 11            0                   0      0
> A4GALT        0                   0      0
> ABC           0                   0      0
shared <- intersect(rownames(celltype), rownames(sce))  # the "11" row is removed by this intersection
fit <- cellassign(exprs_obj = sce[shared, ], 
                  marker_gene_info = celltype[shared, ],  
                  s = s,
                  learning_rate = 1e-2,
                  shrinkage = TRUE,
                  verbose = FALSE)
# Then the error happened:

> 2021-12-07 14:32:30.381110: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
> 2021-12-07 14:32:31.578288: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 185724864000 exceeds  10% of free system memory.
> 2021-12-07 14:32:35.428487: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 185724864000 exceeds 10% of free system memory.
> 2021-12-07 14:32:36.231957: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 185724864000 exceeds 10% of free system memory.

Then the R session was terminated abnormally by an unknown crash. I tried three times, with the same result each time. Could you help me with this problem?

RuiyuRayWang commented 2 years ago

Hi @GGGGGHua, this issue has been reported before (#66), but that thread does not seem to reach a final solution.

As suggested in that thread, I suspect this is due to the large number of cells in your object (131,310). As a heuristic workaround, you could down-sample your data to a manageable size, for example with geosketch, and then run cellassign separately on each sub-sample.
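The allocation size in the TensorFlow warning is consistent with this suspicion. A back-of-the-envelope check, assuming the warning reports bytes (as TensorFlow's CPU allocator does) and that the allocation grows linearly with the number of cells, shows that 185,724,864,000 bytes (about 173 GiB) divides exactly by 131,310 cells, giving 1,414,400 bytes per cell. Exact divisibility is suggestive of, but does not prove, a tensor whose leading dimension is the cell count. The sub-sample sizes below are arbitrary examples.

```python
# Numbers copied from this thread: the TensorFlow allocation warning and the
# Seurat summary ("26069 features across 131310 samples").
ALLOC_BYTES = 185_724_864_000
N_CELLS = 131_310
GIB = 2**30


def bytes_per_cell(alloc_bytes, n_cells):
    """Split one allocation evenly over cells; returns (quotient, remainder)."""
    return divmod(alloc_bytes, n_cells)


def estimate_alloc(n_cells, per_cell):
    """Projected allocation in bytes, ASSUMING it scales linearly with cell count."""
    return n_cells * per_cell


per_cell, remainder = bytes_per_cell(ALLOC_BYTES, N_CELLS)
print(per_cell, remainder)  # 1414400 0
print(f"full data: {ALLOC_BYTES / GIB:.1f} GiB")  # full data: 173.0 GiB
for n in (10_000, 20_000):  # hypothetical sub-sample sizes
    print(f"{n} cells: {estimate_alloc(n, per_cell) / GIB:.1f} GiB")
```

Under the linear assumption, a 10,000-cell sub-sample would need roughly 13 GiB for the same allocation, which is why down-sampling or splitting can bring the run within reach of a single machine.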

GuohuaZhu commented 2 years ago


Thanks a lot for your timely reply. I had suspected that the very large number of cells caused the R session to terminate.
The data is an integrated single-cell Seurat object, so if I run cellassign separately on each sub-sample, the integration would become meaningless. Next, I will try submitting the job to a Slurm cluster. If it still doesn't work, perhaps I should give up on cellassign.

RuiyuRayWang commented 2 years ago

Have you tried running cellassign before integration?

If you're working with a large integrated object, I'd expect most algorithms to run into resource limits such as memory shortage. Maybe a "split-and-conquer" strategy would suit your data better?

GuohuaZhu commented 2 years ago

No, I haven't. Thanks for the suggestion; I will consider it. Maybe first splitting the data into several cell sub-populations is a better solution.

RuiyuRayWang commented 2 years ago

You're welcome!
Be careful about how you split the data. If you split it into subsets by cluster, each subset may not represent the full original cell population, and I'm not sure cellassign would perform at its best in that scenario.

The safest way is to sub-sample your data somewhat randomly from each cluster, so that each sub-sample preserves the biological complexity of the original data. This is exactly what geosketch does, which is why I recommended it.
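The idea of sampling randomly within each cluster can be sketched in a few lines. Note that this is a simpler, swapped-in technique (stratified random splitting by cluster label), not geosketch's geometric sketching; the function name `stratified_split` and its parameters are hypothetical. Every cell lands in exactly one part, and each part receives roughly the same share of every cluster.

```python
import random
from collections import defaultdict


def stratified_split(labels, n_parts, seed=0):
    """Partition cell indices (0-based) into n_parts subsets, each drawing
    roughly the same share of every cluster label."""
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)

    parts = [[] for _ in range(n_parts)]
    offset = 0
    for label in sorted(by_cluster, key=str):  # deterministic cluster order
        members = by_cluster[label]
        rng.shuffle(members)
        # Deal shuffled cells round-robin so each part gets ~len/n_parts of this cluster.
        for j, idx in enumerate(members):
            parts[(j + offset) % n_parts].append(idx)
        # Continue the round-robin where this cluster stopped, so overall
        # part sizes differ by at most one cell.
        offset += len(members)
    return [sorted(p) for p in parts]
```

For example, the cluster labels could be taken from `sce$Net_snn_res.0.1` and each part run through cellassign separately. The indices are 0-based, so add 1 before subsetting an R object, and subset the size-factor vector `s` with the same indices so it still matches the cells.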

Good luck!

Irrationone / cellassign, issue #94