cole-trapnell-lab / monocle3

Other
337 stars 101 forks source link

calculateLW needs to update rBind() to rbind() #512

Open jdrnevich opened 3 years ago

jdrnevich commented 3 years ago

Describe the bug I had a similar issue to #509 when running graph_test() and getting the error Error: 'rBind' is defunct.. I tried KforKuma's solution of rebooting my computer but still had the same error. Again, like KforKuma, traceback() indicated calculateLW() is where the error occurred. Looking at the code for monocle3:::calculateLW(), it hard-codes block_size <- 10000 and then runs a for loop on the blocks then uses Matrix::rBind() at the end of the for loop. My data set has 10,084 cells but graph_test() works if I subset it to 9999 cells.

To Reproduce The code that produced the bug:

> dim(cds2)
[1] 23701 10084
> test_res_8hr_16hr <- graph_test(cds2, neighbor_graph="principal_graph", cores=1)
Error: 'rBind' is defunct.
 Since R version 3.2.0, base's rbind() should work fine with S4 objects
> 

traceback() After the error, run traceback() in R and post the output:

> traceback()
5: stop(errorCondition(msg, old = fname, new = new, package = package, 
       class = "defunctError"))
4: .Defunct(msg = "'rBind' is defunct.\n Since R version 3.2.0, base's rbind() should work fine with S4 objects")
3: Matrix::rBind(tmp, cur_tmp)
2: calculateLW(cds, k = k, verbose = verbose, neighbor_graph = neighbor_graph, 
       reduction_method = reduction_method)
1: graph_test(cds2, neighbor_graph = "principal_graph", cores = 1)

Expected behavior A clear and concise description of what you expected to happen.

> cds3 <- cds2[,1:9999]
> test_res_8hr_16hr <- graph_test(cds3, neighbor_graph="principal_graph", cores=1)
  |                                                                             |   1%, ETA 39:20> 
#computation runs without error

Screenshots If applicable, add screenshots to help explain your problem.

sessionInfo(): Run sessionInfo() in R and post the output

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SeuratWrappers_0.3.0        dittoSeq_1.4.1              monocle3_1.0.0             
 [4] magrittr_2.0.1              dplyr_1.0.7                 svglite_2.0.0              
 [7] ggplot2_3.3.4               SeuratObject_4.0.2          Seurat_4.0.3               
[10] DropletUtils_1.12.1         SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
[13] Biobase_2.52.0              GenomicRanges_1.44.0        GenomeInfoDb_1.28.0        
[16] IRanges_2.26.0              S4Vectors_0.30.0            BiocGenerics_0.38.0        
[19] MatrixGenerics_1.4.0        matrixStats_0.59.0          simpleSingleCell_1.16.0    

loaded via a namespace (and not attached):
  [1] utf8_1.2.1                reticulate_1.20           R.utils_2.10.1           
  [4] tidyselect_1.1.1          htmlwidgets_1.5.3         grid_4.1.0               
  [7] BiocParallel_1.26.0       Rtsne_0.15                munsell_0.5.0            
 [10] units_0.7-2               codetools_0.2-18          ica_1.0-2                
 [13] future_1.21.0             miniUI_0.1.1.1            withr_2.4.2              
 [16] colorspace_2.0-2          knitr_1.33                ROCR_1.0-11              
 [19] tensor_1.5                pbmcapply_1.5.0           listenv_0.8.0            
 [22] slam_0.1-48               GenomeInfoDbData_1.2.6    polyclip_1.10-0          
 [25] pheatmap_1.0.12           rhdf5_2.36.0              LearnBayes_2.15.1        
 [28] coda_0.19-4               parallelly_1.26.0         vctrs_0.3.8              
 [31] generics_0.1.0            xfun_0.23                 R6_2.5.0                 
 [34] rsvd_1.0.5                locfit_1.5-9.4            bitops_1.0-7             
 [37] rhdf5filters_1.4.0        spatstat.utils_2.2-0      DelayedArray_0.18.0      
 [40] assertthat_0.2.1          promises_1.2.0.1          scales_1.1.1             
 [43] gtable_0.3.0              beachmat_2.8.0            globals_0.14.0           
 [46] processx_3.5.2            goftest_1.2-2             rlang_0.4.11             
 [49] systemfonts_1.0.2         splines_4.1.0             lazyeval_0.2.2           
 [52] spatstat.geom_2.2-0       BiocManager_1.30.16       reshape2_1.4.4           
 [55] abind_1.4-5               httpuv_1.6.1              tools_4.1.0              
 [58] spData_0.3.10             ellipsis_0.3.2            spatstat.core_2.2-0      
 [61] raster_3.4-13             RColorBrewer_1.1-2        proxy_0.4-26             
 [64] ggridges_0.5.3            Rcpp_1.0.6                plyr_1.8.6               
 [67] sparseMatrixStats_1.4.0   zlibbioc_1.38.0           classInt_0.4-3           
 [70] purrr_0.3.4               RCurl_1.98-1.3            ps_1.6.0                 
 [73] rpart_4.1-15              deldir_0.2-10             pbapply_1.4-3            
 [76] viridis_0.6.1             cowplot_1.1.1             zoo_1.8-9                
 [79] ggrepel_0.9.1             cluster_2.1.2             tinytex_0.32             
 [82] data.table_1.14.0         scattermore_0.7           gmodels_2.18.1           
 [85] lmtest_0.9-38             RANN_2.6.1                fitdistrplus_1.1-5       
 [88] patchwork_1.1.1           mime_0.11                 evaluate_0.14            
 [91] xtable_1.8-4              XML_3.99-0.6              gridExtra_2.3            
 [94] compiler_4.1.0            tibble_3.1.2              KernSmooth_2.23-20       
 [97] crayon_1.4.1              R.oo_1.24.0               htmltools_0.5.1.1        
[100] mgcv_1.8-35               later_1.2.0               spdep_1.1-8              
[103] tidyr_1.1.3               expm_0.999-6              DBI_1.1.1                
[106] MASS_7.3-54               boot_1.3-28               sf_1.0-0                 
[109] Matrix_1.3-3              gdata_2.18.0              R.methodsS3_1.8.1        
[112] igraph_1.2.6              pkgconfig_2.0.3           sp_1.4-5                 
[115] plotly_4.9.4.1            scuttle_1.2.0             spatstat.sparse_2.0-0    
[118] dqrng_0.3.0               XVector_0.32.0            stringr_1.4.0            
[121] callr_3.7.0               digest_0.6.27             sctransform_0.3.2        
[124] RcppAnnoy_0.0.18          graph_1.70.0              spatstat.data_2.1-0      
[127] rmarkdown_2.9             leiden_0.3.8              uwot_0.1.10              
[130] edgeR_3.34.0              DelayedMatrixStats_1.14.0 gtools_3.9.2             
[133] shiny_1.6.0               lifecycle_1.0.0           nlme_3.1-152             
[136] jsonlite_1.7.2            Rhdf5lib_1.14.1           CodeDepends_0.6.5        
[139] viridisLite_0.4.0         limma_3.48.0              fansi_0.5.0              
[142] pillar_1.6.1              lattice_0.20-44           fastmap_1.1.0            
[145] httr_1.4.2                survival_3.2-11           glue_1.4.2               
[148] remotes_2.4.0             png_0.1-7                 class_7.3-19             
[151] stringi_1.6.2             HDF5Array_1.20.0          irlba_2.3.3              
[154] e1071_1.7-7               future.apply_1.7.0 

Additional context You probably just need to replace Matrix::rBind with rbind in calculateLW() and any other functions it may be in. Thanks!

mleventh commented 2 years ago

I will give this a bump because the way this has to be fixed on the user's end is very tricky. As jdrnevich pointed out, this error occurs if graph_test is run when there are more than 10,000 cells, leading to the data being split into blocks, calling the defunct rBind function. If one were to copy the graph_test code, it will not work given monocle and its dependencies if you are intending to identify genes differentially expressed by pseudotime (i.e. using the argument neighbor_graph=principal_graph. In order for this function to work, the call of jaccard_index() needs to be changed to monocle3:::jaccard index and a copy of RcppExports.cpp (found in the github src) needs to copied into the user's path_to_libraries/R/library/monocle3/ directory within a user-created directory src within path_to_libraries/R/library/monocle3.

Changing tmp <- Matrix::rBind(tmp, cur_tmp) to tmp <- rbind(tmp, cur_tmp) on line 392 in graph_test.py in the Monocle3 source code would be much appreciated and save the above headache

Sophia409 commented 2 years ago

How did you solve this quesion? What should users do to avoid this now? @jdrnevich @mleventh

jdrnevich commented 2 years ago

I don't know, @Sophia409 - that's a question for @ctrapnell or @brgew to see if they ever fixed the bug. In my case, I just slightly increased my cell filtering criteria to get under 10K cells since I only had 10,084 cells to start.

mleventh commented 2 years ago

@Sophia409 I defined my own function called graph_test_large(), copying thegraph_test() code into this new function. I then changed the call of jaccard_index() in this function to monocle3:::jaccard_index() and copying Rcppexports.R from this github into the path to a src directory in my monocle3 library (i.e. copying to path_to_libraries/R/library/monocle3/src). Once these changes were made I could call graph_test_large() where I had previously used graph_test()

Sophia409 commented 2 years ago

@mleventh Hi, thank you for your answer. I tried your method, but it has two problems. Firstly, I couldn't find jaccard_index() function in graph_test function.

function (cds, neighbor_graph = c("knn", "principal_graph"), 
  reduction_method = "UMAP", k = 25, method = c("Moran_I"), 
  alternative = "greater", expression_family = "quasipoisson", 
  cores = 1, verbose = FALSE) 
{
  neighbor_graph <- match.arg(neighbor_graph)
  lw <- calculateLW(cds, k = k, verbose = verbose, neighbor_graph = neighbor_graph, 
    reduction_method = reduction_method)
  if (verbose) {
    message("Performing Moran's I test: ...")
  }
  exprs_mat <- SingleCellExperiment::counts(cds)[, attr(lw, 
    "region.id"), drop = FALSE]
  sz <- size_factors(cds)[attr(lw, "region.id")]
  wc <- spdep::spweights.constants(lw, zero.policy = TRUE, 
    adjust.n = TRUE)
  test_res <- pbmcapply::pbmclapply(row.names(exprs_mat), 
    FUN = function(x, sz, alternative, method, expression_family) {
      exprs_val <- exprs_mat[x, ]
      if (expression_family %in% c("uninormal", "binomialff")) {
        exprs_val <- exprs_val
      }
      else {
        exprs_val <- log10(exprs_val/sz + 0.1)
      }
      test_res <- tryCatch({
        if (method == "Moran_I") {
          mt <- suppressWarnings(my.moran.test(exprs_val, 
            lw, wc, alternative = alternative))
          data.frame(status = "OK", p_value = mt$p.value, 
            morans_test_statistic = mt$statistic, morans_I = mt$estimate[["Moran I statistic"]])
        }
        else if (method == "Geary_C") {
          gt <- suppressWarnings(my.geary.test(exprs_val, 
            lw, wc, alternative = alternative))
          data.frame(status = "OK", p_value = gt$p.value, 
            geary_test_statistic = gt$statistic, geary_C = gt$estimate[["Geary C statistic"]])
        }
      }, error = function(e) {
        data.frame(status = "FAIL", p_value = NA, morans_test_statistic = NA, 
          morans_I = NA)
      })
    }, sz = sz, alternative = alternative, method = method, 
    expression_family = expression_family, mc.cores = cores, 
    ignore.interactive = TRUE)
  if (verbose) {
    message("returning results: ...")
  }
  test_res <- do.call(rbind.data.frame, test_res)
  row.names(test_res) <- row.names(cds)
  test_res <- merge(test_res, rowData(cds), by = "row.names")
  row.names(test_res) <- test_res[, 1]
  test_res[, 1] <- NULL
  test_res$q_value <- 1
  test_res$q_value[which(test_res$status == "OK")] <- stats::p.adjust(subset(test_res, 
    status == "OK")[, "p_value"], method = "BH")
  test_res$status = as.character(test_res$status)
  test_res[row.names(cds), ]
}

Secondly, since I use R in windows system, I couldn't find src file in my library. 图片 What's your advice? Would be greatly appreciated!

mleventh commented 2 years ago

Hi, sorry for some of the confusion! I also had re-instantiated the functions called by graph_test, those being my.moran.test,my.geary.test, andcalculateLW. It is incalculateLW` where jaccard_index() is called and should be changed to monocle3:::jaccard_index()

As for the src directory, what I had to do was go into my libs directory, find monocle3, make a src directory and put the Rccp file into it. I might suggest looking into your "libs" folder, finding monocle3 and making a src directory to put the Rccp file.