BlishLab / scriabin

Analysis of cell-cell communication at single-cell resolution

Error in Dataset binning #7

Open AIBio opened 1 year ago

AIBio commented 1 year ago

Hi,

This toolkit is excellent! I tried to re-run the code in "Scriabin's summarized interaction graph workflow", but I ran into an error while binning the dataset. Here is the output:

Setting optim_k.unique to 1.33
Binning with coarse cell types
Constructing neighbor graphs with reduction pca . . . 
Splitting object . . . 
Preparing integration . . . 
Generating anchorsets . . . 
Computing 2000 integration features
Scaling features for provided objects
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s  
Finding all pairwise anchors
  |                                                  | 0 % ~calculating
Running CCA
Merging objects
Finding neighborhoods
Finding anchors
    Found 16208 anchors
Filtering anchors
    Retained 5126 anchors
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 25s
Mapping anchor names . . . 
Generating anchor bins . . . 
Generating connectivity matrix . . . 
Optimizing number of bins . . . 
Error in if (check_completion(nbs_completion)) { : 
  the condition has length > 1

Could you help me figure it out?

Hanwen

ajwilk commented 1 year ago

Hi Hanwen, thanks for your patience. Can you confirm whether you are running through this vignette? If so, I cannot reproduce your error. Can you try with set.seed(1234)?
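For reference, I mean something like this at the top of the session, before the binning step (using the BinDatasets call from the multi-dataset vignette as an example):

set.seed(1234)  # fix the RNG state before binning
ifnb <- BinDatasets(ifnb, split.by = "stim", dims = 1:30,
                    coarse_cell_types = "coarse_ident",
                    sigtest_cell_types = "seurat_annotations")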

hmnb commented 1 year ago

Same issue for me. No idea how it could be solved.

Error in if (check_completion(nbs_completion)) { : the condition has length > 1

4. AlignDatasets(seuObj = seu, dims = dims, anchor_score_threshold = anchor_score_threshold,
     split.by = split.by, optim_quan.threshold = optim_quan.threshold,
     optim_k.unique = optim_k.unique, snn.reduction = snn.reduction, verbose = verbose)
3. FUN(X[[i]], ...)
2. lapply(seq_along(1:length(seu_ct_split)), function(x) {
     seu_oi <- AlignDatasets(seuObj = seu, dims = dims,
       anchor_score_threshold = anchor_score_threshold, split.by = split.by,
       optim_quan.threshold = optim_quan.threshold, optim_k.unique = optim_k.unique,
       snn.reduction = snn.reduction, ...
1. BinDatasets(ifnb, split.by = "stim", dims = 1:30, coarse_cell_types = "coarse_ident",
     sigtest_cell_types = "seurat_annotations")

I've encountered this error on both Windows and macOS, so it should be easy to reproduce.

smeiersab commented 1 year ago

Hi, I'm getting the same error for

ifnb <- BinDatasets(ifnb, split.by = "stim", dims = 1:30, 
                    coarse_cell_types = "coarse_ident", sigtest_cell_types = "seurat_annotations")

when trying to run the multi-dataset.Rmd vignette. set.seed(1234) didn't help. Thanks!

alonmillet commented 1 year ago

I am running into the same error, both with the vignette and with my own data.

GriffinHon commented 1 year ago

I am having the same issue: I'm unable to reproduce the results with the PBMC dataset using the provided vignette, or with my own data.

decodebiology commented 1 year ago

Hi, I ran into the same issue and cannot reproduce the results using the vignette for the PBMC dataset. The function BinDatasets does not add the expected 'bins' column to the metadata, which in turn causes the subsequent functions/steps to fail.
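A quick way to see this (a hedged one-liner; 'bins' is the column name I expected based on the vignette):

# Should be TRUE if binning succeeded and wrote bin assignments to the metadata
"bins" %in% colnames(ifnb@meta.data)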

ZZZhuLF commented 1 year ago

It seems that everyone has encountered this issue, and my situation is exactly the same as described above. Scriabin is an excellent tool and I really want to give it a try, but unfortunately I hit the same error with both the example ifnb data and my own data, as others have mentioned.

ajwilk commented 1 year ago

Hi all, thanks for your patience as I've been slow to respond to issues. I hope to take a deep look at this within the next few days.

ajwilk commented 1 year ago

Still haven't been able to reproduce this. Can someone experiencing the issue post their sessionInfo()?

ZZZhuLF commented 1 year ago

@ajwilk

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] tcltk stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] scriabin_0.0.0.9000 panc8.SeuratData_3.0.2 ifnb.SeuratData_3.1.0 SeuratData_0.2.2
[5] qs_0.25.5 cols4all_0.6-1 data.table_1.14.8 Mfuzz_2.58.0
[9] DynDoc_1.76.0 widgetTools_1.76.0 e1071_1.7-13 scales_1.2.1
[13] RColorBrewer_1.1-3 gridExtra_2.3 ggsci_3.0.0 openxlsx_4.2.5.2
[17] plotrix_3.8-2 BuenColors_0.5.6 MASS_7.3-58.1 scater_1.26.1
[21] scuttle_1.8.4 SingleCellExperiment_1.20.1 pheatmap_1.0.12 hgu133plus2.db_3.13.0
[25] DESeq2_1.38.3 SummarizedExperiment_1.28.0 MatrixGenerics_1.10.0 matrixStats_1.0.0
[29] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9 clusterProfiler_4.6.2 org.Mm.eg.db_3.16.0
[33] org.Hs.eg.db_3.16.0 AnnotationDbi_1.60.2 IRanges_2.32.0 S4Vectors_0.36.2
[37] mindr_1.3.2 GEOquery_2.66.0 R.utils_2.12.2 R.oo_1.25.0
[41] R.methodsS3_1.8.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[45] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[49] ggplot2_3.4.2 tidyverse_2.0.0 patchwork_1.1.2 dplyr_1.1.2
[53] multtest_2.54.0 Biobase_2.58.0 BiocGenerics_0.44.0 SeuratObject_4.1.3
[57] Seurat_4.3.0.1

loaded via a namespace (and not attached):
[1] graphlayouts_1.0.0 pbapply_1.7-0 lattice_0.20-45 vctrs_0.6.3
[5] usethis_2.2.0 mgcv_1.8-41 blob_1.2.4 survival_3.4-0
[9] prodlim_2023.03.31 spatstat.data_3.0-1 later_1.3.1 DBI_1.1.3
[13] rappdirs_0.3.3 uwot_0.1.14 dqrng_0.3.0 zlibbioc_1.44.0
[17] htmlwidgets_1.6.2 GlobalOptions_0.1.2 future_1.32.0 leiden_0.4.3
[21] parallel_4.2.2 irlba_2.3.5.1 tidygraph_1.2.3 Rcpp_1.0.10
[25] KernSmooth_2.23-20 DT_0.28 promises_1.2.0.1 DelayedArray_0.24.0
[29] limma_3.54.2 vegan_2.6-4 pkgload_1.3.2 RcppParallel_5.1.7
[33] Hmisc_5.1-0 RSpectra_0.16-1 fs_1.6.2 fastmatch_1.1-3
[37] digest_0.6.31 png_0.1-8 qlcMatrix_0.9.7 coop_0.6-3
[41] sctransform_0.3.5 scatterpie_0.2.1 cowplot_1.1.1 DOSE_3.24.2
[45] ggraph_2.1.0 pkgconfig_2.0.3 GO.db_3.16.0 docopt_0.7.1
[49] spatstat.random_3.1-5 DelayedMatrixStats_1.20.0 gower_1.0.1 ggbeeswarm_0.7.2
[53] iterators_1.0.14 DropletUtils_1.18.1 reticulate_1.30 circlize_0.4.15
[57] beeswarm_0.4.0 GetoptLong_1.0.5 xfun_0.39 zoo_1.8-12
[61] tidyselect_1.2.0 reshape2_1.4.4 ica_1.0-3 gson_0.1.0
[65] viridisLite_0.4.2 pkgbuild_1.4.1 rlang_1.1.1 glue_1.6.2
[69] umap_0.2.10.0 lava_1.7.2.1 tictoc_1.2 ggsignif_0.6.4
[73] recipes_1.0.6 labeling_0.4.2 harmony_0.1.1 httpuv_1.6.11
[77] class_7.3-20 preprocessCore_1.60.2 BiocNeighbors_1.16.0 annotate_1.76.0
[81] jsonlite_1.8.5 XVector_0.38.0 princurve_2.1.6 bit_4.0.5
[85] mime_0.12 gplots_3.1.3 stringi_1.7.12 processx_3.8.1
[89] spatstat.sparse_3.0-1 scattermore_1.2 spatstat.explore_3.2-1 quadprog_1.5-8
[93] yulab.utils_0.0.6 hardhat_1.3.0 bitops_1.0-7 cli_3.6.1
[97] rhdf5filters_1.10.1 RSQLite_2.3.1 randomForest_4.7-1.1 timechange_0.2.0
[101] rstudioapi_0.14 nlme_3.1-160 qvalue_2.30.0 fastcluster_1.2.3
[105] locfit_1.5-9.8 listenv_0.9.0 FateID_0.2.2 miniUI_0.1.1.1
[109] gridGraphics_0.5-1 urlchecker_1.0.1 runner_0.4.3 sessioninfo_1.2.2
[113] lifecycle_1.0.3 timeDate_4022.108 ggfittext_0.10.0 munsell_0.5.0
[117] ggalluvial_0.12.5 visNetwork_2.1.2 caTools_1.18.2 codetools_0.2-18
[121] vipor_0.4.5 lmtest_0.9-40 msigdbr_7.5.1 htmlTable_2.4.1
[125] xtable_1.8-4 ROCR_1.0-11 flashClust_1.01-2 BiocManager_1.30.21
[129] abind_1.4-5 FNN_1.1.3.2 farver_2.1.1 parallelly_1.36.0
[133] RANN_2.6.1 aplot_0.1.10 askpass_1.1 sparsesvd_0.2-2
[137] ggtree_3.6.2 RcppAnnoy_0.0.20 goftest_1.2-3 profvis_0.3.8
[141] cluster_2.1.4 future.apply_1.11.0 Matrix_1.5-3 tidytree_0.4.2
[145] ellipsis_0.3.2 prettyunits_1.1.1 ggridges_0.5.4 igraph_1.5.0
[149] fgsea_1.24.0 remotes_2.4.2 slam_0.1-50 spatstat.utils_3.0-3
[153] htmltools_0.5.5 yaml_2.3.7 utf8_1.2.3 plotly_4.10.2
[157] XML_3.99-0.14 ModelMetrics_1.2.2.2 ggpubr_0.6.0 foreign_0.8-83
[161] withr_2.5.0 fitdistrplus_1.1-11 BiocParallel_1.32.6 bit64_4.0.5
[165] foreach_1.5.2 RaceID_0.3.0 Biostrings_2.66.0 progressr_0.13.0
[169] GOSemSim_2.24.0 rsvd_1.0.5 ScaledMatrix_1.6.0 devtools_2.4.5
[173] memoise_2.0.1 evaluate_0.21 RApiSerialize_0.1.2 geneplotter_1.76.0
[177] permute_0.9-7 tzdb_0.4.0 callr_3.7.3 CelliD_1.6.2
[181] caret_6.0-94 ps_1.7.5 DiagrammeR_1.0.10 fdrtool_1.2.17
[185] fansi_1.0.4 tensor_1.5 edgeR_3.40.2 checkmate_2.2.0
[189] cachem_1.0.8 deldir_1.0-9 HDO.db_0.99.1 babelgene_22.9
[193] impute_1.72.3 rjson_0.2.21 rstatix_0.7.2 ggrepel_0.9.3
[197] ade4_1.7-22 clue_0.3-64 tools_4.2.2 nichenetr_1.1.1
[201] magrittr_2.0.3 RCurl_1.98-1.12 proxy_0.4-27 FSA_0.9.4
[205] car_3.1-2 ape_5.7-1 xml2_1.3.4 ggplotify_0.1.0
[209] httr_1.4.6 rmarkdown_2.22 globals_0.16.2 R6_2.5.1
[213] Rhdf5lib_1.20.0 nnet_7.3-18 genefilter_1.80.3 KEGGREST_1.38.0
[217] treeio_1.22.0 gtools_3.9.4 shape_1.4.6 beachmat_2.14.2
[221] BiocVersion_3.16.0 HDF5Array_1.26.0 BiocSingular_1.14.0 rhdf5_2.42.1
[225] splines_4.2.2 carData_3.0-5 ggfun_0.1.0 colorspace_2.1-0
[229] generics_0.1.3 base64enc_0.1-3 pillar_1.9.0 tweenr_2.0.2
[233] sp_2.0-0 GenomeInfoDbData_1.2.9 plyr_1.8.8 gtable_0.3.3
[237] tkWidgets_1.76.0 zip_2.3.0 stringfish_0.15.8 knitr_1.43
[241] ComplexHeatmap_2.14.0 RcppArmadillo_0.12.4.1.0 shadowtext_0.1.2 fastmap_1.1.1
[245] doParallel_1.0.17 broom_1.0.5 som_0.3-5.1 openssl_2.0.6
[249] backports_1.4.1 ipred_0.9-14 WGCNA_1.72-1 enrichplot_1.18.4
[253] hms_1.1.3 ggforce_0.4.1 Rtsne_0.16 shiny_1.7.4
[257] polyclip_1.10-4 grid_4.2.2 lazyeval_0.2.2 dynamicTreeCut_1.63-1
[261] Formula_1.2-5 crayon_1.5.2 downloader_0.4 pROC_1.18.2
[265] sparseMatrixStats_1.10.0 viridis_0.6.3 rpart_4.1.19 compiler_4.2.2
[269] spatstat.geom_3.2-1

xuansuyang commented 11 months ago

I am running into the same error, both with the vignette and with my own data.

csangara commented 8 months ago

I think this may be an issue with R versions 4.2.0 and above, specifically due to the change whereby calling while() or if() with a condition of length greater than one now throws an error rather than a warning. Example:

# Example code
if (1:2 == 1) { print ("hi") }

# Version 4.1 output
[1] "hi"
Warning message:
In if (1:2 == 1) { :
  the condition has length > 1 and only the first element will be used

# Version 4.2 output
Error in if (1:2 == 1) { : the condition has length > 1
Execution halted
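In general, under R >= 4.2 any if() condition has to be collapsed to a single logical first, e.g. with all(), any(), or isTRUE():

# Collapse a logical vector to one value before branching
x <- 1:2
if (all(x == 1)) print("all ones")      # condition is a single FALSE, no error
if (any(x == 1)) print("contains a 1")  # condition is a single TRUE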

In the example dataset, check_completion returns a vector of length two, which probably caused the error.

> check_completion(nbs_completion)
[1]  TRUE FALSE

I'm not sure what exactly the function checks for, but for now, it runs when I modify check_completion to this:

function (nbs) 
{
  length(unique(nbs$unique_types)) == length(unique(nbs$ident))
}

Tony4709 commented 8 months ago

Hello @ajwilk,

I've spent a lot of time trying to modify the function BinDatasets, and I think there are some notable errors in it, such as the one mentioned above:

check_completion <- function(nbs) {
  unique(nbs$unique_types)==length(unique(nbs$ident))
}

should be (I guess), @csangara:

check_completion <- function(nbs) {
  nbs$unique_types == length(unique(nbs$ident))
}

Another apparent inconsistency is:

seu_ct_split <- SplitObject(seu, split.by = coarse_cell_types)
    bin_ids <- lapply(seq_along(1:length(seu_ct_split)), function(x) {
      seu_oi <- AlignDatasets(seuObj = seu,
                              dims = dims,
                              anchor_score_threshold = anchor_score_threshold,
                              split.by = split.by,
                              optim_quan.threshold = optim_quan.threshold,
                              optim_k.unique = optim_k.unique,
                              snn.reduction = snn.reduction, verbose = verbose)

This code first splits seu, but the object passed to AlignDatasets inside the lapply is still the full seu. I think it should be:

seu_ct_split <- SplitObject(seu, split.by = coarse_cell_types)
    bin_ids <- lapply(seq_along(1:length(seu_ct_split)), function(x) {
      seu_oi <- AlignDatasets(seuObj = seu_ct_split[[x]],
                              dims = dims,
                              anchor_score_threshold = anchor_score_threshold,
                              split.by = split.by,
                              optim_quan.threshold = optim_quan.threshold,
                              optim_k.unique = optim_k.unique,
                              snn.reduction = snn.reduction, verbose = verbose)
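After that loop, the per-cell-type bin labels still need to be written back into the full object. Here is a sketch of what I imagine (assuming each bin_ids[[x]] ends up as a named vector of bin labels keyed by cell barcode, which is just my guess at the intended contract):

# Sketch only: merge the per-cell-type bin labels back into seu.
# Assumes each bin_ids[[x]] is a named character vector
# (names = cell barcodes, values = bin labels).
all_bins <- unlist(bin_ids)
seu <- AddMetaData(seu, metadata = all_bins[colnames(seu)], col.name = "bins")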

However, even after fixing the bugs I found in BinDatasets so that bin labels can be assigned to a dataset, the workflow in the multi-dataset vignette is still incomplete. The vignette only runs up to constructing the summarized interaction graph matrix; it never runs the PerturbedBins function to perform the Kruskal-Wallis test.

In addition, I think parallel computing is essential for a workflow with matrices this large.

What puzzles me most is this: even after I obtain bin-bin pairs with significant interaction changes, then what? How do I select the cells to feed into the single-dataset vignette? My guess is that, since single cells are filtered based on significant bin-bin pairs, the final cell-communication matrix at single-cell resolution is also limited to communication between those two bins, but this is not reflected in the code or the article.
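For concreteness, the subsetting step I have in mind would look something like this (a sketch assuming BinDatasets writes a per-cell bins metadata column, and b1/b2 are one significant bin pair):

# Sketch: restrict the object to cells in one significant bin pair,
# then run the single-dataset workflow on that subset.
cells_oi <- colnames(seu)[seu$bins %in% c(b1, b2)]
seu_sub <- subset(seu, cells = cells_oi)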

All in all, I think the views and ideas in this article are very original, but the code on GitHub does not run well; it does not look like a final version. At the very least, the vignette should run successfully. I sincerely hope you will take some time to refine the code here so that more people can cite your analysis methods and articles.

Sincerely, Tony4709

csangara commented 8 months ago

Hi @Tony4709,

For your fix of check_completion, I think you might run into the same problem as before, since nbs_completion is a dataframe that looks like this:

# A tibble: 252 × 4
# Groups:   id [5]
   cell            id    ident unique_types
   <chr>           <chr> <chr>        <int>
 1 project=CTRL.1  88    CTRL             2
 2 project=CTRL.2  29    CTRL             2
 3 project=CTRL.3  88    CTRL             2
 4 project=CTRL.4  29    CTRL             2
 5 project=CTRL.5  29    CTRL             2
 6 project=CTRL.6  29    CTRL             2
 7 project=CTRL.7  88    CTRL             2
 8 project=CTRL.8  29    CTRL             2
 9 project=CTRL.9  88    CTRL             2
10 project=CTRL.10 29    CTRL             2
# ℹ 242 more rows

So nbs$unique_types == length(unique(nbs$ident)) will return a vector of length nrow(nbs); it should probably be something like all(nbs$unique_types == length(unique(nbs$ident)))?
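That is, a sketch of the wrapped version (the same expression, just collapsed to a single logical):

check_completion <- function(nbs) {
  # all() collapses the per-row comparison to one TRUE/FALSE,
  # so the if() inside BinDatasets receives a length-1 condition
  all(nbs$unique_types == length(unique(nbs$ident)))
}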

I was also thinking of changing seu to seu_ct_split[[x]], as I ran into the same confusion when reading the code. I managed to get the multi-dataset vignette running to the end without fixing this, but it took several hours. I'll try to implement this change and see if the runtime will be reduced.

Furthermore, I just saw that @inofechm actually has a fork that fixes the things we mentioned here (and more), which I might also try out.

I agree that the vignette is very much incomplete. I tried running PerturbedBins, but it also didn't work; I'm not sure if that's because >2 samples are needed, as stated in the description. I think it might not work either way: the bin combinations in the code are generated as bin_combos <- expand.grid(1:nbin, 1:nbin), but the bin numbers returned by BinDatasets do not seem to be numbered consecutively.
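For what it's worth, building the combinations from the bin labels that actually occur (rather than assuming 1:nbin) would look something like this, again assuming a per-cell bins column:

# Sketch: enumerate the observed bin labels instead of assuming 1..nbin
bin_levels <- sort(unique(seu$bins))
bin_combos <- expand.grid(bin_levels, bin_levels)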

I hope the code and vignette get some fixes soon (or at the very least, please remove this browser() call), as I would really love to properly use this tool.