AIBio opened this issue 1 year ago
Hi Hanwen,
Thanks for your patience. Can you confirm whether you are running through this vignette? If so, I cannot reproduce your error. Can you try with set.seed(1234)?
Same issue for me. No idea how it could be solved.
Error in if (check_completion(nbs_completion)) { : the condition has length > 1
4. AlignDatasets(seuObj = seu, dims = dims, anchor_score_threshold = anchor_score_threshold,
       split.by = split.by, optim_quan.threshold = optim_quan.threshold,
       optim_k.unique = optim_k.unique, snn.reduction = snn.reduction, verbose = verbose)
3. FUN(X[[i]], ...)
2. lapply(seq_along(1:length(seu_ct_split)), function(x) {
       seu_oi <- AlignDatasets(seuObj = seu, dims = dims,
           anchor_score_threshold = anchor_score_threshold, split.by = split.by,
           optim_quan.threshold = optim_quan.threshold, optim_k.unique = optim_k.unique,
           snn.reduction = snn.reduction, ...
1. BinDatasets(ifnb, split.by = "stim", dims = 1:30, coarse_cell_types = "coarse_ident",
       sigtest_cell_types = "seurat_annotations")
Hi, I'm getting the same error for
ifnb <- BinDatasets(ifnb, split.by = "stim", dims = 1:30,
coarse_cell_types = "coarse_ident", sigtest_cell_types = "seurat_annotations")
when trying to run the multi-dataset.Rmd vignette. set.seed(1234) didn't help. Thanks!
I am running into the same error, both with the vignette and with my own data.
I am having the same issue. I'm unable to reproduce the results with the PBMC dataset using your provided vignette, nor my own data.
Hi, I ran into the same issue. I cannot reproduce the results using the vignette for the PBMC dataset. There is an issue with the function BinDatasets, which does not produce the extra 'bins' column in the metadata, which in turn causes the subsequent functions/steps to fail.
It seems everyone has encountered this issue, and my situation is exactly the same as described above. Scriabin is an excellent tool and I really want to give it a try. Unfortunately, I encountered the same error with both the example ifnb data and my own data, as others have mentioned.
Hi all, thanks for your patience as I've been slow to respond to issues. I hope to take a deep look at this within the next few days.
Still haven't been able to reproduce this issue. Can someone experiencing it post a sessionInfo()?
@ajwilk
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] tcltk     stats4    stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] scriabin_0.0.0.9000 panc8.SeuratData_3.0.2 ifnb.SeuratData_3.1.0 SeuratData_0.2.2
[5] qs_0.25.5 cols4all_0.6-1 data.table_1.14.8 Mfuzz_2.58.0
[9] DynDoc_1.76.0 widgetTools_1.76.0 e1071_1.7-13 scales_1.2.1
[13] RColorBrewer_1.1-3 gridExtra_2.3 ggsci_3.0.0 openxlsx_4.2.5.2
[17] plotrix_3.8-2 BuenColors_0.5.6 MASS_7.3-58.1 scater_1.26.1
[21] scuttle_1.8.4 SingleCellExperiment_1.20.1 pheatmap_1.0.12 hgu133plus2.db_3.13.0
[25] DESeq2_1.38.3 SummarizedExperiment_1.28.0 MatrixGenerics_1.10.0 matrixStats_1.0.0
[29] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9 clusterProfiler_4.6.2 org.Mm.eg.db_3.16.0
[33] org.Hs.eg.db_3.16.0 AnnotationDbi_1.60.2 IRanges_2.32.0 S4Vectors_0.36.2
[37] mindr_1.3.2 GEOquery_2.66.0 R.utils_2.12.2 R.oo_1.25.0
[41] R.methodsS3_1.8.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[45] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[49] ggplot2_3.4.2 tidyverse_2.0.0 patchwork_1.1.2 dplyr_1.1.2
[53] multtest_2.54.0 Biobase_2.58.0 BiocGenerics_0.44.0 SeuratObject_4.1.3
[57] Seurat_4.3.0.1
loaded via a namespace (and not attached):
[1] graphlayouts_1.0.0 pbapply_1.7-0 lattice_0.20-45 vctrs_0.6.3
[5] usethis_2.2.0 mgcv_1.8-41 blob_1.2.4 survival_3.4-0
[9] prodlim_2023.03.31 spatstat.data_3.0-1 later_1.3.1 DBI_1.1.3
[13] rappdirs_0.3.3 uwot_0.1.14 dqrng_0.3.0 zlibbioc_1.44.0
[17] htmlwidgets_1.6.2 GlobalOptions_0.1.2 future_1.32.0 leiden_0.4.3
[21] parallel_4.2.2 irlba_2.3.5.1 tidygraph_1.2.3 Rcpp_1.0.10
[25] KernSmooth_2.23-20 DT_0.28 promises_1.2.0.1 DelayedArray_0.24.0
[29] limma_3.54.2 vegan_2.6-4 pkgload_1.3.2 RcppParallel_5.1.7
[33] Hmisc_5.1-0 RSpectra_0.16-1 fs_1.6.2 fastmatch_1.1-3
[37] digest_0.6.31 png_0.1-8 qlcMatrix_0.9.7 coop_0.6-3
[41] sctransform_0.3.5 scatterpie_0.2.1 cowplot_1.1.1 DOSE_3.24.2
[45] ggraph_2.1.0 pkgconfig_2.0.3 GO.db_3.16.0 docopt_0.7.1
[49] spatstat.random_3.1-5 DelayedMatrixStats_1.20.0 gower_1.0.1 ggbeeswarm_0.7.2
[53] iterators_1.0.14 DropletUtils_1.18.1 reticulate_1.30 circlize_0.4.15
[57] beeswarm_0.4.0 GetoptLong_1.0.5 xfun_0.39 zoo_1.8-12
[61] tidyselect_1.2.0 reshape2_1.4.4 ica_1.0-3 gson_0.1.0
[65] viridisLite_0.4.2 pkgbuild_1.4.1 rlang_1.1.1 glue_1.6.2
[69] umap_0.2.10.0 lava_1.7.2.1 tictoc_1.2 ggsignif_0.6.4
[73] recipes_1.0.6 labeling_0.4.2 harmony_0.1.1 httpuv_1.6.11
[77] class_7.3-20 preprocessCore_1.60.2 BiocNeighbors_1.16.0 annotate_1.76.0
[81] jsonlite_1.8.5 XVector_0.38.0 princurve_2.1.6 bit_4.0.5
[85] mime_0.12 gplots_3.1.3 stringi_1.7.12 processx_3.8.1
[89] spatstat.sparse_3.0-1 scattermore_1.2 spatstat.explore_3.2-1 quadprog_1.5-8
[93] yulab.utils_0.0.6 hardhat_1.3.0 bitops_1.0-7 cli_3.6.1
[97] rhdf5filters_1.10.1 RSQLite_2.3.1 randomForest_4.7-1.1 timechange_0.2.0
[101] rstudioapi_0.14 nlme_3.1-160 qvalue_2.30.0 fastcluster_1.2.3
[105] locfit_1.5-9.8 listenv_0.9.0 FateID_0.2.2 miniUI_0.1.1.1
[109] gridGraphics_0.5-1 urlchecker_1.0.1 runner_0.4.3 sessioninfo_1.2.2
[113] lifecycle_1.0.3 timeDate_4022.108 ggfittext_0.10.0 munsell_0.5.0
[117] ggalluvial_0.12.5 visNetwork_2.1.2 caTools_1.18.2 codetools_0.2-18
[121] vipor_0.4.5 lmtest_0.9-40 msigdbr_7.5.1 htmlTable_2.4.1
[125] xtable_1.8-4 ROCR_1.0-11 flashClust_1.01-2 BiocManager_1.30.21
[129] abind_1.4-5 FNN_1.1.3.2 farver_2.1.1 parallelly_1.36.0
[133] RANN_2.6.1 aplot_0.1.10 askpass_1.1 sparsesvd_0.2-2
[137] ggtree_3.6.2 RcppAnnoy_0.0.20 goftest_1.2-3 profvis_0.3.8
[141] cluster_2.1.4 future.apply_1.11.0 Matrix_1.5-3 tidytree_0.4.2
[145] ellipsis_0.3.2 prettyunits_1.1.1 ggridges_0.5.4 igraph_1.5.0
[149] fgsea_1.24.0 remotes_2.4.2 slam_0.1-50 spatstat.utils_3.0-3
[153] htmltools_0.5.5 yaml_2.3.7 utf8_1.2.3 plotly_4.10.2
[157] XML_3.99-0.14 ModelMetrics_1.2.2.2 ggpubr_0.6.0 foreign_0.8-83
[161] withr_2.5.0 fitdistrplus_1.1-11 BiocParallel_1.32.6 bit64_4.0.5
[165] foreach_1.5.2 RaceID_0.3.0 Biostrings_2.66.0 progressr_0.13.0
[169] GOSemSim_2.24.0 rsvd_1.0.5 ScaledMatrix_1.6.0 devtools_2.4.5
[173] memoise_2.0.1 evaluate_0.21 RApiSerialize_0.1.2 geneplotter_1.76.0
[177] permute_0.9-7 tzdb_0.4.0 callr_3.7.3 CelliD_1.6.2
[181] caret_6.0-94 ps_1.7.5 DiagrammeR_1.0.10 fdrtool_1.2.17
[185] fansi_1.0.4 tensor_1.5 edgeR_3.40.2 checkmate_2.2.0
[189] cachem_1.0.8 deldir_1.0-9 HDO.db_0.99.1 babelgene_22.9
[193] impute_1.72.3 rjson_0.2.21 rstatix_0.7.2 ggrepel_0.9.3
[197] ade4_1.7-22 clue_0.3-64 tools_4.2.2 nichenetr_1.1.1
[201] magrittr_2.0.3 RCurl_1.98-1.12 proxy_0.4-27 FSA_0.9.4
[205] car_3.1-2 ape_5.7-1 xml2_1.3.4 ggplotify_0.1.0
[209] httr_1.4.6 rmarkdown_2.22 globals_0.16.2 R6_2.5.1
[213] Rhdf5lib_1.20.0 nnet_7.3-18 genefilter_1.80.3 KEGGREST_1.38.0
[217] treeio_1.22.0 gtools_3.9.4 shape_1.4.6 beachmat_2.14.2
[221] BiocVersion_3.16.0 HDF5Array_1.26.0 BiocSingular_1.14.0 rhdf5_2.42.1
[225] splines_4.2.2 carData_3.0-5 ggfun_0.1.0 colorspace_2.1-0
[229] generics_0.1.3 base64enc_0.1-3 pillar_1.9.0 tweenr_2.0.2
[233] sp_2.0-0 GenomeInfoDbData_1.2.9 plyr_1.8.8 gtable_0.3.3
[237] tkWidgets_1.76.0 zip_2.3.0 stringfish_0.15.8 knitr_1.43
[241] ComplexHeatmap_2.14.0 RcppArmadillo_0.12.4.1.0 shadowtext_0.1.2 fastmap_1.1.1
[245] doParallel_1.0.17 broom_1.0.5 som_0.3-5.1 openssl_2.0.6
[249] backports_1.4.1 ipred_0.9-14 WGCNA_1.72-1 enrichplot_1.18.4
[253] hms_1.1.3 ggforce_0.4.1 Rtsne_0.16 shiny_1.7.4
[257] polyclip_1.10-4 grid_4.2.2 lazyeval_0.2.2 dynamicTreeCut_1.63-1
[261] Formula_1.2-5 crayon_1.5.2 downloader_0.4 pROC_1.18.2
[265] sparseMatrixStats_1.10.0 viridis_0.6.3 rpart_4.1.19 compiler_4.2.2
[269] spatstat.geom_3.2-1
I think this may be an issue with R versions 4.2.0 and above, specifically due to the change where calling while() or if() with a condition of length greater than one throws an error rather than a warning. Example:
# Example code
if (1:2 == 1) { print ("hi") }
# Version 4.1 output
[1] "hi"
Warning message:
In if (1:2 == 1) { :
the condition has length > 1 and only the first element will be used
# Version 4.2 output
Error in if (1:2 == 1) { : the condition has length > 1
Execution halted
In the example dataset, check_completion returns a vector of length two, which is probably what triggers the error.
> check_completion(nbs_completion)
[1] TRUE FALSE
I'm not sure what exactly the function checks for, but for now it runs when I modify check_completion to this:
function (nbs)
{
length(unique(nbs$unique_types)) == length(unique(nbs$ident))
}
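As a quick sanity check, the modified function always returns a single logical (it compares two scalar lengths), so if() no longer errors in R >= 4.2. A minimal sketch, with a toy data frame standing in for the real nbs_completion object:

```r
# Toy stand-in for nbs_completion (illustrative, not Scriabin's real object)
nbs <- data.frame(
  ident = rep(c("CTRL", "STIM"), each = 5),
  unique_types = rep(2L, 10)
)

# Modified check_completion: compares two scalar lengths, so the result
# is always a single TRUE/FALSE that if() can consume in R >= 4.2
check_completion <- function(nbs) {
  length(unique(nbs$unique_types)) == length(unique(nbs$ident))
}

length(check_completion(nbs))  # 1, so if (check_completion(nbs)) runs without error
```

Whether this single logical captures the intended completion semantics is a separate question; it only guarantees the if() condition has length one.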
Hello @ajwilk,
I've spent a lot of time trying to modify the function BinDatasets, and I think there are some notable errors in it, such as the one mentioned above
check_completion <- function(nbs) {
unique(nbs$unique_types)==length(unique(nbs$ident))
}
should be (I guess, cc @csangara):
check_completion <- function(nbs) {
nbs$unique_types == length(unique(nbs$ident))
}
And another apparent inconsistency is:
seu_ct_split <- SplitObject(seu, split.by = coarse_cell_types)
bin_ids <- lapply(seq_along(1:length(seu_ct_split)), function(x) {
seu_oi <- AlignDatasets(seuObj = seu,
dims = dims,
anchor_score_threshold = anchor_score_threshold,
split.by = split.by,
optim_quan.threshold = optim_quan.threshold,
optim_k.unique = optim_k.unique,
snn.reduction = snn.reduction, verbose = verbose)
This code first splits seu, but the object passed to AlignDatasets inside the lapply is still the full seu. I think it should be:
seu_ct_split <- SplitObject(seu, split.by = coarse_cell_types)
bin_ids <- lapply(seq_along(1:length(seu_ct_split)), function(x) {
seu_oi <- AlignDatasets(seuObj = seu_ct_split[[x]],
dims = dims,
anchor_score_threshold = anchor_score_threshold,
split.by = split.by,
optim_quan.threshold = optim_quan.threshold,
optim_k.unique = optim_k.unique,
snn.reduction = snn.reduction, verbose = verbose)
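The underlying pattern is easy to get wrong: after splitting, the loop body must index into the split list rather than the original object. A minimal base-R sketch of the same pattern, with split() standing in for SplitObject and nrow() standing in for AlignDatasets:

```r
# Split a data frame by a grouping column, then process each piece
df <- data.frame(ct = rep(c("B", "T"), each = 3), value = 1:6)
df_split <- split(df, df$ct)

# Wrong: every iteration sees the full object, so all results are identical
wrong <- lapply(seq_along(df_split), function(x) nrow(df))

# Right: each iteration sees only its own split
right <- lapply(seq_along(df_split), function(x) nrow(df_split[[x]]))

unlist(wrong)  # 6 6  -- the split was never used
unlist(right)  # 3 3  -- each piece processed separately
```

In BinDatasets this means each coarse cell type would be aligned on its own subset rather than repeatedly on the whole object, which should also cut the runtime considerably.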
However, even after fixing the bugs I found in BinDatasets, so that bin names can be assigned to a dataset, the multi-dataset vignette workflow is still incomplete. The vignette only runs up to constructing the summarized interaction graph and never calls the PerturbedBins function to perform the Kruskal-Wallis test.
In addition, I think parallel computing is essential in a workflow with such large matrices.
What puzzles me most is: even once I obtain bin-bin pairs with significant interaction changes, then what? How do I get the cells to feed into the single-dataset vignette? My guess is that, since single cells are filtered based on significant bin-bin pairs, the final cell-communication matrix at single-cell resolution is also limited to communication within those two bins. However, this is not reflected in the code or the article.
All in all, I think the views and ideas in this article are very original, but the code on GitHub doesn't run well; it doesn't look like a final version. At the very least, the vignette workflows should run successfully. I sincerely hope you will take a little time to refine the code here so that more people can cite your analysis methods and articles.
Sincerely, Tony4709
Hi @Tony4709,
For your fix of check_completion, I think you might run into the same problem as before, since nbs_completion is a dataframe that looks like this:
# A tibble: 252 × 4
# Groups: id [5]
cell id ident unique_types
<chr> <chr> <chr> <int>
1 project=CTRL.1 88 CTRL 2
2 project=CTRL.2 29 CTRL 2
3 project=CTRL.3 88 CTRL 2
4 project=CTRL.4 29 CTRL 2
5 project=CTRL.5 29 CTRL 2
6 project=CTRL.6 29 CTRL 2
7 project=CTRL.7 88 CTRL 2
8 project=CTRL.8 29 CTRL 2
9 project=CTRL.9 88 CTRL 2
10 project=CTRL.10 29 CTRL 2
# ℹ 242 more rows
So nbs$unique_types == length(unique(nbs$ident)) will return a vector of length nrow(nbs). So it should probably be something like all(nbs$unique_types == length(unique(nbs$ident)))?
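To illustrate: all() collapses the element-wise comparison into a single logical, which is exactly what if() requires in R >= 4.2. A minimal sketch, again with a toy data frame standing in for nbs_completion:

```r
# Toy stand-in for nbs_completion: most neighborhoods see both idents,
# one (row 4) sees only a single ident
nbs <- data.frame(
  ident = rep(c("CTRL", "STIM"), times = c(4, 2)),
  unique_types = c(2L, 2L, 2L, 1L, 2L, 2L)
)

# Element-wise: one logical per row; if() would error on this in R >= 4.2
nbs$unique_types == length(unique(nbs$ident))
# TRUE TRUE TRUE FALSE TRUE TRUE

# Wrapped in all(): a single logical, safe to use as an if() condition
all(nbs$unique_types == length(unique(nbs$ident)))
# FALSE
```

This keeps the per-row comparison (every neighborhood must contain all idents) while producing a length-one condition.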
I was also thinking of changing seu to seu_ct_split[[x]], as I ran into the same confusion when reading the code. I managed to get the multi-dataset vignette running to the end without fixing this, but it took several hours. I'll try to implement this change and see if the runtime is reduced.
Furthermore, I just saw that @inofechm actually has a fork that fixes the things we mentioned here (and more), which I might also try out.
I agree that the vignette is very much incomplete. I tried running PerturbedBins but it also didn't work; I'm not sure whether that's because >2 samples are needed, as stated in the description. I think it might not work either way, as the bin combinations in the code are generated as bin_combos <- expand.grid(1:nbin, 1:nbin), but the bin numbers returned from BinDatasets do not seem to be numbered consecutively.
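A sketch of that mismatch (the variable names here are illustrative, not Scriabin's actual internals): if the bin labels in the metadata are, say, c(2, 5, 9) rather than 1:nbin, then enumerating pairs with expand.grid(1:nbin, 1:nbin) produces combinations that reference bins that don't exist, whereas enumerating the observed labels covers exactly the real bins:

```r
# Observed bin labels need not form a consecutive 1:nbin sequence
bins <- c(2, 5, 9)
nbin <- length(bins)

# Assumed-consecutive enumeration: pairs like (1, 1) match no real bin
bad_combos <- expand.grid(1:nbin, 1:nbin)

# Enumerating the observed labels instead covers exactly the real bins
good_combos <- expand.grid(bins, bins)

all(bad_combos$Var1 %in% bins)   # FALSE: some pairs reference nonexistent bins
all(good_combos$Var1 %in% bins)  # TRUE: every pair references a real bin
```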
I hope the code and vignette get some fixes soon (or at the very least, please remove this browser call), as I would really love to properly use this tool.
Hi,
This toolkit is excellent! I tried to re-run the code in "Scriabin's summarized interaction graph workflow", but I hit an error while binning the dataset. Here is the output:
Could you help me figure it out?
Hanwen