HectorRDB / condiments

Trajectory inference across multiple conditions with condiments: differential topology, progression, differentiation, and expression
https://hectorrdb.github.io/condiments/
Other
16 stars 1 forks source link

topologyTest() returns an error when running Classifier test #16

Open cgoneill opened 2 years ago

cgoneill commented 2 years ago

Hello, and thank you for your excellent work on condiments. I'm following along with the KRAS analysis vignette you outlined on the condimentsPaper repository on my four-condition dataset that I converted to a SingleCellExperiment object from a Seurat object, but when I get to topologyTest(), I get the following error:

> topologyTest(unwounded.ife.sce, conditions = "project_char")
Generating permuted trajectories
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13m 03s
Running Classifier test
Error in if (!check) { : argument is of length zero

I've tried getting the character vector to pass to conditions a few ways but I keep running into that error. My understanding from the documentation and the KRAS vignette is that I can pass a character to condition that tells topologyTest() which column of the metadata has the vector, but even if I isolate the vector containing that information and pass it separately, it returns the same error.

I've tried running topologyTest() with a different column in the metadata (just the treatment as opposed to the disease state, which is a binary) to see if the issue is specific to the Classifier test, and it worked when using the KS test on a two-condition metadata column:

> topologyTest(unwounded.ife.sce, conditions = "treatment", rep = 25)
Generating permuted trajectories
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=04m 00s
Running KS-mean test
   method thresh  statistic     p.value
1 KS_mean   0.01 0.02305099 0.000135764

I take this to mean the KS test would work if I ran it on the treatment and disease state conditions separately, but not on the unified metadata column.

Is there something missing from my use of the function with the Classifier test?

Here's the session info:

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/local/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.12             RColorBrewer_1.1-3          scater_1.24.0               scuttle_1.6.2               condiments_1.4.0            tradeSeq_1.10.0            
 [7] slingshot_2.4.0             TrajectoryUtils_1.4.0       princurve_2.1.6             enrichplot_1.16.1           org.Mm.eg.db_3.15.0         AnnotationDbi_1.58.0       
[13] clusterProfiler_4.4.2       enrichR_3.0                 clustree_0.4.4              ggraph_2.0.5                scCustomize_0.7.0           Scillus_0.5.0              
[19] harmony_0.1.0               Rcpp_1.0.8.3                DoubletFinder_2.0.3         DropletUtils_1.16.0         SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1
[25] Biobase_2.56.0              GenomicRanges_1.48.0        GenomeInfoDb_1.32.2         IRanges_2.30.0              S4Vectors_0.34.0            BiocGenerics_0.42.0        
[31] MatrixGenerics_1.8.0        matrixStats_0.62.0          openxlsx_4.2.5              cowplot_1.1.1               magrittr_2.0.3              forcats_0.5.1              
[37] stringr_1.4.0               dplyr_1.0.9                 purrr_0.3.4                 readr_2.1.2                 tidyr_1.2.0                 tibble_3.1.7               
[43] ggplot2_3.3.6               tidyverse_1.3.1             Matrix_1.4-1                SoupX_1.6.1                 sp_1.5-0                    SeuratObject_4.1.0         
[49] Seurat_4.1.1               

loaded via a namespace (and not attached):
  [1] rsvd_1.0.5                ica_1.0-2                 ps_1.7.0                  class_7.3-20              rprojroot_2.0.3           foreach_1.5.2             lmtest_0.9-40            
  [8] crayon_1.5.1              spatstat.core_2.4-4       MASS_7.3-57               rhdf5filters_1.8.0        dittoSeq_1.9.1            distinct_1.8.0            nlme_3.1-157             
 [15] backports_1.4.1           reprex_2.0.1              transport_0.12-2          GOSemSim_2.22.0           rlang_1.0.2               XVector_0.36.0            caret_6.0-92             
 [22] ROCR_1.0-11               readxl_1.4.0              irlba_2.3.5               callr_3.7.0               limma_3.52.1              BiocParallel_1.30.3       rjson_0.2.21             
 [29] bit64_4.0.5               glue_1.6.2                rngtools_1.5.2            sctransform_0.3.3         processx_3.6.1            parallel_4.2.0            vipor_0.4.5              
 [36] spatstat.sparse_2.1-1     DOSE_3.22.0               spatstat.geom_2.4-0       haven_2.5.0               tidyselect_1.1.2          fitdistrplus_1.1-8        zoo_1.8-10               
 [43] ggpubr_0.4.0              xtable_1.8-4              formattable_0.2.1         evaluate_0.15             cli_3.3.0                 zlibbioc_1.42.0           doRNG_1.8.2              
 [50] rstudioapi_0.13           miniUI_0.1.1.1            spatstat.linnet_2.3-2     rpart_4.1.16              fastmatch_1.1-3           treeio_1.20.0             shiny_1.7.1              
 [57] BiocSingular_1.12.0       xfun_0.31                 clue_0.3-61               pkgbuild_1.3.1            cluster_2.1.3             tidygraph_1.2.1           KEGGREST_1.36.0          
 [64] ggrepel_0.9.1             ape_5.6-2                 listenv_0.8.0             Biostrings_2.64.0         png_0.1-7                 future_1.26.1             ipred_0.9-13             
 [71] withr_2.5.0               bitops_1.0-7              ggforce_0.3.3             plyr_1.8.7                cellranger_1.1.0          hardhat_1.1.0             e1071_1.7-11             
 [78] pROC_1.18.0               dqrng_0.3.0               pillar_1.7.0              GlobalOptions_0.1.2       cachem_1.0.6              Ecume_0.9.1               fs_1.5.2                 
 [85] kernlab_0.9-30            GetoptLong_1.0.5          paletteer_1.4.0           DelayedMatrixStats_1.18.0 vctrs_0.4.1               ellipsis_0.3.2            generics_0.1.2           
 [92] lava_1.6.10               tools_4.2.0               beeswarm_0.4.0            munsell_0.5.0             tweenr_1.0.2              proxy_0.4-27              fgsea_1.22.0             
 [99] DelayedArray_0.22.0       fastmap_1.1.0             compiler_4.2.0            abind_1.4-5               httpuv_1.6.5              plotly_4.10.0             rgeos_0.5-9              
[106] prodlim_2019.11.13        GenomeInfoDbData_1.2.8    gridExtra_2.3             edgeR_3.38.1              colorway_0.2.0            lattice_0.20-45           deldir_1.0-6             
[113] utf8_1.2.2                later_1.3.0               recipes_0.2.0             jsonlite_1.8.0            scales_1.2.0              ScaledMatrix_1.4.0        tidytree_0.3.9           
[120] pbapply_1.5-0             carData_3.0-5             sparseMatrixStats_1.8.0   lazyeval_0.2.2            promises_1.2.0.1          spatstat_2.3-4            car_3.1-0                
[127] doParallel_1.0.17         R.utils_2.11.0            goftest_1.2-3             spatstat.utils_2.3-1      reticulate_1.25           rmarkdown_2.14            glmGamPoi_1.8.0          
[134] Rtsne_0.16                downloader_0.4            uwot_0.1.11               igraph_1.3.2              HDF5Array_1.24.1          survival_3.3-1            htmltools_0.5.2          
[141] memoise_2.0.1             locfit_1.5-9.5            graphlayouts_0.8.0        viridisLite_0.4.0         digest_0.6.29             assertthat_0.2.1          mime_0.12                
[148] RSQLite_2.2.14            yulab.utils_0.0.4         future.apply_1.9.0        remotes_2.4.2             data.table_1.14.2         blob_1.2.3                R.oo_1.24.0              
[155] labeling_0.4.2            splines_4.2.0             rematch2_2.1.2            Rhdf5lib_1.18.2           RCurl_1.98-1.7            broom_0.8.0               hms_1.1.1                
[162] modelr_0.1.8              rhdf5_2.40.0              colorspace_2.0-3          ggbeeswarm_0.6.0          shape_1.4.6               aplot_0.1.6               nnet_7.3-17              
[169] RANN_2.6.1                circlize_0.4.15           fansi_1.0.3               tzdb_0.3.0                ModelMetrics_1.2.2.2      parallelly_1.32.0         R6_2.5.1                 
[176] grid_4.2.0                ggridges_0.5.3            lifecycle_1.0.1           zip_2.2.0                 curl_4.3.2                ggsignif_0.6.3            leiden_0.4.2             
[183] snakecase_0.11.0          DO.db_2.9                 qvalue_2.28.0             RcppAnnoy_0.0.19          iterators_1.0.14          gower_1.0.0               htmlwidgets_1.5.4        
[190] beachmat_2.12.0           polyclip_1.10-0           shadowtext_0.1.2          gridGraphics_0.5-1        rvest_1.0.2               ComplexHeatmap_2.12.0     mgcv_1.8-40              
[197] globals_0.15.0            patchwork_1.1.1           spatstat.random_2.2-0     progressr_0.10.1          codetools_0.2-18          lubridate_1.8.0           GO.db_3.15.0             
[204] prettyunits_1.1.1         dbplyr_2.2.0              RSpectra_0.16-1           R.methodsS3_1.8.1         gtable_0.3.0              DBI_1.1.2                 ggfun_0.0.6              
[211] tensor_1.5                httr_1.4.3                KernSmooth_2.23-20        stringi_1.7.6             reshape2_1.4.4            farver_2.1.0              viridis_0.6.2            
[218] timeDate_3043.102         ggtree_3.4.0              xml2_1.3.3                BiocNeighbors_1.14.0      ggplotify_0.1.0           scattermore_0.8           bit_4.0.4                
[225] scatterpie_0.1.7          spatstat.data_2.2-0       janitor_2.1.0             pkgconfig_2.0.3           ggprism_1.0.3.9000        rstatix_0.7.0             knitr_1.39       
HectorRDB commented 2 years ago

Hi, Thanks for using condiments and for providing such a comprehensive report. Could you please show me that the project_char column looks like? Thanks

cgoneill commented 2 years ago

Thanks for your response. Here's the brief overview of the column:

> summary(unwounded.ife.sce$project_char)
   Length     Class      Mode 
    18063 character character 
> head(unwounded.ife.sce$project_char)
[1] "vehicle nondiabetic unwounded" "vehicle nondiabetic unwounded" "vehicle nondiabetic unwounded" "vehicle nondiabetic unwounded" "vehicle nondiabetic unwounded" "vehicle nondiabetic unwounded"
> unwounded.ife.sce$project_char %>% as.factor() %>% levels()
[1] "tamoxifen diabetic unwounded"    "tamoxifen nondiabetic unwounded" "vehicle diabetic unwounded"      "vehicle nondiabetic unwounded" 

I tried to run topologyTest() again with a metadata column using _ as a separator in place of , but I encountered the same error message, so I don't think that's the problem.

HectorRDB commented 2 years ago

Ok I don't see anything weird. Can you give me the traceback of the error?

cgoneill commented 2 years ago

I set reps = 10 for this iteration to get a quicker reproduction of the problem.

> traceback()
17: .check_d(x, y)
16: (function (x, y, split = 0.7, thresh = 0, method = "knn", control = caret::trainControl(method = "cv"), 
        ...) 
    {
        if ("list" %in% methods::is(x)) {
            .check_d(x)
            names(x) <- paste0("C", seq_along(x))
            x <- lapply(x, as.data.frame)
            X <- dplyr::bind_rows(x, .id = "type")
            X$type <- as.factor(X$type)
        }
        else {
            .check_d(x, y)
            X <- dplyr::bind_rows(C1 = as.data.frame(x), C2 = as.data.frame(y), 
                .id = "type")
            X$type <- as.factor(X$type)
        }
        min_size <- min(table(X$type))
        X <- X %>% dplyr::group_by(type) %>% dplyr::slice_sample(n = min_size) %>% 
            dplyr::ungroup()
        training_set <- createDataPartition(X$type, p = split)
     ...
15: do.call(what = Ecume::classifier_test, args = args)
14: FUN(X[[i]], ...)
13: lapply(threshs, function(thresh) {
        args <- args_classifier
        args$x <- as.matrix(og$psts)
        args$y <- psts
        args$thresh <- thresh
        test <- do.call(what = Ecume::classifier_test, args = args)
        return(data.frame(statistic = test$statistic, p.value = test$p.value))
    })
12: .topologyTest_classifier(permutations, og, threshs, sds, rep, 
        args_classifier)
11: .topologyTest_all_selected(curves$permutations, curves$og, conditions, 
        sds, methods, threshs, rep, args_classifier, args_mmd, args_wass, 
        distinct_samples)
10: list2(...)
9: bind_rows(., .id = "method")
8: .topologyTest_all_selected(curves$permutations, curves$og, conditions, 
       sds, methods, threshs, rep, args_classifier, args_mmd, args_wass, 
       distinct_samples) %>% bind_rows(.id = "method")
7: .topologyTest(sds = sds, conditions = conditions, rep = rep, 
       threshs = threshs, methods = methods, parallel = parallel, 
       BPPARAM = BPPARAM, args_mmd = args_mmd, args_wass = args_wass, 
       args_classifier = args_classifier, nmax = nmax, distinct_samples = distinct_samples)
6: .local(sds, ...)
5: topologyTest(slingshot::SlingshotDataSet(sds), conditions = conditions, 
       rep = rep, threshs = threshs, methods = methods, parallel = parallel, 
       BPPARAM = BPPARAM, args_mmd = args_mmd, args_wass = args_wass, 
       args_classifier = args_classifier, nmax = nmax, distinct_samples = distinct_samples)
4: topologyTest(slingshot::SlingshotDataSet(sds), conditions = conditions, 
       rep = rep, threshs = threshs, methods = methods, parallel = parallel, 
       BPPARAM = BPPARAM, args_mmd = args_mmd, args_wass = args_wass, 
       args_classifier = args_classifier, nmax = nmax, distinct_samples = distinct_samples)
3: .local(sds, ...)
2: topologyTest(unwounded.ife.sce, conditions = "project_char", 
       rep = 10)
1: topologyTest(unwounded.ife.sce, conditions = "project_char", 
       rep = 10)
HectorRDB commented 1 year ago

Ok, I cannot reproduce that issue. Would you mind emailing me the data at hector dot rouxdebezieux at gmail dot com ? Thanks

RicardoMartins-Ferreira commented 1 year ago

Hello. I have been getting exactly the same error mentioned in this issue. Did you get to a conclusio on what it was?

Best regards

HectorRDB commented 7 months ago

Sorry no I have not.

mahwish2 commented 3 months ago

Hi, any updates here? I have the same issue with three conditions.

darshanaka commented 1 month ago

Hi, not sure if this error was ever fixed, but I am facing the same error with 4 conditions..

HectorRDB commented 1 month ago

Ok I now have some more bandwidth to consider. The issue has not been fixed. Can someone share their data and I will work on this issue. Thanks