caravagnalab / revolver

REVOLVER - Repeated Evolution in Cancer
https://caravagnalab.github.io/revolver/

Error: Can't subset columns that don't exist #31

Closed pdpz1 closed 3 years ago

pdpz1 commented 4 years ago

Hi,

my apologies for bothering you with such a trivial error but I can't seem to run the function plot_clusters() properly. I get the error:

plot_clusters(my_cohort_jackknifed)
New names:

  • -14q -> -14q...9
  • -17p -> -17p...10
  • -3p21 -> -3p21...11
  • -6q -> -6q...12
  • -chr4 -> -chr4...13
  • ...

Error: Can't subset columns that don't exist.
x Columns -14q, -17p, -3p21, -6q, -chr4, etc. don't exist.
Run rlang::last_error() to see where the error occurred.

rlang::last_error()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Columns -14q, -17p, -3p21, -6q, -chr4, etc. don't exist.
Backtrace:
    1. revolver::plot_clusters(my_cohort_jackknifed)
    2. vctrs:::stop_subscript_oob(...)
    3. vctrs:::stop_subscript(...)
Run rlang::last_trace() to see the full context.

rlang::last_trace()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Columns -14q, -17p, -3p21, -6q, -chr4, etc. don't exist.
Backtrace:
x
    4. +-revolver::plot_clusters(my_cohort_jackknifed)
    5. | -revolver::get_features(x)
    6. | +-%>%(...)
    7. | | -base::eval(lhs, parent, parent)
    8. | | -base::eval(lhs, parent, parent)
    9. | -revolver:::complement(Matrix_drivers, Matrix_subclonal_drivers)
    10. | +-N[, colnames(M)]
    11. | -tibble:::[.tbl_df(N, , colnames(M))
    12. | -tibble:::tbl_subset_col(x, j = j, j_arg)
    13. | -tibble:::vectbl_as_col_index(j, x, j_arg = j_arg)
    14. | -tibble:::vectbl_as_col_location(...)
    15. | +-tibble:::subclass_col_index_errors(...)
    16. | | +-base::tryCatch(...)
    17. | | | -base:::tryCatchList(expr, classes, parentenv, handlers)
    18. | | | -base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
    19. | | | -base:::doTryCatch(return(expr), name, parentenv, handler)
    20. | | -base::force(expr)
    21. | -vctrs::vec_as_location(j, n, names, arg = as_label(j_arg))
    22. -vctrs:::stop_subscript_oob(...)
    23. -vctrs:::stop_subscript(...)

All other functions and plots work as intended and produce outputs. Being inexperienced in coding, unfortunately I'm somewhat lost in trying to locate the source of the error, and googling hasn't yielded any fruitful results thus far.

Any guidance or suggestions would be much appreciated.

Many thanks!

Yours Sincerely, Phillip Zhang.

caravagn commented 4 years ago

Hey Phillip, thanks for reporting this; it must be a bug on our side, stemming from something wrong in the plot_clusters function. I am happy to look into it and fix it, but I need a bit more info to reproduce it. Can you share your my_cohort_jackknifed object so I can re-run the same call and trigger the crash? If that is sensitive, any other object that produces the same error would work just fine.

pdpz1 commented 4 years ago

Hi Dr Caravagn,

thank you for the swift response! I'm a little unfamiliar with exporting objects in R, and GitHub doesn't seem to support the output file produced by R. Is there perhaps an e-mail I could send the file to?

If it's of any help, I ran the same process through the sample dataset 'TRACERx_NEJM_2017' and it had a similar error:

plot_clusters(sample_cohort)
New names:

  • NF1 -> NF1...3
  • NF1 -> NF1...4

Error: Can't subset columns that don't exist.
x Column NF1 doesn't exist.
Run rlang::last_error() to see where the error occurred.

rlang::last_error()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Column NF1 doesn't exist.
Backtrace:
    1. revolver::plot_clusters(sample_cohort)
    2. vctrs:::stop_subscript_oob(...)
    3. vctrs:::stop_subscript(...)
Run rlang::last_trace() to see the full context.

rlang::last_trace()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Column NF1 doesn't exist.
Backtrace:
x
    4. +-revolver::plot_clusters(sample_cohort)
    5. | -revolver::get_features(x)
    6. | +-%>%(...)
    7. | | -base::eval(lhs, parent, parent)
    8. | | -base::eval(lhs, parent, parent)
    9. | -revolver:::complement(Matrix_drivers, Matrix_clonal_drivers)
    10. | +-N[, colnames(M)]
    11. | -tibble:::[.tbl_df(N, , colnames(M))
    12. | -tibble:::tbl_subset_col(x, j = j, j_arg)
    13. | -tibble:::vectbl_as_col_index(j, x, j_arg = j_arg)
    14. | -tibble:::vectbl_as_col_location(...)
    15. | +-tibble:::subclass_col_index_errors(...)
    16. | | +-base::tryCatch(...)
    17. | | | -base:::tryCatchList(expr, classes, parentenv, handlers)
    18. | | | -base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
    19. | | | -base:::doTryCatch(return(expr), name, parentenv, handler)
    20. | | -base::force(expr)
    21. | -vctrs::vec_as_location(j, n, names, arg = as_label(j_arg))
    22. -vctrs:::stop_subscript_oob(...)
    23. -vctrs:::stop_subscript(...)

Many thanks, Phillip.

caravagn commented 4 years ago

Sure Phillip, write me an email and attach the R object so that I can run it myself and reproduce the error. I suspect there might be a version issue with some of the underlying packages; I think this because you said that the error also happens with my default vignettes (it did not happen when I first created them, so something might have changed and needs to be updated). You can get my personal email from the package website (I will be able to look into this in 5/6 days when I get back to work; I am on holiday now). Best, Giulio

caravagn commented 4 years ago

To save a variable x, run save(x, file = 'abc.RData') and send me abc.RData. I would also need you to post here, or send to me, the output of your sessionInfo().
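For readers following along, here is a minimal sketch of that request in base R. The object `x` below is a hypothetical stand-in for the fitted cohort, and the files are written to a temporary directory only so the example is safe to run anywhere.

```r
# Hypothetical stand-in for the cohort object that triggers the error.
x <- list(patients = c("P1", "P2"))

# save() stores the object under its variable name in an .RData file.
out_file <- file.path(tempdir(), "abc.RData")
save(x, file = out_file)

# Capture sessionInfo() as plain text so it can be pasted into the issue.
info_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), info_file)

# On the receiving side, load() restores `x` into the chosen environment.
env <- new.env()
load(out_file, envir = env)
```

Loading into a fresh environment, as in the last step, avoids overwriting an object of the same name in the recipient's workspace.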

qingjian1991 commented 3 years ago

Hi, I hit the same error with the function plot_clusters(). Running the example code for plot_clusters() also fails.

# Data released in the 'evoverse.datasets' package
data('TRACERx_NEJM_2017_REVOLVER', package = 'evoverse.datasets')
features = get_features(TRACERx_NEJM_2017_REVOLVER)
New names:
* APC -> APC...46
* BRAF -> BRAF...47
* CDKN2A -> CDKN2A...48
* CHEK2 -> CHEK2...49
* CMTR2 -> CMTR2...50
* ...
Error: Can't subset columns that don't exist.
x Columns `APC`, `BRAF`, `CDKN2A`, `CHEK2`, `CMTR2`, etc. don't exist.

Stepping into get_features(), the error comes from the following lines.

get_features= function (x, patients = x$patients)
.....

Matrix_clonal_drivers = complement(Matrix_drivers, Matrix_clonal_drivers) %>%
        replace(is.na(.), 0)
# The following line raises the error.
Matrix_subclonal_drivers = complement(Matrix_drivers, Matrix_subclonal_drivers) %>%
        replace(is.na(.), 0)
New names:
* APC -> APC...46
* BRAF -> BRAF...47
* CDKN2A -> CDKN2A...48
* CHEK2 -> CHEK2...49
* CMTR2 -> CMTR2...50
* ...
Error: Can't subset columns that don't exist.
x Columns `APC`, `BRAF`, `CDKN2A`, `CHEK2`, `CMTR2`, etc. don't exist.

Hopefully this helps you debug the error.

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] evoverse.datasets_0.1.0 revolver_0.3.0          mtree_0.01.1            ctree_0.1.2             cowplot_1.1.0          
 [6] ggpubr_0.4.0            reshape2_1.4.4          matrixcalc_1.0-3        entropy_1.2.1           clisymbols_1.2.0       
[11] RColorBrewer_1.1-2      ggrepel_0.9.0           igraph_1.2.6            ggraph_2.0.4            tidygraph_1.2.0        
[16] forcats_0.5.0           stringr_1.4.0           dplyr_1.0.2             purrr_0.3.4             readr_1.4.0            
[21] tidyr_1.1.2             tibble_3.0.4            ggplot2_3.3.2           tidyverse_1.3.0         easypar_0.2.0          
[26] crayon_1.3.4            pio_0.1.0              

loaded via a namespace (and not attached):
 [1] colorspace_2.0-0      ggsignif_0.6.0        ellipsis_0.3.1        rio_0.5.16            dynamicTreeCut_1.63-1
 [6] ggdendro_0.1.22       fs_1.5.0              rstudioapi_0.13       farver_2.0.3          graphlayouts_0.7.1   
[11] fansi_0.4.1           lubridate_1.7.9.2     xml2_1.3.2            codetools_0.2-16      splines_4.0.2        
[16] doParallel_1.0.16     knitr_1.30            polyclip_1.10-0       jsonlite_1.7.1        broom_0.7.2          
[21] cluster_2.1.0         dbplyr_2.0.0          ggforce_0.3.2         compiler_4.0.2        httr_1.4.2           
[26] backports_1.2.0       assertthat_0.2.1      Matrix_1.2-18         cli_2.2.0             tweenr_1.0.1         
[31] htmltools_0.5.0       tools_4.0.2           gtable_0.3.0          glue_1.4.2            Rcpp_1.0.5           
[36] carData_3.0-4         cellranger_1.1.0      vctrs_0.3.5           nlme_3.1-150          iterators_1.0.13     
[41] xfun_0.19             openxlsx_4.2.3        rvest_0.3.6           lifecycle_0.2.0       rstatix_0.6.0        
[46] dendextend_1.14.0     MASS_7.3-53           scales_1.1.1          hms_0.5.3             parallel_4.0.2       
[51] yaml_2.2.1            curl_4.3              gridExtra_2.3         stringi_1.5.3         foreach_1.5.1        
[56] permute_0.9-5         zip_2.1.1             rlang_0.4.9           pkgconfig_2.0.3       evaluate_0.14        
[61] lattice_0.20-41       labeling_0.4.2        tidyselect_1.1.0      plyr_1.8.6            magrittr_2.0.1       
[66] R6_2.5.0              generics_0.1.0        DBI_1.1.0             pillar_1.4.7          haven_2.3.1          
[71] foreign_0.8-80        withr_2.3.0           mgcv_1.8-33           abind_1.4-5           modelr_0.1.8         
[76] car_3.0-10            utf8_1.1.4            rmarkdown_2.5         viridis_0.5.1         grid_4.0.2           
[81] readxl_1.3.1          isoband_0.2.3         data.table_1.13.2     vegan_2.5-7           reprex_0.3.0         
[86] digest_0.6.27         munsell_0.5.0         viridisLite_0.3.0 

Yours Sincerely, Qingjian Chen.

caravagn commented 3 years ago

Hi, this is a problem related to some tidyverse package in R 4.0 that changed its way of handling names in tibbles. I wrote revolver with R 3.6, so at the time this bug did not happen.

This is what I replied to @pdpz1 by mail.


I am pretty sure (and hope) that the error is caused by the upgraded version that you have for package “vctrs” (0.3.1 vs 0.3.0 that I have); the new version throws errors on some subsetting operations that used to be allowed. Of course the package upgrade causes major disruption for all users (me and others alike) and this should be fixed.

We have two options: either downgrade your version to 0.3.0 and re-run, or run the plot_clusters function line by line and tell me where the error is thrown (it might be inside the get_features() function; in that case it would be ideal for me to know which line causes the error, which you should be able to get from the debugger).


I will check this out on R 4.0 with my data and try to fix this as soon as possible.
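The two options above can be sketched as follows. This is only an illustration: the version report runs as written, while the downgrade and debugger calls are left commented out because they change the installed library or need a fitted cohort (`my_cohort_jackknifed` is the object name from the first post). Later comments in this thread report that the downgrade did not help, so treat it as a diagnostic step rather than a fix.

```r
# Report the installed versions of the packages implicated in this thread.
pkgs <- c("vctrs", "tibble", "dplyr")
versions <- vapply(pkgs, function(p) {
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(packageVersion(p))
  } else {
    NA_character_  # package not installed
  }
}, character(1))
print(versions)

# Not run: pin vctrs to the older release (needs the 'remotes' package).
# remotes::install_version("vctrs", version = "0.3.0")

# Not run: step through get_features() on its next call to find the failing line.
# debugonce(revolver::get_features)
# plot_clusters(my_cohort_jackknifed)
```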

qingjian1991 commented 3 years ago

Hi,

I debugged this error. It needs a minor change to the complement() function inside get_features(): commenting out a small chunk of code in complement() makes the error go away.

Best,

Qingjian Chen


get_features = function(x, patients = x$patients){

...

complement = function(M, N)
  {
    # missing patients and driver genes
    miss_Pat = setdiff(M$patientID, N$patientID)
    miss_drv = setdiff(colnames(M), colnames(N))

    # Add template 0-ed matrices with the right rows/ columns
    if(length(miss_Pat) > 0)
    {
      empty = M %>% filter(patientID %in% !!miss_Pat)
      empty[, 2:ncol(empty)] = 0

      N = bind_rows(N, empty)
    }

   #******** Please comment out this chunk of code. *********
    #if(length(miss_drv) > 0)
    #  N = bind_cols(N,
    #                M %>%
    #                  select(!!miss_drv) %>%
    #                  replace(TRUE, 0)
    #  )

    N[, colnames(M)]
  }
...

}
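
Commenting the block out removes the duplicate columns, but it would also leave N without the missing driver columns whenever no patient rows are appended. As a hedged alternative (a sketch only, not the fix that was applied to revolver), the missing columns can be recomputed after the patient step so that only columns still absent are added. The name `complement_patched` and the toy tibbles are hypothetical.

```r
library(dplyr)

# Hypothetical patched helper; NOT the change merged into revolver.
complement_patched <- function(M, N) {
  # Patients present in M but absent from N
  miss_Pat <- setdiff(M$patientID, N$patientID)

  if (length(miss_Pat) > 0) {
    empty <- M %>% filter(patientID %in% miss_Pat)
    empty[, 2:ncol(empty)] <- 0
    N <- bind_rows(N, empty)  # may also add M's extra columns to N
  }

  # Recompute AFTER bind_rows(), so columns it already added are skipped.
  miss_drv <- setdiff(colnames(M), colnames(N))
  for (drv in miss_drv) N[[drv]] <- 0

  N[, colnames(M)]
}

# Toy data: N lacks patient P3 and driver column C.
M <- tibble(patientID = c("P1", "P2", "P3"),
            A = c(1, 0, 1), B = c(0, 1, 1), C = c(1, 1, 0))
N <- tibble(patientID = c("P1", "P2"), A = c(1, 0), B = c(0, 1))
out1 <- complement_patched(M, N)

# Toy data: all patients present, only column C is missing.
N_full <- tibble(patientID = c("P1", "P2", "P3"),
                 A = c(1, 0, 1), B = c(0, 1, 1))
out2 <- complement_patched(M, N_full)
```

In the first case the rows that were already in N get NA in the new column; get_features() replaces NA with 0 right after calling complement(), so the sketch leaves that step to the caller.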

qindan2008 commented 3 years ago

I hit the same error. I suspect the bug is inside the complement() function called from get_features(), and that it is triggered when both length(miss_Pat) > 0 and length(miss_drv) > 0.

For my data, the original column names of M are:

patientID ADAM23 AGAP3 AIRE DST FH IQGAP1 KIAA1109 KLHDC7B MYH11 POGZ SYNE1 ZNF292 ZSWIM6

and the original column names of N are:

patientID ADAM23 AGAP3 AIRE DST FH IQGAP1 KIAA1109 KLHDC7B MYH11 POGZ SYNE1

After running:

if (length(miss_Pat) > 0) {
  empty = M %>% filter(patientID %in% !!miss_Pat)
  empty[, 2:ncol(empty)] = 0
  N = bind_rows(N, empty)
}

N's column names are the same as M's:

patientID ADAM23 AGAP3 AIRE DST FH IQGAP1 KIAA1109 KLHDC7B MYH11 POGZ SYNE1 ZNF292 ZSWIM6

However, after running:

N = bind_cols(N, M %>% select(!!miss_drv) %>% replace(TRUE, 0))

N's column names become:

patientID ADAM23 AGAP3 AIRE DST FH IQGAP1 KIAA1109 KLHDC7B MYH11 POGZ SYNE1 ZNF292...13 ZSWIM6...14 ZNF292...15 ZSWIM6...16

So when the following code runs:

N[, colnames(M)]

the error occurs:

Error: Can't subset columns that don't exist.
✖ Columns ZNF292 and ZSWIM6 don't exist.
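
The sequence described above can be reproduced on toy data. This is a minimal sketch assuming dplyr 1.0 or later, where bind_cols() repairs duplicated names. The tibbles M and N are small hypothetical stand-ins for the driver matrices, and the zero-filling uses mutate(across(...)) instead of the original replace(TRUE, 0) call.

```r
library(dplyr)

# Toy stand-ins for the full driver matrix (M) and a partial one (N).
M <- tibble(patientID = c("P1", "P2", "P3"),
            A = c(1, 0, 1), B = c(0, 1, 1), C = c(1, 1, 0))
N <- tibble(patientID = c("P1", "P2"), A = c(1, 0), B = c(0, 1))

miss_Pat <- setdiff(M$patientID, N$patientID)  # patients absent from N
miss_drv <- setdiff(colnames(M), colnames(N))  # driver columns absent from N

# Step 1: append zeroed rows for the missing patients.
empty <- M %>% filter(patientID %in% miss_Pat)
empty[, 2:ncol(empty)] <- 0
N <- bind_rows(N, empty)
print(colnames(N))  # bind_rows() has already brought in column C

# Step 2: bind the "missing" columns again; they are now duplicates,
# so bind_cols() renames both copies.
extra <- M %>%
  select(all_of(miss_drv)) %>%
  mutate(across(everything(), ~ 0))
N <- bind_cols(N, extra)
print(colnames(N))

# Step 3: looking the columns up by M's names now fails.
res <- tryCatch(N[, colnames(M)], error = function(e) e)
print(inherits(res, "error"))
```

The failure needs both conditions at once: without a missing patient, bind_rows() never adds the extra columns, and without a missing driver, bind_cols() is never called.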

caravagn commented 3 years ago

@qingjian1991 I am happy to comment out a piece of code supposed to do something - which I designed - to solve a bug that I can reproduce myself. Please share with me a small dataset that generates the error and I will find the best way of fixing the bug.

TnakaNY commented 3 years ago

Hi, I hit the same error using the example data (CRC, TRACER). Here is my sessionInfo(). This might help you debug.

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] evoverse.datasets_0.1.0 revolver_0.3.0 mtree_0.01.1 ctree_0.1.2 cowplot_1.1.1
[6] ggpubr_0.4.0 reshape2_1.4.4 matrixcalc_1.0-3 entropy_1.2.1 clisymbols_1.2.0
[11] RColorBrewer_1.1-2 ggrepel_0.9.1 igraph_1.2.6 ggraph_2.0.4 tidygraph_1.2.0
[16] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.3 purrr_0.3.4 readr_1.4.0
[21] tidyr_1.1.2 tibble_3.0.6 ggplot2_3.3.3 tidyverse_1.3.0 easypar_0.2.0
[26] crayon_1.4.0 pio_0.1.0

loaded via a namespace (and not attached):
[1] colorspace_2.0-0 ggsignif_0.6.0 ellipsis_0.3.1 rio_0.5.16 dynamicTreeCut_1.63-1
[6] rprojroot_2.0.2 ggdendro_0.1.22 fs_1.5.0 rstudioapi_0.13 farver_2.0.3
[11] remotes_2.2.0.9000 graphlayouts_0.7.1 fansi_0.4.2 lubridate_1.7.9.2 xml2_1.3.2
[16] splines_4.0.2 codetools_0.2-18 doParallel_1.0.16 knitr_1.31 polyclip_1.10-0
[21] pkgload_1.1.0 jsonlite_1.7.2 broom_0.7.4 cluster_2.1.0 dbplyr_2.0.0
[26] ggforce_0.3.2 compiler_4.0.2 httr_1.4.2 backports_1.2.1 Matrix_1.2-18
[31] assertthat_0.2.1 cli_2.3.0 tweenr_1.0.1 htmltools_0.5.1.1 prettyunits_1.1.1
[36] tools_4.0.2 gtable_0.3.0 glue_1.4.2 tinytex_0.29 Rcpp_1.0.6
[41] carData_3.0-4 cellranger_1.1.0 vctrs_0.3.6 nlme_3.1-151 iterators_1.0.13
[46] xfun_0.19 ps_1.5.0 openxlsx_4.2.3 testthat_3.0.1 rvest_0.3.6
[51] lifecycle_0.2.0 devtools_2.3.2 rstatix_0.6.0 dendextend_1.14.0 MASS_7.3-53
[56] scales_1.1.1 hms_1.0.0 parallel_4.0.2 yaml_2.2.1 curl_4.3
[61] memoise_1.1.0 gridExtra_2.3 stringi_1.5.3 desc_1.2.0 foreach_1.5.1
[66] permute_0.9-5 pkgbuild_1.2.0 zip_2.1.1 rlang_0.4.10 pkgconfig_2.0.3
[71] lattice_0.20-41 evaluate_0.14 labeling_0.4.2 processx_3.4.5 tidyselect_1.1.0
[76] plyr_1.8.6 magrittr_2.0.1 R6_2.5.0 generics_0.1.0 DBI_1.1.1
[81] mgcv_1.8-33 pillar_1.4.7 haven_2.3.1 foreign_0.8-81 withr_2.4.1
[86] abind_1.4-5 modelr_0.1.8 car_3.0-10 utf8_1.1.4 rmarkdown_2.6
[91] viridis_0.5.1 usethis_2.0.0 isoband_0.2.3 grid_4.0.2 readxl_1.3.1
[96] data.table_1.13.6 vegan_2.5-7 callr_3.5.1 reprex_1.0.0 digest_0.6.27
[101] munsell_0.5.0 viridisLite_0.3.0 sessioninfo_1.1.1

Best, AT

TnakaNY commented 3 years ago

Hi, downgrading "vctrs" from version 0.3.6 to 0.3.0 did not solve my plot_clusters() error.

TnakaNY commented 3 years ago

I tried R 3.6.3 with vctrs 0.3.0/0.3.5/0.3.6, but all showed the same error. Could you post the session info of a setup in which the latest revolver works well? With that, I can build the same R environment.

caravagn commented 3 years ago

I think I fixed this issue while replying to issue [#34]; I can install and run this code from beginning to end.

Can you run it? My sessionInfo() is the same as the one at the bottom of [#34].

CROSS_CRC_ADENOCARCINOMA_REVOLVER = revolver_cohort(
  evoverse.datasets::CROSS_CRC_ADENOCARCINOMA_NATECOEVO_2018, 
  MIN.CLUSTER.SIZE = 0, 
  annotation = "Colorectal adenocarcinomas (Cross et al, PMID 30177804)")

revolver_check_cohort(CROSS_CRC_ADENOCARCINOMA_REVOLVER)

non_recurrent = Stats_drivers(CROSS_CRC_ADENOCARCINOMA_REVOLVER) %>% 
  filter(N_tot == 1) %>% 
  pull(variantID)

CROSS_CRC_ADENOCARCINOMA_REVOLVER = remove_drivers(CROSS_CRC_ADENOCARCINOMA_REVOLVER, non_recurrent)
CROSS_CRC_ADENOCARCINOMA_REVOLVER = compute_mutation_trees(CROSS_CRC_ADENOCARCINOMA_REVOLVER)

CROSS_CRC_ADENOCARCINOMA_REVOLVER = revolver_fit(
  CROSS_CRC_ADENOCARCINOMA_REVOLVER, 
  parallel = FALSE, 
  n = 3, 
  initial.solution = NA)

CROSS_CRC_ADENOCARCINOMA_REVOLVER = revolver_cluster(
  CROSS_CRC_ADENOCARCINOMA_REVOLVER, 
  split.method = 'cutreeHybrid',
  min.group.size = 3)

plot_clusters(CROSS_CRC_ADENOCARCINOMA_REVOLVER, cutoff_trajectories = 1, cutoff_drivers = 0)

caravagn commented 3 years ago

I am closing this as well as it seems fixed to me.