caravagnalab / revolver

REVOLVER - Repeated Evolution in Cancer
https://caravagnalab.github.io/revolver/

plot tree #30

Closed zhouyao315 closed 4 years ago

zhouyao315 commented 4 years ago

chort <- revolver_cohort(input_revolver, MIN.CLUSTER.SIZE = 1)  # 99 patients, 49261 mutations, 456 driver genes

# Build the trees
TRACERx = compute_clone_trees(chort, patients = "CRUK0001", overwrite = TRUE)
tree <- Phylo(TRACERx, 'CRUK0001', rank = 1)
plot_patient_trees(TRACERx, 'CRUK0001')

test.txt

caravagn commented 4 years ago

Hi, could you please copy and paste the full source code of a reproducible run (literally something I can copy, paste and run) so that I can see and fix the bug?

You can use the guidelines here if you do not know how to insert code. As you can imagine, the code above is quite hard for me to understand.

zhouyao315 commented 4 years ago

chort <- revolver_cohort(input_revolver, MIN.CLUSTER.SIZE = 1)  # input data
TRACERx = compute_clone_trees(chort, patients = "CRUK0001", overwrite = TRUE)
tree <- Phylo(TRACERx, 'CRUK0001', rank = 1)
plot_patient_trees(TRACERx, 'CRUK0001')  # plotting the trees generates errors

test.txt

zhouyao315 commented 4 years ago
plot_patient_trees = function(x, patient, ...)
{
  if(!has_patient_trees(x, patient))
    stop(patient, " does not have the patient trees, aborting.")

  cex = 1

  # =-=-=-=-=-=-=-=-
  # Top panel: top-ranking tree and its information transfer
  # =-=-=-=-=-=-=-=-
  top_tree = Phylo(x, patient, rank = 1)

  first = ggarrange(
    top_tree %>% plot,
    top_tree %>% plot_information_transfer,
    ncol = 2,
    nrow = 1
  )

  # =-=-=-=-=-=-=-=-
  # Second panel: top-10 ranking trees in icon format
  # =-=-=-=-=-=-=-=-

  # Other 10 trees in 1x10 icon format
  ntrees =  min(
    Stats_trees(x, patient)$numTrees,
    10)

  # Strip plot
  strip = lapply(1:ntrees, Phylo, x = x, p = patient)
  strip = lapply(strip, plot_icon)

  scores = plot_patient_trees_scores(x, patient)

  strip = ggarrange(
    plotlist = append(strip, list(scores)),
    nrow = 1, ncol = 11)

  second = annotate_figure(
    strip,
    fig.lab = paste0("Top-",  ntrees, ' trees'))

  # Assembly
  ggarrange(
    first,
    second,
    heights = c(1, .3),
    ncol = 1,
    nrow = 2
  )
}

This block of the function:

first = ggarrange(
    top_tree %>% plot,
    top_tree %>% plot_information_transfer,
    ncol = 2,
    nrow = 1
  )

generates the following error:

Error in vertex.attributes<-(graph, index = index, value = value) :
  Invalid attribute value length, must match number of vertices
In addition: Warning message:
Duplicated aesthetics after name standardisation: na.rm
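To narrow down which of the two plotting calls raises the error, each step can be run in isolation and its error message captured. The following is only a minimal sketch in base R: `run_steps` is a hypothetical helper, not part of revolver, and the two steps are stand-ins that simulate one success and one failure.

```r
# Hypothetical helper: run each named step and capture its error message
# instead of stopping at the first failure.
run_steps <- function(steps) {
  vapply(steps, function(step) {
    tryCatch({
      step()
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Stand-in steps (assumptions for illustration). With revolver loaded,
# they would wrap the real calls, e.g.
#   function() plot(top_tree)
#   function() plot_information_transfer(top_tree)
steps <- list(
  plot_tree = function() invisible(NULL),
  plot_information_transfer = function() stop("Invalid attribute value length")
)

status <- run_steps(steps)
print(status)
```

Each entry of `status` is either `"ok"` or the message of the error raised by that step, which makes it easier to report the exact failing call.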

zhouyao315 commented 4 years ago

I found a small problem: if a patient has two different mutations in the same gene, the tool reports "Duplicated driver variantIDs in the cohort that should be removed". I think duplicates should be judged by mutation, not by gene.

caravagn commented 4 years ago

chort <- revolver_cohort(input_revolver, MIN.CLUSTER.SIZE = 1)  # input data
TRACERx = compute_clone_trees(chort, patients = "CRUK0001", overwrite = TRUE)
tree <- Phylo(TRACERx, 'CRUK0001', rank = 1)
plot_patient_trees(TRACERx, 'CRUK0001')  # plotting the trees generates errors

test.txt

There must be something weird with your setup.

Look, I can run this code with no issues:

input_revolver = evoverse.datasets::TRACERx_NEJM_2017

library(revolver)

chort <- revolver_cohort(input_revolver, MIN.CLUSTER.SIZE = 1)  # input data
TRACERx = compute_clone_trees(chort, patients = "CRUK0001", overwrite = TRUE)

tree <- Phylo(TRACERx, 'CRUK0001', rank = 1)
plot_patient_trees(TRACERx, 'CRUK0001')  # the plotting call reported to generate errors

See the attached screenshot

image

My session info is this (check against your package versions)

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] revolver_0.3.0     mtree_0.01.1       ctree_0.01.1       cowplot_1.0.0      ggpubr_0.2.4      
 [6] magrittr_1.5       reshape2_1.4.3     matrixcalc_1.0-3   entropy_1.2.1      clisymbols_1.2.0  
[11] RColorBrewer_1.1-2 ggrepel_0.8.1      igraph_1.2.4.2     ggraph_2.0.0       tidygraph_1.1.2   
[16] forcats_0.4.0      stringr_1.4.0      dplyr_0.8.3        purrr_0.3.3        readr_1.3.1       
[21] tidyr_1.0.0        tibble_2.1.3       ggplot2_3.2.1      tidyverse_1.3.0    easypar_0.2.0     
[26] crayon_1.3.4       pio_0.1.0         

loaded via a namespace (and not attached):
 [1] nlme_3.1-139            fs_1.3.1                evoverse.datasets_0.1.0 lubridate_1.7.4        
 [5] doParallel_1.0.15       httr_1.4.1              dynamicTreeCut_1.63-1   tools_3.6.0            
 [9] backports_1.1.5         utf8_1.1.4              R6_2.4.1                DBI_1.1.0              
[13] lazyeval_0.2.2          colorspace_1.4-1        withr_2.1.2             tidyselect_0.2.5       
[17] gridExtra_2.3           compiler_3.6.0          cli_2.0.1               rvest_0.3.5            
[21] xml2_1.2.2              ggdendro_0.1-20         labeling_0.3            scales_1.1.0           
[25] digest_0.6.23           pkgconfig_2.0.3         sessioninfo_1.1.1       dbplyr_1.4.2           
[29] rlang_0.4.2             readxl_1.3.1            rstudioapi_0.10         farver_2.0.1           
[33] generics_0.0.2          jsonlite_1.6            dendextend_1.13.2       Rcpp_1.0.3             
[37] munsell_0.5.0           fansi_0.4.1             viridis_0.5.1           lifecycle_0.1.0        
[41] stringi_1.4.3           MASS_7.3-51.4           plyr_1.8.5              grid_3.6.0             
[45] parallel_3.6.0          lattice_0.20-38         graphlayouts_0.5.0      haven_2.2.0            
[49] hms_0.5.2               zeallot_0.1.0           pillar_1.4.3            ggsignif_0.6.0         
[53] codetools_0.2-16        reprex_0.3.0            glue_1.3.1              packrat_0.5.0          
[57] modelr_0.1.5            vctrs_0.2.1             tweenr_1.0.1            foreach_1.4.7          
[61] cellranger_1.1.0        gtable_0.3.0            polyclip_1.10-0         assertthat_0.2.1       
[65] ggforce_0.3.1           broom_0.5.3             viridisLite_0.3.0       iterators_1.0.12       
[69] cluster_2.0.8          
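
To check local package versions against the session info above, something along these lines could be used. It is a sketch in base R only: the `reference` vector copies a few versions from the listing above, and `installed_version` is a hypothetical helper that returns `NA` for packages that are not installed.

```r
# Versions copied from the sessionInfo() listing above (subset relevant to plotting)
reference <- c(
  revolver  = "0.3.0",
  ctree     = "0.01.1",
  mtree     = "0.01.1",
  igraph    = "1.2.4.2",
  ggraph    = "2.0.0",
  tidygraph = "1.1.2",
  ggplot2   = "3.2.1",
  ggpubr    = "0.2.4"
)

# Hypothetical helper: installed version as a string, NA if the package is missing
installed_version <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

local_versions <- vapply(names(reference), installed_version, character(1))

comparison <- data.frame(
  package   = names(reference),
  reference = unname(reference),
  local     = unname(local_versions),
  same      = !is.na(local_versions) & local_versions == reference,
  stringsAsFactors = FALSE
)
print(comparison)
```

Rows where `same` is `FALSE` are the first candidates to look at when the same code behaves differently on two machines.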

caravagn commented 4 years ago

I found a small problem: if a patient has two different mutations in the same gene, the tool reports "Duplicated driver variantIDs in the cohort that should be removed". I think duplicates should be judged by mutation, not by gene.

This is not an issue and is discussed in the FAQs. It relates to parallel evolution: you just need to assign different IDs to the mutations of a gene so that you can distinguish them, e.g., "KRAS G12G" vs "KRAS G12D".

 - Can I model parallel evolution?
If you detect multiple driver mutations in different positions/nucleotides of a gene GENE, then these might appear in (possibly) different positions of the phylogenetic tree for the patient. It is possible to handle these events in REVOLVER, but this requires some considerations.

REVOLVER computes correlated evolutionary trajectories among recurrent drivers, which are indexed using an ID-based identification system. The ID has to be unique, and consistent across all patients where we think the same driver occurs; in many cases this could be just the gene name GENE.

In the above case, however, labelling all the driver mutations in the gene as GENE makes the ID non-unique. This would not work with the current model implemented by REVOLVER, and the tool would raise an error during the computation. You can circumvent this by using a finer-grained resolution in the preparation of these data: for instance, you can add to the ID the nucleotide position of the mutation, using something like GENE_XXX and GENE_YYY, where XXX and YYY are positions or protein domains.

Doing so allows multiple driver events on the same gene. An obvious drawback, however, is that those drivers might be far less “recurrent” across the cohort, requiring larger datasets to find the repeated trajectories that involve those events.
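
The renaming described above can be sketched in base R as follows. This is an illustration on toy data, not revolver code: the column names (`patientID`, `variantID`, `is.driver`) are assumed to follow the input table, and the `position` column holding the mutation label is hypothetical.

```r
# Toy input (all values invented for illustration)
muts <- data.frame(
  patientID = c("P1", "P1", "P2", "P2"),
  variantID = c("KRAS", "KRAS", "KRAS", "TP53"),
  position  = c("G12D", "G13D", "G12D", "R175H"),  # hypothetical column
  is.driver = c(TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Genes whose driver ID occurs more than once within at least one patient
drv <- muts[muts$is.driver, ]
key <- paste(drv$patientID, drv$variantID)
ambiguous_genes <- unique(drv$variantID[duplicated(key)])

# Rename every driver of those genes, in all patients, to GENE_POSITION
# so the ID stays consistent across the cohort
rename <- muts$is.driver & muts$variantID %in% ambiguous_genes
muts$variantID[rename] <- paste(muts$variantID[rename], muts$position[rename], sep = "_")

print(muts)
```

Renaming the gene in every patient, rather than only where the clash occurs, keeps the same mutation under the same ID across the cohort; genes without clashes (TP53 here) keep the plain gene name.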

caravagn commented 4 years ago

By the way, there are functions to remove any mutation that you want from the data that you load.

This one removes drivers, for instance; you can use it at your own judgment.

In general, the docs are quite exhaustive and cover these issues: have a look at the tool webpage.

caravagn commented 4 years ago

Unless you have more to add here, I will close this issue. If you need more support, please open a new issue (one for every separate topic). Thanks.

zhouyao315 commented 4 years ago

This is my data, not evoverse.datasets::TRACERx_NEJM_2017.

caravagn commented 4 years ago

There is no way for me to help you unless you show me your data in some way (plot, print, etc.).

zhouyao315 commented 4 years ago

test.txt

caravagn commented 4 years ago

Hi @zhouyao315. I am sorry, but I do not really know what to say... I have run this code on your data and it works for me. I am attaching a screenshot of my RStudio session to prove it. Did you check whether there is any difference between your sessionInfo() and mine? You probably want to check the versions of the graph libraries that are used by revolver (mainly ggraph, igraph and Rgraphviz).

# Load your file
input = read.table("~/Downloads/test (1).txt", stringsAsFactors = FALSE, sep = '\t', header = TRUE)
str(input)

# Cluster labels need to be strings, so I convert them (the tool would raise an error otherwise)
input$cluster = paste(input$cluster)

require(revolver)

# Load the cohort with your option
x = revolver_cohort(input,  MIN.CLUSTER.SIZE = 1)

# Compute the tree
TRACERx = compute_clone_trees(x, patients = "CRUK0001", overwrite = TRUE)

# Extract and plot the tree -- it works
tree <- Phylo(TRACERx, 'CRUK0001', rank = 1)

plot(tree)

# This also works
plot_patient_trees(TRACERx, 'CRUK0001') 

image