YuLab-SMU / enrichplot

Visualization of Functional Enrichment Result
https://yulab-smu.top/biomedical-knowledge-mining-book/

Is it possible to change shape of cnetplot plot points? #141

Open hlnicholls opened 3 years ago

hlnicholls commented 3 years ago

I have a cnetplot and I am wondering if it is possible for me to do further categorising of genes in the plot by point shape?

My cnetplot shows genes and their interacting pathways. The genes also appear in a separate dataset that lists a drug category for each one, with 4-5 categories in total. I would like to change the point shape from circles to shapes that correspond to each gene's drug category, while keeping the points colour-coded by fold change in cnetplot().

Currently I am trying:

kegg_organism = "hsa"
kegg_enrich <-  enrichKEGG(gene   = df$geneID,  #entrez IDs of the genes
                   organism     = 'hsa',
                   pvalueCutoff = 0.05,
                   pAdjustMethod = 'fdr')

kegg <- setReadable(kegg_enrich, 'org.Hs.eg.db', 'ENTREZID')
kegg_genes <- kegg[,]

gene_of_interest <- dplyr::filter(kegg_genes, grepl('BRCA1', geneID))
gene_of_interest <- enrichDF2enrichResult(gene_of_interest)

plot <- cnetplot(gene_of_interest, foldChange = gene_list_measure)

plot <- plot + scale_color_gradient2(name='Score', low='steelblue', high='firebrick')

drugs <- fread('genes_dgidb_export.tsv')
drugs <- dplyr::select(drugs, Gene, Druggability)
gene_drugs <- drugs$Druggability
names(gene_drugs) <- drugs$Gene

plot + geom_point(aes(shape=gene_drugs))

Error: Aesthetics must be either length 1 or the same as the data (56): shape and colour

My drugs gene list is shorter than the gene list in gene_of_interest, but it contains only genes that are in the gene_of_interest object, along with their drug categories. Is there a way to add these to the cnetplot as point shapes?

I'm not an experienced coder, but would this be possible if I added my drugs data as a column in my gene_of_interest enrichResult data and filled in NA for any genes without a drug category, so that the drug data has the length of 56 that the error message above mentions?
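The padding idea described above can be sketched in base R by looking up the drug categories by name rather than by position. This is only an illustration: `node_names` is a hypothetical stand-in for the node labels in plot order (pathways and genes together), not something taken from the actual cnetplot object, and the categories come from the example `drugs` table in this post.

```r
# Named vector of drug categories, built from the example `drugs` table.
gene_drugs <- c(
  TLN2    = "KINASE",
  PDGFC   = "DRUGGABLE GENOME",
  PIK3R3  = "CLINICALLY ACTIONABLE",
  PIP5K1B = "KINASE",
  VEGFA   = "CLINICALLY ACTIONABLE"
)

# Hypothetical node labels in plot order (assumption: one pathway node
# followed by gene nodes, some of which have no drug category).
node_names <- c("Pathway A", "VEGFA", "RSF1", "TLN2", "PDGFC")

# Indexing by name returns NA for every label that is not in gene_drugs,
# so the result always has one entry per node, in node order.
node_shape <- unname(gene_drugs[node_names])

print(data.frame(name = node_names, drug = node_shape))
```

Because the lookup is by name, the result has the same length as the node list whatever the size of the drugs table, and pathway nodes or genes without a category end up as NA instead of borrowing another gene's value.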

For reference, an example of my input drug data that I am trying to overlay as shape categories for the genes looks like:

drugs <- structure(list(Gene = c("TLN2", "PDGFC", "PIK3R3", "PIP5K1B", 
"VEGFA"), Druggability = c("KINASE", "DRUGGABLE GENOME", "CLINICALLY ACTIONABLE", 
"KINASE", "CLINICALLY ACTIONABLE")), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))

#drugs data is 2 columns like:
Gene        Druggability
TLN2        KINASE
PDGFC       DRUGGABLE GENOME 
...

gene_of_interest is a formal class enrichResult object, so I'm not sure how to share a sample of it. gene_list_measure is just another dataset with 2 columns: gene symbols and their fold change scores. An example of this would be:

gene_list <- structure(list(Gene = c("RSF1", "SNTG1", "FOLH1", "CHST6", "SMARCC1"
), Score = c(0.810057997703552, 0.809059321880341, 0.771913826465607, 
0.778315424919128, 0.806403398513794)), row.names = c(NA, -5L
))

# Build a named numeric vector (names = gene symbols), sorted in decreasing order
gene_list_scores <- gene_list$Score
names(gene_list_scores) <- gene_list$Gene
gene_list_scores <- na.omit(gene_list_scores)
gene_list_scores <- sort(gene_list_scores, decreasing = TRUE)

Currently the output of cnetplot looks like this: [image: testplot]

My SessionInfo:

sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.12.0       AnnotationDbi_1.52.0      IRanges_2.24.1            S4Vectors_0.28.1         
 [5] Biobase_2.50.0            BiocGenerics_0.36.1       RColorBrewer_1.1-2        multienrichjam_0.0.48.900
 [9] DOSE_3.16.0               forcats_0.5.1             stringr_1.4.0             dplyr_1.0.7              
[13] purrr_0.3.4               readr_2.0.0               tidyr_1.1.3               tibble_3.1.2             
[17] ggplot2_3.3.5             tidyverse_1.3.1           ComplexHeatmap_2.6.2      enrichplot_1.10.2        
[21] clusterProfiler_3.18.1    UpSetR_1.4.0              pathview_1.30.1           data.table_1.14.0        

loaded via a namespace (and not attached):
  [1] ggnewscale_0.4.5    fgsea_1.16.0        colorspace_2.0-2    rjson_0.2.20        ellipsis_0.3.2      circlize_0.4.13    
  [7] qvalue_2.22.0       snakecase_0.11.0    XVector_0.30.0      fs_1.5.0            GlobalOptions_0.1.2 clue_0.3-59        
 [13] rstudioapi_0.13     farver_2.1.0        graphlayouts_0.7.1  ggrepel_0.9.1       bit64_4.0.5         fansi_0.5.0        
 [19] scatterpie_0.1.6    lubridate_1.7.10    xml2_1.3.2          splines_4.0.5       cachem_1.0.5        GOSemSim_2.16.1    
 [25] polyclip_1.10-0     jsonlite_1.7.2      Cairo_1.5-12.2      broom_0.7.8         dbplyr_2.1.1        cluster_2.1.2      
 [31] GO.db_3.12.1        png_0.1-7           graph_1.68.0        ggforce_0.3.3       BiocManager_1.30.16 compiler_4.0.5     
 [37] httr_1.4.2          rvcheck_0.1.8       backports_1.2.1     assertthat_0.2.1    Matrix_1.3-4        fastmap_1.1.0      
 [43] cli_3.0.1           tweenr_1.0.2        tools_4.0.5         igraph_1.2.6        gtable_0.3.0        glue_1.4.2         
 [49] reshape2_1.4.4      DO.db_2.9           tinytex_0.32        fastmatch_1.1-0     Rcpp_1.0.7          jamba_0.0.64.900   
 [55] cellranger_1.1.0    vctrs_0.3.8         Biostrings_2.58.0   ggraph_2.0.5        xfun_0.24           rvest_1.0.0        
 [61] lifecycle_1.0.0     XML_3.99-0.6        zlibbioc_1.36.0     MASS_7.3-54         scales_1.1.1        tidygraph_1.2.0    
 [67] hms_1.1.0           KEGGgraph_1.50.0    memoise_2.0.0       gridExtra_2.3       downloader_0.4      stringi_1.7.3      
 [73] RSQLite_2.2.7       BiocParallel_1.24.1 shape_1.4.6         rlang_0.4.11        pkgconfig_2.0.3     bitops_1.0-7       
 [79] matrixStats_0.59.0  lattice_0.20-44     labeling_0.4.2      cowplot_1.1.1       shadowtext_0.0.8    bit_4.0.4          
 [85] tidyselect_1.1.1    plyr_1.8.6          magrittr_2.0.1      R6_2.5.0            generics_0.1.0      DBI_1.1.1          
 [91] withr_2.4.2         haven_2.4.1         pillar_1.6.1        KEGGREST_1.30.1     RCurl_1.98-1.3      modelr_0.1.8       
 [97] janitor_2.1.0       crayon_1.4.1        utf8_1.2.1          tzdb_0.1.2          viridis_0.6.1       GetoptLong_1.0.5   
[103] readxl_1.3.1        blob_1.2.1          Rgraphviz_2.34.0    reprex_2.0.0        digest_0.6.27       munsell_0.5.0      
[109] viridisLite_0.4.0  
> 

Sorry, this is a lot of information. Any help on whether it's possible to change the point shapes would be appreciated.

hlnicholls commented 3 years ago

I managed to address my need for shapes in a way, by using an additional package to overlay the shapes, but it still isn't quite correct. It looks like this:

drugs <- fread('genes_dgidb_export.tsv')
drugs <- dplyr::select(drugs, Gene, Druggability)
drugs <- drugs[1:56,] #making data same size as mentioned in previous error message
Druggability <- drugs$Druggability
names(Druggability) <- drugs$Gene

options(ggrepel.max.overlaps = Inf)
pother <- cnetplot(gene_of_interest,
                   categorySize = 'pvalue',
                   foldChange = gene_list_scores)

pother <- pother + scale_color_gradient2(name='Score', low='steelblue', high='red') +
  scale_size_continuous(range = c(2, 8)) 

#Overlaying shapes by drug:
library(ggraph)

pother + geom_node_point(aes(shape=Druggability)) +
  scale_shape_manual(values=c(2, 5, 3, 4))  

This gives: [image: testplot2]

This isn't exactly what I'm looking for. Ideally I'd like to change the point shapes completely while maintaining the color scale, rather than overlaying a new point shape on top. The bigger problem I can't solve is that some of the pathway nodes are also being assigned a druggability shape, even though only genes have an assigned druggability. In addition, the shapes are not in the correct places: some genes are being assigned a shape when I know from my Druggability data that they shouldn't be.
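As a rough illustration of what "shape by category while keeping the colour scale" could look like in plain ggplot2, here is a toy sketch. It does not use cnetplot at all: `nodes` is a made-up data frame standing in for a network layout, its coordinates and `drug` values are hypothetical, and the `drug` column is assumed to be already matched to nodes by name. The sketch relies on ggplot2 point shapes 21 to 25 accepting a `fill` aesthetic, so the score gradient goes on `fill` while `shape` carries the category.

```r
library(ggplot2)

# Toy node table (hypothetical layout, not real cnetplot output).
nodes <- data.frame(
  name  = c("Pathway A", "TLN2", "PDGFC", "VEGFA", "RSF1"),
  x     = c(0, 1, -1, 0.5, -0.5),
  y     = c(0, 1, 1, -1, -1),
  score = c(NA, 0.81, 0.77, 0.80, 0.78),
  drug  = c(NA, "KINASE", "DRUGGABLE GENOME", "CLINICALLY ACTIONABLE", NA)
)

# Give nodes without a category an explicit label so they are still drawn
# (as plain circles) instead of being dropped for a missing shape.
nodes$group <- ifelse(is.na(nodes$drug), "NONE / PATHWAY", nodes$drug)

p <- ggplot(nodes, aes(x, y)) +
  geom_point(aes(shape = group, fill = score), size = 5, colour = "grey30") +
  geom_text(aes(label = name), vjust = -1.5, size = 3) +
  # Shapes 21-25 are the fillable ones; names tie each shape to a category.
  scale_shape_manual(values = c(
    "KINASE"                = 22,
    "DRUGGABLE GENOME"      = 23,
    "CLINICALLY ACTIONABLE" = 24,
    "NONE / PATHWAY"        = 21
  )) +
  scale_fill_gradient2(name = "Score", low = "steelblue", high = "firebrick",
                       midpoint = 0.79, na.value = "grey80")

print(p)
```

Whether the same approach can be applied directly to the object returned by cnetplot() depends on how that version builds its node layer, so this should be read as a pattern to adapt rather than a drop-in fix.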