Repository of the TRanslational ONCOlogy library, which includes various algorithms (such as CAPRESE and CAPRI) and the Pipeline for Cancer Inference (PICNIC).
GNU General Public License v3.0

Is it possible to get such a plot with TRONCO? #123

Closed. beginner984 closed this issue 4 years ago.

beginner984 commented 4 years ago


I came across a very informative plot on Google:

[Screenshot 2020-04-14 at 01.41.03]

Given a group of samples and a list of genes, for which we have mutation and GISTIC data, enriched in two pathways such as cell cycle and p53, as shown in this figure:

Map of functional alterations for a group of patients. Genes (rows) encoding components of the p53–DNA repair pathway are affected by selected functional events (the percentage of samples altered and the types of alteration are represented by colored squares) across the group of samples (columns). Alterations of the pathway are shown as stacked green bar plots at the bottom.

Producing such a plot on my own seems almost impossible.

I have MAF-format and GISTIC data, and even a Boolean matrix of mutated genes per sample, but I really don't know how to get such a plot.

Could you help me, even though I know this is a big request?

caravagn commented 4 years ago

Hi, making that plot is easy with ggplot if you have the right data. We can try to help you out, but 1) you need to give us exactly the data you want to use, and 2) you have to wait, because this is not a priority for us or for TRONCO.

beginner984 commented 4 years ago

Thanks a million for replying. I find this kind of plot very informative. With TRONCO I can visualise GISTIC data beautifully, but adding mutations and pathways to that GISTIC plot, as in the figure from my original question, seems very challenging. It would be very informative: we could see the type and percentage of alterations (copy number + mutation) and the contribution of these alterations to the defined pathways.

I have 15 samples and 19 genes

This is the Boolean matrix of mutations for 15 samples and 19 genes:

This is my GISTIC data for the same 15 samples and 19 genes:

Among my genes, PIK3CA, EGFR, ERBB2 and PTEN are in the RTK pathway.

CDKN2A and CCNE1 are in the cell cycle pathway.
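The stacked pathway bars in the figure summarise, per sample, whether a pathway is hit. A minimal sketch of that summary in R, assuming (a common convention, not something the figure states) that a sample counts as pathway-altered when at least one member gene is altered. The toy matrix below is hypothetical; only the gene-to-pathway lists come from this comment.

```r
# Hypothetical Boolean mutation matrix: samples as rows, genes as columns
muts <- matrix(c(1, 0, 0, 1,
                 0, 0, 1, 0,
                 0, 1, 0, 0), nrow = 4,
               dimnames = list(paste0("S", 1:4), c("PIK3CA", "PTEN", "CCNE1")))

# Pathway membership as listed in the comment
pathways <- list(
  RTK          = c("PIK3CA", "EGFR", "ERBB2", "PTEN"),
  `Cell Cycle` = c("CDKN2A", "CCNE1")
)

# Sample x pathway logical matrix: TRUE if any member gene that is
# present in muts is altered in that sample (assumed definition)
pw_altered <- sapply(pathways, function(genes) {
  genes <- intersect(genes, colnames(muts))
  rowSums(muts[, genes, drop = FALSE]) > 0
})

# Percentage of samples with each pathway altered
pct_altered <- colMeans(pw_altered) * 100
pct_altered
```

The same logical matrix, reshaped to long format, could feed a stacked bar chart built with ggplot2::geom_col().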

Thank you very much in advance for any help with obtaining such a visualisation.

caravagn commented 4 years ago

Hi, with the time I have available I can give you this; the rest you will have to figure out yourself. This is a count plot split by pathway, for mutation data only (no GISTIC). You can easily modify the script to include your GISTIC data.

library(dplyr)
library(ggplot2)

# Inputs shared in this thread (gist is loaded but not used in this plot)
muts = readRDS('~/Downloads/Boolean_values for mutations.rds')
gist = readRDS('~/Downloads/GISTIC.rds')

# Pathway membership: one row per gene
pw_rt = data.frame(
  gene = c('PIK3CA', 'EGFR', 'ERBB2', 'PTEN'),
  pw = c('RTK'),
  stringsAsFactors = FALSE
)

pw_cc = data.frame(
  gene = c('CDKN2A', 'CCNE1'),
  pw = c('Cell Cycle'),
  stringsAsFactors = FALSE
)

pw = rbind(pw_rt, pw_cc)

# One colour per pathway
pw_col = RColorBrewer::brewer.pal(n = 3, name = 'Set1')[1:2]
names(pw_col) = unique(pw$pw)

# Long format: one row per (sample, gene) pair
lmuts = reshape2::melt(muts) %>% as_tibble()
colnames(lmuts) = c('sample', 'gene', 'value')

# Lookup gene -> pathway (NA for genes outside both pathways)
pw_n = pw$pw
names(pw_n) = pw$gene

# Number of samples (assumes samples are the rows of muts)
Np = nrow(muts)

lmuts %>%
  mutate(gene = paste(gene), PW = pw_n[gene]) %>%
  group_by(gene, PW) %>%
  summarise(N = sum(value == 1)) %>%
  ungroup() %>%
  mutate(
    PW = ifelse(is.na(PW), "None", PW),
    N = paste0(N, ' (', round(N/Np * 100, 1), '%)')
  ) %>%
  ggplot(aes(x = PW, y = gene, fill = PW)) +
  geom_tile() +
  geom_text(aes(label = N)) +
  theme_light() +
  theme(legend.position = 'bottom') +
  guides(fill = guide_legend('Pathway')) +
  scale_fill_manual(values = c(pw_col, `None` = 'gainsboro')) +
  scale_x_discrete(limits = c(names(pw_col), 'None')) +
  labs(
    x = "",
    y = 'Gene',
    title = "Occurrence of mutations"
  )

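The script above covers mutations only. One possible way to fold GISTIC calls into the same long table is sketched below on hypothetical toy matrices. It assumes that muts and gist are sample-by-gene matrices with identical dimnames, and that gist uses the common thresholded GISTIC coding (-2 deep deletion, -1 shallow loss, 0 neutral, 1 gain, 2 amplification). The alteration categories and colours are illustrative choices, not part of TRONCO.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: 3 samples (rows) x 2 genes (columns)
muts <- matrix(c(1, 0, 1,
                 0, 0, 1), nrow = 3,
               dimnames = list(c("S1", "S2", "S3"), c("PIK3CA", "CCNE1")))
# Assumed GISTIC thresholded coding: -2 deep deletion, -1 shallow loss,
# 0 neutral, 1 gain, 2 amplification (same dimnames as muts)
gist <- matrix(c(0, 2, 0,
                 2, 0, -2), nrow = 3,
               dimnames = dimnames(muts))

# Long format: one row per (sample, gene) pair; as.vector() reads column by column
long <- data.frame(
  sample = rep(rownames(muts), times = ncol(muts)),
  gene   = rep(colnames(muts), each = nrow(muts)),
  mut    = as.vector(muts),
  cna    = as.vector(gist),
  stringsAsFactors = FALSE
)

# Collapse mutation + high-level copy number into one category per cell;
# shallow calls (-1, 1) are treated as "None" here (an illustrative choice)
long <- long %>%
  mutate(alteration = case_when(
    mut == 1 & abs(cna) == 2 ~ "Multiple",
    mut == 1                 ~ "Mutation",
    cna == 2                 ~ "Amplification",
    cna == -2                ~ "Deep deletion",
    TRUE                     ~ "None"
  ))

# Percentage of samples with any alteration, per gene
per_gene <- long %>%
  group_by(gene) %>%
  summarise(pct_altered = 100 * sum(alteration != "None") / nrow(muts))

# Oncoprint-like grid: samples in columns, genes in rows
alt_col <- c(Mutation = "forestgreen", Amplification = "firebrick",
             `Deep deletion` = "steelblue", Multiple = "black",
             None = "gainsboro")
p <- ggplot(long, aes(x = sample, y = gene, fill = alteration)) +
  geom_tile(colour = "white") +
  scale_fill_manual(values = alt_col) +
  theme_light() +
  labs(x = "Sample", y = "Gene", fill = "Alteration")
```

Calling print(p) draws the grid; per_gene holds the per-gene percentages that the figure shows next to each row.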
beginner984 commented 4 years ago

Thank you so much

It is amazing

I would like to highlight oncogenes and tumour suppressors in this plot.

For instance, TP53 is a tumour suppressor and CCNE1 is an oncogene.

So I added an extra column to lmuts showing whether each gene is an oncogene or a tumour suppressor, like this:

> head(lmuts)
             sample gene value gene_class
1 LP6005690-DNA_H02 TP53     1        TSG
2 LP2000333-DNA_A01 TP53     1        TSG
3 LP6005409-DNA_D03 TP53     1        TSG
4 LP6008141-DNA_H02 TP53     1        TSG
5 LP6008336-DNA_E02 TP53     1        TSG
6 LP6008269-DNA_B06 TP53     1        TSG

Then I added facet_wrap(~gene_class, ncol=1), but I get this error:

> lmuts %>% 
+     mutate(gene = paste(gene), PW = pw_n[gene]) %>%
+     group_by(gene, PW) %>%
+     summarise(N = sum(value == 1)) %>%
+     ungroup() %>%
+     mutate(
+         PW = ifelse(is.na(PW), "None", PW), 
+         N = paste0(N, ' (', round(N/Np * 100, 1), '%)')
+     ) %>%
+     ggplot(aes(x = PW, y = gene, fill = PW)) +
+     geom_tile() +
+     geom_text(aes(label = N)) +
+     theme_light() +
+     theme(legend.position = 'bottom') +
+     guides(fill = guide_legend('Pathway')) +
+     scale_fill_manual(values = c(pw_col, `None` = 'gainsboro')) +
+     scale_x_discrete(limits = c(names(pw_col), 'None')) +
+     labs(
+         x = "",
+         y = 'Gene',
+         title = "Occurrence of mutations"
+     )+facet_wrap(~gene_class, ncol=1)
Error: At least one layer must contain all faceting variables: `gene_class`.
* Plot is missing `gene_class`
* Layer 1 is missing `gene_class`
* Layer 2 is missing `gene_class`

This is my full lmuts R object.

Sorry to bother you.

caravagn commented 4 years ago

You have to include gene_class in group_by(), otherwise summarise() drops the column: group_by(gene, PW, gene_class).
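A small illustration of why this matters, using a hypothetical toy table: summarise() returns only the grouping columns plus the new summary columns, so any column not listed in group_by() is gone by the time facet_wrap() looks for it. Because gene_class is constant within each gene, adding it to the grouping does not change the counts. The .groups = "drop" argument (dplyr 1.0 or later) is used here in place of a separate ungroup().

```r
library(dplyr)

# Hypothetical toy table; gene_class is constant within each gene
toy <- data.frame(
  gene       = c("TP53", "TP53", "CCNE1", "CCNE1"),
  gene_class = c("TSG", "TSG", "Oncogene", "Oncogene"),
  value      = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Grouping by gene only: gene_class does not survive summarise()
without_class <- toy %>%
  group_by(gene) %>%
  summarise(N = sum(value == 1), .groups = "drop")
names(without_class)  # "gene" "N"

# Adding gene_class to group_by() carries it through with the same counts
with_class <- toy %>%
  group_by(gene, gene_class) %>%
  summarise(N = sum(value == 1), .groups = "drop")
names(with_class)     # "gene" "gene_class" "N"
```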

beginner984 commented 4 years ago

Thank you

I modified the code as you kindly suggested:

lmuts %>% 
  mutate(gene = paste(gene), PW = pw_n[gene]) %>%
  group_by(gene, PW, gene_class) %>%
  summarise(N = sum(value == 1)) %>%
  ungroup() %>%
  mutate(
    PW = ifelse(is.na(PW), "None", PW), 
    N = paste0(N, ' (', round(N/Np * 100, 1), '%)')
  ) %>%
  ggplot(aes(x = PW, y = gene, fill = PW)) +
  geom_tile() +
  geom_text(aes(label = N)) +
  theme_light() +
  theme(legend.position = 'bottom') +
  guides(fill = guide_legend('Pathway')) +
  scale_fill_manual(values = c(pw_col, `None` = 'gainsboro')) +
  scale_x_discrete(limits = c(names(pw_col), 'None')) +
  labs(
    x = "",
    y = 'Gene',
    title = "Occurrence of mutations"
  )

This is lmuts: lmuts.txt

But nothing happens.

caravagn commented 4 years ago

Did you drop the + facet_wrap(~gene_class, ncol=1) line?
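For reference, a minimal self-contained sketch of the pipeline with gene_class kept in group_by() and the facet restored. The toy lmuts, pw_n, pw_col and Np below are hypothetical stand-ins for the objects in this thread. Two small departures from the thread's code: .groups = "drop" (dplyr 1.0 or later) replaces the separate ungroup(), and scales = "free_y" is an optional addition so each panel lists only its own genes.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy stand-ins for lmuts, pw_n, pw_col and Np
lmuts <- data.frame(
  sample     = rep(c("S1", "S2", "S3"), times = 3),
  gene       = rep(c("TP53", "CCNE1", "PTEN"), each = 3),
  value      = c(1, 1, 0,  0, 1, 0,  1, 0, 0),
  gene_class = rep(c("TSG", "Oncogene", "TSG"), each = 3),
  stringsAsFactors = FALSE
)
pw_n   <- c(CCNE1 = "Cell Cycle", PTEN = "RTK")   # gene -> pathway lookup
pw_col <- c(RTK = "#E41A1C", `Cell Cycle` = "#377EB8")
Np     <- 3                                       # number of samples

# Count mutated samples per gene, keeping gene_class for faceting
counts <- lmuts %>%
  mutate(PW = unname(pw_n[gene])) %>%
  group_by(gene, PW, gene_class) %>%
  summarise(N = sum(value == 1), .groups = "drop") %>%
  mutate(
    PW = ifelse(is.na(PW), "None", PW),
    N  = paste0(N, " (", round(N / Np * 100, 1), "%)")
  )

p <- ggplot(counts, aes(x = PW, y = gene, fill = PW)) +
  geom_tile() +
  geom_text(aes(label = N)) +
  theme_light() +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend("Pathway")) +
  scale_fill_manual(values = c(pw_col, None = "gainsboro")) +
  scale_x_discrete(limits = c(names(pw_col), "None")) +
  labs(x = "", y = "Gene", title = "Occurrence of mutations") +
  facet_wrap(~gene_class, ncol = 1, scales = "free_y")
```

With the real data, the same chain applied to the thread's lmuts should give one panel per gene_class value.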