beginner984 closed this issue 4 years ago.
Hi, making that plot is easy with ggplot if you have the right data. We can try to help you out, but 1) you need to give us exactly the data you want to use, and 2) you will have to wait, because this is not a priority for us or for TRONCO.
Thanks a million for replying. I find such a plot very informative. With TRONCO I can visualise GISTIC data very nicely, but adding mutations and pathways to a GISTIC plot, like the one in my main question, seems very challenging. That would be very informative: we could see the type and percentage of alterations (copy number + mutation) and the contribution of these alterations to the defined pathways.
I have 15 samples and 19 genes.
This is the Boolean mutation matrix for the 15 samples and 19 genes:
https://www.dropbox.com/s/6oysoesbdunob3n/Boolean_values%20for%20mutations.rds?dl=0
This is my GISTIC data for the same 15 samples and 19 genes:
https://www.dropbox.com/s/q289603j8ndfpwx/GISTIC.rds?dl=0
Among my genes, PIK3CA, EGFR, ERBB2 and PTEN are in the RTK pathway, and CDKN2A and CCNE1 are in the cell cycle pathway.
Thank you very much in advance for any help in obtaining such a visualisation.
Hi, with the time available I can pass you this; the rest you will have to figure out yourself. This is similar to a count plot split by pathway, for mutation data only (no GISTIC). You can easily modify this script to include your GISTIC data etc.
library(dplyr)
library(ggplot2)
# Boolean mutation matrix (samples in rows, genes in columns) and GISTIC data;
# gist is not used below, but can be merged into the same pipeline
muts = readRDS('~/Downloads/Boolean_values for mutations.rds')
gist = readRDS('~/Downloads/GISTIC.rds')
# Gene-to-pathway assignments
pw_rt = data.frame(
  gene = c('PIK3CA', 'EGFR', 'ERBB2', 'PTEN'),
  pw = 'RTK',
  stringsAsFactors = FALSE
)
pw_cc = data.frame(
  gene = c('CDKN2A', 'CCNE1'),
  pw = 'Cell Cycle',
  stringsAsFactors = FALSE
)
pw = rbind(pw_rt, pw_cc)
# One colour per pathway
pw_col = RColorBrewer::brewer.pal(n = 3, name = 'Set1')[1:2]
names(pw_col) = unique(pw$pw)
# Long format (muts is a matrix): one row per sample/gene pair
lmuts = reshape2::melt(muts) %>% as_tibble()
colnames(lmuts) = c('sample', 'gene', 'value')
# Named lookup vector: gene -> pathway
pw_n = pw$pw
names(pw_n) = pw$gene
# Number of samples, used for the percentages
Np = nrow(muts)
lmuts %>%
  mutate(gene = paste(gene), PW = pw_n[gene]) %>%
  group_by(gene, PW) %>%
  # number of mutated samples per gene
  summarise(N = sum(value == 1)) %>%
  ungroup() %>%
  mutate(
    # genes outside the listed pathways go in a "None" column
    PW = ifelse(is.na(PW), "None", PW),
    N = paste0(N, ' (', round(N/Np * 100, 1), '%)')
  ) %>%
  ggplot(aes(x = PW, y = gene, fill = PW)) +
  geom_tile() +
  geom_text(aes(label = N)) +
  theme_light() +
  theme(legend.position = 'bottom') +
  guides(fill = guide_legend('Pathway')) +
  scale_fill_manual(values = c(pw_col, `None` = 'gainsboro')) +
  scale_x_discrete(limits = c(names(pw_col), 'None')) +
  labs(
    x = "",
    y = 'Gene',
    title = "Occurrence of mutations"
  )
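The script only counts mutations; gist is loaded but never used. One way to bring copy number in is to merge both matrices into a single long table with an alteration type per sample and gene. The sketch below assumes (not confirmed in the thread) that gist is a samples x genes matrix with the same row/column names as muts, using the usual GISTIC discrete calls (-2 deep deletion, -1 shallow loss, 0 neutral, 1 gain, 2 amplification), and that only the +/-2 calls count as alterations. combine_alterations(), count_alterations() and the "Multiple" label are hypothetical names chosen for illustration.

```r
library(dplyr)

# Hypothetical helper. Assumes both inputs are samples x genes matrices with
# matching row/column names, and GISTIC calls coded -2, -1, 0, 1, 2.
combine_alterations <- function(muts, gist) {
  # put the GISTIC matrix in the same sample/gene order as the mutations
  gist <- gist[rownames(muts), colnames(muts), drop = FALSE]
  long <- data.frame(
    sample = rep(rownames(muts), times = ncol(muts)),  # column-major order
    gene = rep(colnames(muts), each = nrow(muts)),
    mut = as.vector(muts) == 1,
    cna = as.vector(gist),
    stringsAsFactors = FALSE
  )
  long %>%
    mutate(
      # only high-level events (+/-2) are treated as copy-number alterations
      cna_type = case_when(
        cna >= 2 ~ "Amplification",
        cna <= -2 ~ "Deep deletion",
        TRUE ~ NA_character_
      ),
      # hypothetical label "Multiple" = mutation plus copy-number event
      type = case_when(
        mut & !is.na(cna_type) ~ "Multiple",
        mut ~ "Mutation",
        !is.na(cna_type) ~ cna_type,
        TRUE ~ "None"
      )
    ) %>%
    select(sample, gene, type)
}

# Number and percentage of samples per gene and alteration type
count_alterations <- function(alts) {
  n_samples <- n_distinct(alts$sample)
  alts %>%
    filter(type != "None") %>%
    count(gene, type, name = "N") %>%
    mutate(pct = round(N / n_samples * 100, 1))
}

# Toy example with hypothetical values
samples <- c("S1", "S2", "S3")
genes <- c("PTEN", "CCNE1")
muts_toy <- matrix(c(1, 0, 0, 0, 1, 0), nrow = 3, dimnames = list(samples, genes))
gist_toy <- matrix(c(-2, 0, 0, 2, 2, 0), nrow = 3, dimnames = list(samples, genes))

alts <- combine_alterations(muts_toy, gist_toy)
count_alterations(alts)
```

The resulting type column could then be used as the fill of a tile plot, or counted per pathway in the same way as value == 1 is counted above.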
Thank you so much, it is amazing.
I want to highlight oncogenes and tumour suppressors within this plot. For instance, TP53 is a tumour suppressor and CCNE1 is an oncogene.
So I have added an extra column, gene_class, to lmuts to show whether a gene is an oncogene or a tumour suppressor, like this:
> head(lmuts)
sample gene value gene_class
1 LP6005690-DNA_H02 TP53 1 TSG
2 LP2000333-DNA_A01 TP53 1 TSG
3 LP6005409-DNA_D03 TP53 1 TSG
4 LP6008141-DNA_H02 TP53 1 TSG
5 LP6008336-DNA_E02 TP53 1 TSG
6 LP6008269-DNA_B06 TP53 1 TSG
>
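A column like gene_class can be built with a named lookup vector, the same pattern the script uses for pw_n. This is a minimal sketch: only the TP53 and CCNE1 classes come from the thread, while gene_class_map, the toy lmuts_toy table and the "Unclassified" label for genes missing from the lookup are hypothetical.

```r
library(dplyr)

# Hypothetical lookup: gene -> class (extend with your own annotations)
gene_class_map <- c(TP53 = "TSG", CCNE1 = "Oncogene")

# Toy long table with the same first columns as lmuts (hypothetical values)
lmuts_toy <- tibble(
  sample = c("S1", "S2", "S1"),
  gene = c("TP53", "CCNE1", "EGFR"),
  value = c(1, 1, 0)
)

lmuts_toy <- lmuts_toy %>%
  mutate(
    gene_class = unname(gene_class_map[as.character(gene)]),
    # genes missing from the lookup come back as NA; label them explicitly
    gene_class = ifelse(is.na(gene_class), "Unclassified", gene_class)
  )
lmuts_toy
```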
I then used facet_wrap(~gene_class, ncol=1),
but I am getting this error:
> lmuts %>%
+ mutate(gene = paste(gene), PW = pw_n[gene]) %>%
+ group_by(gene, PW) %>%
+ summarise(N = sum(value == 1)) %>%
+ ungroup() %>%
+ mutate(
+ PW = ifelse(is.na(PW), "None", PW),
+ N = paste0(N, ' (', round(N/Np * 100, 1), '%)')
+ ) %>%
+ ggplot(aes(x = PW, y = gene, fill = PW)) +
+ geom_tile() +
+ geom_text(aes(label = N)) +
+ theme_light() +
+ theme(legend.position = 'bottom') +
+ guides(fill = guide_legend('Pathway')) +
+ scale_fill_manual(values = c(pw_col, `None` = 'gainsboro')) +
+ scale_x_discrete(limits = c(names(pw_col), 'None')) +
+ labs(
+ x = "",
+ y = 'Gene',
+ title = "Occurrence of mutations"
+ )+facet_wrap(~gene_class, ncol=1)
Error: At least one layer must contain all faceting variables: `gene_class`.
* Plot is missing `gene_class`
* Layer 1 is missing `gene_class`
* Layer 2 is missing `gene_class`
This is my full lmuts R object:
https://www.dropbox.com/s/2crww8auozfbvkf/lmuts.rds?dl=0
Sorry for bothering you.
You have to include gene_class in group_by(), otherwise the column is lost, because summarise() only keeps the grouping columns and the new summary columns: group_by(gene, PW, gene_class).
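A small toy example (hypothetical values) makes the point concrete: after summarise(), only the grouping columns and the new N column survive, so facet_wrap(~gene_class) has nothing to facet on unless gene_class is one of the grouping variables. Since gene_class is constant within each gene, adding it to group_by() does not change the number of rows.

```r
library(dplyr)

toy <- tibble(
  gene = c("TP53", "TP53", "CCNE1"),
  PW = c(NA, NA, "Cell Cycle"),
  gene_class = c("TSG", "TSG", "Oncogene"),
  value = c(1, 0, 1)
)

# gene_class is not a grouping variable, so summarise() drops it
without_class <- toy %>%
  group_by(gene, PW) %>%
  summarise(N = sum(value == 1)) %>%
  ungroup()

# grouping by gene_class as well carries it through to the plot data
with_class <- toy %>%
  group_by(gene, PW, gene_class) %>%
  summarise(N = sum(value == 1)) %>%
  ungroup()

names(without_class)  # "gene" "PW" "N"
names(with_class)     # "gene" "PW" "gene_class" "N"
```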
Thank you.
I modified it as you kindly suggested:
lmuts %>%
mutate(gene = paste(gene), PW = pw_n[gene]) %>%
group_by(gene, PW, gene_class) %>%
summarise(N = sum(value == 1)) %>%
ungroup() %>%
mutate(
PW = ifelse(is.na(PW), "None", PW),
N = paste0(N, ' (', round(N/Np * 100, 1), '%)')
) %>%
ggplot(aes(x = PW, y = gene, fill = PW)) +
geom_tile() +
geom_text(aes(label = N)) +
theme_light() +
theme(legend.position = 'bottom') +
guides(fill = guide_legend('Pathway')) +
scale_fill_manual(values = c(pw_col, `None` = 'gainsboro')) +
scale_x_discrete(limits = c(names(pw_col), 'None')) +
labs(
x = "",
y = 'Gene',
title = "Occurrence of mutations"
)
Here is lmuts: lmuts.txt
But nothing is happening.
You left out +facet_wrap(~gene_class, ncol=1)?
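For reference, here is a minimal sketch of the pipeline with both fixes in place: gene_class in group_by() and facet_wrap() added at the end. It runs on a hypothetical toy table shaped like lmuts; scales = "free_y" is an optional choice (not from the thread) so that each panel lists only its own genes, and the colour and legend settings from the original script are left out for brevity but can be added back unchanged.

```r
library(dplyr)
library(ggplot2)

# Toy long table with the same columns as lmuts (hypothetical values)
lmuts_toy <- tibble(
  sample = rep(c("S1", "S2", "S3"), times = 3),
  gene = rep(c("TP53", "PTEN", "CCNE1"), each = 3),
  value = c(1, 1, 0, 0, 1, 0, 1, 0, 0),
  gene_class = rep(c("TSG", "TSG", "Oncogene"), each = 3)
)
pw_n <- c(PTEN = "RTK", CCNE1 = "Cell Cycle")
Np <- n_distinct(lmuts_toy$sample)

p <- lmuts_toy %>%
  mutate(gene = paste(gene), PW = pw_n[gene]) %>%
  group_by(gene, PW, gene_class) %>%   # keep gene_class for faceting
  summarise(N = sum(value == 1)) %>%
  ungroup() %>%
  mutate(
    PW = ifelse(is.na(PW), "None", PW),
    N = paste0(N, " (", round(N / Np * 100, 1), "%)")
  ) %>%
  ggplot(aes(x = PW, y = gene, fill = PW)) +
  geom_tile() +
  geom_text(aes(label = N)) +
  theme_light() +
  labs(x = "", y = "Gene", title = "Occurrence of mutations") +
  # one panel per gene class; free y so each panel shows only its genes
  facet_wrap(~gene_class, ncol = 1, scales = "free_y")

print(p)
```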
Hi,
I have come across a very informative plot on Google.
Given a group of samples and a list of genes for which we have mutation and GISTIC data, enriched in two pathways such as cell cycle and p53, as shown in this figure:
Map of functional alterations for a group of patients. Genes (rows) encoding components of p53–DNA repair are affected by selected functional events (the percentage of samples altered and the types of alteration are represented by coloured squares) across the group of samples in columns. Alterations of the pathway are shown as stacked green bar plots at the bottom.
Producing such a plot by myself seems almost impossible.
I have MAF-format and GISTIC data, and even the Boolean matrix of mutated genes per sample, but I really don't know how to get such a plot.
Can you help me, even though it would be a great exception?
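A plot of this kind can be approximated with plain ggplot2 in two parts: a sample x gene tile map coloured by alteration type, with each row labelled by the percentage of samples altered and panels split by pathway, plus a summary of the percentage of samples with at least one altered gene per pathway. This is a rough sketch, not a reproduction of the figure: the alts table (one row per sample and gene with an alteration type, "None" meaning unaltered), its values, the pathway_map assignments and the colours are all hypothetical, and the single-colour bar chart is only a simple stand-in for the stacked green bars.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy input: one row per sample/gene with an alteration type
alts <- expand.grid(
  sample = paste0("S", 1:4),
  gene = c("TP53", "CDKN2A", "CCNE1"),
  stringsAsFactors = FALSE
) %>%
  as_tibble() %>%
  mutate(type = c("Mutation", "Mutation", "None", "None",        # TP53
                  "Deep deletion", "None", "None", "None",      # CDKN2A
                  "None", "None", "Amplification", "None"))     # CCNE1

# Hypothetical gene -> pathway assignment
pathway_map <- c(TP53 = "p53", CDKN2A = "Cell cycle", CCNE1 = "Cell cycle")
alts <- alts %>% mutate(pathway = unname(pathway_map[gene]))

# Percentage of samples altered per gene, used to label the rows
gene_pct <- alts %>%
  group_by(pathway, gene) %>%
  summarise(pct = round(100 * mean(type != "None"), 1)) %>%
  ungroup() %>%
  mutate(label = paste0(gene, " (", pct, "%)"))

# Percentage of samples with at least one altered gene in each pathway
pathway_pct <- alts %>%
  group_by(pathway, sample) %>%
  summarise(altered = any(type != "None")) %>%
  group_by(pathway) %>%
  summarise(pct = round(100 * mean(altered), 1)) %>%
  ungroup()

# Sample x gene tile map, one panel per pathway
p_map <- alts %>%
  left_join(gene_pct, by = c("pathway", "gene")) %>%
  ggplot(aes(x = sample, y = label, fill = type)) +
  geom_tile(colour = "white") +
  scale_fill_manual(values = c(None = "gainsboro", Mutation = "darkgreen",
                               Amplification = "firebrick",
                               `Deep deletion` = "steelblue")) +
  facet_grid(pathway ~ ., scales = "free_y", space = "free_y") +
  theme_light() +
  labs(x = "Sample", y = NULL, fill = "Alteration")

# Pathway-level summary bars
p_bar <- ggplot(pathway_pct, aes(x = pathway, y = pct)) +
  geom_col(fill = "darkgreen") +
  theme_light() +
  labs(x = "Pathway", y = "% samples altered")

print(p_map)
print(p_bar)
```

The two plots are kept separate here for simplicity. Dedicated tools such as ComplexHeatmap::oncoPrint() or maftools::oncoplot() are also worth checking for this type of figure.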