karissawhiting opened this issue 1 year ago
The method I use to make matrices for oncoprints works directly from an aggregated alteration file (combining MAF, CNA, and fusion files). I can share code if that's of interest.
@edrill Yes, please share code if you can. Thank you!
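library(dplyr)

# Keep only the first mutation per sample/gene (any others are dropped),
# then spread genes into columns; genes with no mutation get "None"
mut2 <- gnomeR::mutations %>%
  group_by(sampleId, hugoGeneSymbol) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  tidyr::pivot_wider(id_cols = "sampleId", names_from = "hugoGeneSymbol",
                     values_from = "mutationType", values_fill = "None")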
This is the basis of the code. It is very similar to create_gene_binary(), and it may be possible to reuse its internal functions to create this text matrix. @karissawhiting what are your thoughts?
@michaelcurry1123 I wonder what happens when (if) you have two types of mutations on the same gene...
For the numeric binary matrix it wouldn't matter, but I could see that being a problem here. Maybe we create a vector for that cell and throw a warning?
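A minimal sketch of the "vector for that cell" idea, on made-up toy data rather than gnomeR::mutations: tidyr::pivot_wider() with values_fn = list keeps every value, so a sample/gene cell with two mutation types becomes a length-2 character vector. The warning text is an assumption for illustration.

```r
library(dplyr)
library(tidyr)

# Toy data (made up for illustration): sample S1 has two mutation types in TP53
muts <- tibble(
  sampleId       = c("S1", "S1", "S2"),
  hugoGeneSymbol = c("TP53", "TP53", "TP53"),
  mutationType   = c("Missense_Mutation", "Frame_Shift_Del", "Nonsense_Mutation")
)

# values_fn = list keeps every value, so each cell becomes a character vector
wide <- muts %>%
  pivot_wider(id_cols = sampleId, names_from = hugoGeneSymbol,
              values_from = mutationType, values_fn = list)

# Count values per cell and warn if any sample/gene cell holds more than one
n_per_cell <- vapply(wide$TP53, length, integer(1))
if (any(n_per_cell > 1)) {
  warning("Some sample/gene cells contain more than one mutation type.", call. = FALSE)
}
```

The trade-off is that list-columns are awkward to plot or export, which is why the thread moves toward collapsing them into a single label.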
Can they have two of the same type of mutation (e.g., two fusions on the same gene), or would it be a mutation, a fusion, or a CNA? @karissawhiting
You could have two types of mutations on the same gene, and we'd want to represent both in an oncoprint. If there are two of the same type of mutation (rare, I think?), we would just count 1 (like presence/absence).
OK, no problem, did that!
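A minimal sketch of "count 1 for the same type" on made-up toy data: dplyr::distinct() over sample, gene, and mutation type collapses duplicate calls of one type while keeping different types on the same gene. Note that distinct() with named columns drops the other columns unless .keep_all = TRUE; the group_by() + filter(row_number() == 1) pattern used elsewhere in this thread keeps them.

```r
library(dplyr)

# Toy data (made up): S1 has two missense calls and one frameshift in KRAS
muts <- tibble(
  sampleId       = "S1",
  hugoGeneSymbol = "KRAS",
  mutationType   = c("Missense_Mutation", "Missense_Mutation", "Frame_Shift_Del")
)

# Presence/absence per type: duplicate types collapse to one row,
# while different types on the same gene are both kept
deduped <- muts %>%
  distinct(sampleId, hugoGeneSymbol, mutationType)
```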
Chiming in with my 2 cents: what about the case where there is a copy number alteration or fusion and a mutation? We would definitely want to show that on the oncoprint. Or what if there are 2 mutations that aren't the same type, e.g. missense and frameshift? It is not possible to show both, so how do you decide what to show? I usually look for those instances and replace them with "Multiple muts."
I use this general code to keep all mutations/alterations in:
dplyr::summarise(type_pre = paste(sort(alteration), collapse = ";"))
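A minimal sketch of that summarise() call in context, assuming a made-up long-format table with one row per alteration call (the column names sampleId, hugoGeneSymbol, and alteration are assumptions). Sorting before pasting makes the collapsed string independent of row order.

```r
library(dplyr)

# Toy long-format table (made up): one row per alteration call
alts <- tibble(
  sampleId       = c("S1", "S1", "S2", "S3", "S3"),
  hugoGeneSymbol = "EGFR",
  alteration     = c("MISSENSE", "AMPLIFICATION", "MISSENSE", "MISSENSE", "FRAMESHIFT")
)

collapsed <- alts %>%
  group_by(sampleId, hugoGeneSymbol) %>%
  # keep every alteration; sorting makes "A;B" and "B;A" identical
  summarise(type_pre = paste(sort(alteration), collapse = ";"), .groups = "drop")
```

A "Multiple muts." label could then be applied afterwards to cells that contain more than one mutation type.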
@edrill Thanks for the input! Yes, we are definitely going to have separate columns for mutations, fusions, and CNAs so that all can be shown on the oncoprint. We were going back and forth on how to display multiple mutation types in one sample on the same gene. I like the idea of having a "Multiple Mutations" annotation in the matrix and throwing a warning if this comes up in someone's data, telling them to filter the data themselves beforehand if they want one mutation type displayed over another.
Esther, we also want to create a data check/dictionary of possible values the mutation type column can accept (e.g., missense, truncating, etc.). Do you have a list of these anywhere?
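A minimal sketch of the "Multiple Mutations" annotation plus warning. collapse_mutations() is a hypothetical helper, not an existing gnomeR function, and the toy data and warning wording are assumptions.

```r
library(dplyr)

# Hypothetical helper: one row per sample/gene, labelled "Multiple Mutations"
# when a gene carries more than one distinct mutation type in a sample
collapse_mutations <- function(mutations) {
  out <- mutations %>%
    distinct(sampleId, hugoGeneSymbol, mutationType) %>%
    group_by(sampleId, hugoGeneSymbol) %>%
    summarise(n_types = n(),
              alteration = if (n_types > 1) "Multiple Mutations" else mutationType,
              .groups = "drop")
  if (any(out$n_types > 1)) {
    warning("Some genes have more than one mutation type in the same sample ",
            "and are labelled 'Multiple Mutations'. Filter your data beforehand ",
            "if you want a specific mutation type displayed.", call. = FALSE)
  }
  select(out, -n_types)
}

# Usage on toy data (made up): S1 triggers the warning
muts <- tibble(
  sampleId       = c("S1", "S1", "S2"),
  hugoGeneSymbol = "TP53",
  mutationType   = c("Missense_Mutation", "Frame_Shift_Del", "Nonsense_Mutation")
)
res <- collapse_mutations(muts)
```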
Thanks!
OK, I was assuming you were starting from the oncoPrint() function in the ComplexHeatmap package, which requires e.g. "DELETION; MISSENSE;" in the same cell to show both on the oncoprint.
In terms of mutation type values, I don't have a comprehensive list. But I just looked back at my two projects with the largest sample sizes, and these were the categories included:
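A minimal sketch of shaping collapsed strings into the kind of matrix described above: genes in rows, samples in columns, and alterations separated by ";" within a cell. The toy data is made up, empty strings are an assumed choice for "no alteration", and the sketch stops before calling ComplexHeatmap::oncoPrint(), whose own documentation should be checked for the exact input it expects.

```r
library(dplyr)
library(tidyr)

# Toy collapsed table (made up): one ";"-separated string per sample/gene
collapsed <- tibble(
  sampleId       = c("S1", "S1", "S2"),
  hugoGeneSymbol = c("EGFR", "TP53", "TP53"),
  alteration     = c("AMPLIFICATION;MISSENSE", "MISSENSE", "DELETION")
)

# Spread to samples x genes, filling missing cells with empty strings
wide <- collapsed %>%
  pivot_wider(id_cols = sampleId, names_from = hugoGeneSymbol,
              values_from = alteration, values_fill = "")

# Transpose so genes are rows and samples are columns
mat <- t(as.matrix(wide[, -1]))
colnames(mat) <- wide$sampleId
```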
3'Flank, 3'UTR, 5'Flank, Frame_Shift_Del, Frame_Shift_Ins, In_Frame_Del, In_Frame_Ins, Intron, Missense_Mutation, Nonsense_Mutation, Nonstop_Mutation, nonsynonymous_SNV, Silent, Splice_Region, Splice_Site, Translation_Start_Site
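A minimal sketch of the data check discussed above, seeded with only the categories listed here (so it is not a full dictionary). check_mutation_types() is a hypothetical helper name; values such as "truncating" would be reported as unrecognised until they are added.

```r
# Allowed values: the categories listed above (not an exhaustive dictionary)
allowed_mutation_types <- c(
  "3'Flank", "3'UTR", "5'Flank", "Frame_Shift_Del", "Frame_Shift_Ins",
  "In_Frame_Del", "In_Frame_Ins", "Intron", "Missense_Mutation",
  "Nonsense_Mutation", "Nonstop_Mutation", "nonsynonymous_SNV", "Silent",
  "Splice_Region", "Splice_Site", "Translation_Start_Site"
)

# Hypothetical check: warn about unrecognised values and return them invisibly
check_mutation_types <- function(x, allowed = allowed_mutation_types) {
  unknown <- setdiff(unique(x), allowed)
  if (length(unknown) > 0) {
    warning("Unrecognised mutation type(s): ", paste(unknown, collapse = ", "),
            call. = FALSE)
  }
  invisible(unknown)
}
```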
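library(dplyr)

# Mutations: one row per distinct mutation type per sample/gene, then collapse
mut2 <- gnomeR::mutations %>%
  group_by(sampleId, hugoGeneSymbol, mutationType) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  group_by(sampleId, hugoGeneSymbol) %>%
  summarise(alteration = paste(mutationType, collapse = ","), .groups = "drop") %>%
  # more than one type on the same gene -> single "Multiple Mutations" label
  mutate(alteration = ifelse(grepl(",", alteration), "Multiple Mutations", alteration))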
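# CNAs: one row per distinct CNA call per sample/gene, collapsed with ","
# (.groups = "drop" so the result is not left grouped by sampleId)
cna2 <- gnomeR::cna %>%
  group_by(sampleId, hugoGeneSymbol, alteration) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  group_by(sampleId, hugoGeneSymbol) %>%
  summarise(alteration = paste(alteration, collapse = ","), .groups = "drop")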
# Fusions: same idea, keyed on the site 1 gene and renamed to match the other tables
fus2 <- gnomeR::sv %>%
  group_by(sampleId, site1HugoSymbol, variantClass) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  group_by(sampleId, site1HugoSymbol) %>%
  summarise(alteration = paste(variantClass, collapse = ","), .groups = "drop") %>%
  rename(hugoGeneSymbol = site1HugoSymbol)
# Stack all three tables, collapse per sample/gene, and spread genes into columns
allgene <- bind_rows(cna2, fus2, mut2) %>%
  group_by(sampleId, hugoGeneSymbol) %>%
  summarise(alteration = paste(alteration, collapse = ","), .groups = "drop") %>%
  tidyr::pivot_wider(id_cols = "sampleId", names_from = "hugoGeneSymbol",
                     values_from = "alteration", values_fill = NA_character_)
Here is some code I came up with that handles multiple mutations in the mutation file and then combines everything together. It's a very rough draft, so if this isn't quite on the right track, let me know!
I think this code looks good. To recap, we discussed:
create_gene_binary()
@karissawhiting I think this code gets us some of the way there: it is wide and addresses multiple fusions and CNAs. I will have to look into the other items, though. I also have some questions about the .del and .fus endings; it might be easier to chat through those.
library(dplyr)

# Mutations: one column per gene; several types in one gene become "Multiple Mutations"
mut2 <- gnomeR::mutations %>%
  group_by(sampleId, hugoGeneSymbol, mutationType) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  group_by(sampleId, hugoGeneSymbol) %>%
  summarise(alteration = paste(mutationType, collapse = ","), .groups = "drop") %>%
  mutate(alteration = ifelse(grepl(",", alteration), "Multiple Mutations", alteration)) %>%
  tidyr::pivot_wider(id_cols = "sampleId", names_from = "hugoGeneSymbol",
                     values_from = "alteration", values_fill = NA_character_)
# CNAs: one column per gene with a ".cna" suffix; several calls become "Multiple CNAs"
cna2 <- gnomeR::cna %>%
  group_by(sampleId, hugoGeneSymbol, alteration) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  group_by(sampleId, hugoGeneSymbol) %>%
  summarise(alteration = paste(alteration, collapse = ","), .groups = "drop") %>%
  mutate(alteration = ifelse(grepl(",", alteration), "Multiple CNAs", alteration)) %>%
  tidyr::pivot_wider(id_cols = "sampleId", names_from = "hugoGeneSymbol",
                     names_glue = "{hugoGeneSymbol}.cna",
                     values_from = "alteration", values_fill = NA_character_)
# Fusions: one column per site 1 gene with a ".fus" suffix; several become "Multiple Fusions"
fus2 <- gnomeR::sv %>%
  group_by(sampleId, site1HugoSymbol, variantClass) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  group_by(sampleId, site1HugoSymbol) %>%
  summarise(alteration = paste(variantClass, collapse = ","), .groups = "drop") %>%
  rename(hugoGeneSymbol = site1HugoSymbol) %>%
  mutate(alteration = ifelse(grepl(",", alteration), "Multiple Fusions", alteration)) %>%
  tidyr::pivot_wider(id_cols = "sampleId", names_from = "hugoGeneSymbol",
                     names_glue = "{hugoGeneSymbol}.fus",
                     values_from = "alteration", values_fill = NA_character_)
# Join the three wide tables on sampleId, keeping samples present in any of them
allgene <- Reduce(function(x, y) full_join(x, y, by = "sampleId"), list(cna2, fus2, mut2))
@karissawhiting Right now the fusion and CNA columns have .fus and .cna at the end. We were going to keep them as is and change them if we needed to add .amp or .del to the suffix.
Go with long format instead of wide, and then we can make another internal function to pivot wide if needed.
@edrill suggested this. This can be useful for oncoprints and more in-depth mutation-specific analyses.
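A minimal sketch of what an .amp/.del suffix could look like, using a made-up CNA table. The "Amplification"/"Deletion" labels and the suffix mapping are assumptions; the real values in gnomeR::cna should be checked first. pivot_wider() accepts several names_from columns and combines them through names_glue.

```r
library(dplyr)
library(tidyr)

# Toy CNA calls (made up); the alteration labels are assumptions
cna <- tibble(
  sampleId       = c("S1", "S2", "S2"),
  hugoGeneSymbol = c("EGFR", "EGFR", "CDKN2A"),
  alteration     = c("Amplification", "Amplification", "Deletion")
)

cna_wide <- cna %>%
  # map each CNA call to a short suffix (hypothetical mapping)
  mutate(suffix = case_when(alteration == "Amplification" ~ "amp",
                            alteration == "Deletion" ~ "del")) %>%
  pivot_wider(id_cols = sampleId, names_from = c(hugoGeneSymbol, suffix),
              names_glue = "{hugoGeneSymbol}.{suffix}",
              values_from = alteration, values_fill = NA_character_)
```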
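A minimal sketch of the long-first approach, assuming a made-up long table with an alteration-class column. pivot_alterations_wide() is a hypothetical name for the internal helper; it reuses the "{gene}.{class}" suffix convention from the wide code above.

```r
library(dplyr)
library(tidyr)

# Long format (made up): one row per sample, gene, alteration class and value
long_alts <- tibble(
  sampleId       = c("S1", "S1", "S2"),
  hugoGeneSymbol = c("EGFR", "EGFR", "TP53"),
  alt_class      = c("mut", "cna", "mut"),
  alteration     = c("Missense_Mutation", "Amplification", "Nonsense_Mutation")
)

# Hypothetical internal helper: pivot the long table to one column per gene/class
pivot_alterations_wide <- function(long) {
  long %>%
    pivot_wider(id_cols = sampleId,
                names_from = c(hugoGeneSymbol, alt_class),
                names_glue = "{hugoGeneSymbol}.{alt_class}",
                values_from = alteration, values_fill = NA_character_)
}

wide_alts <- pivot_alterations_wide(long_alts)
```

Keeping the long table as the primary output means mutation-specific analyses can filter rows directly, and only oncoprint-style consumers pay for the pivot.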
The relevant information seems to be in the following MAF columns:
@edrill, what type of information from the above do you keep in your matrix? Also, does this apply only to mutations, or to fusions/CNAs as well?
@michaelcurry1123, I'm thinking this could be a separate new function that doesn't rely on the other version of the binary matrix, but I'm open to other ideas. I'm not sure what to call it yet. I think we should add a check of possible levels (e.g., missense, splice, etc.).