cBioPortal / icebox

very low priority issues

Allow importing all mutations (synonymous, intron,...) #3

Open · j-hudecek opened this issue 8 years ago

j-hudecek commented 8 years ago

Currently, the cBioPortal importer (in MutationFilter.java) ignores many mutations that could be interesting to researchers. I've tried disabling the filter (i.e. allowing all mutations from the input files by hardcoding "return true" as the mutation filter's answer) and haven't found any adverse effects apart from worse performance with 100k+ mutations. It would be useful to have this as an option. Alternatively, all mutations (i.e. including synonymous ones) could be imported as a different data type.
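A rough sketch of what "this as an option" might look like: a filter with a switch that reproduces the hardcoded "return true" when enabled. This is not cBioPortal's actual MutationFilter API; the class name `ConfigurableMutationFilter`, the `acceptAll` flag and the rejected set are hypothetical. The rejected values are MAF-style Variant_Classification labels chosen for illustration, not necessarily the exact list the real filter uses.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: a mutation filter whose "drop uninteresting variants"
 * behaviour can be switched off by an import option. Names are illustrative,
 * not cBioPortal's real API.
 */
final class ConfigurableMutationFilter {
    // Variant classifications the default filter drops (assumed MAF-style labels, lower-cased).
    private static final Set<String> DEFAULT_REJECTED = Set.of(
            "silent", "intron", "3'utr", "5'utr", "3'flank", "5'flank", "igr", "rna");

    private final boolean acceptAll;

    ConfigurableMutationFilter(boolean acceptAll) {
        this.acceptAll = acceptAll;
    }

    /** Returns true if the mutation should be imported. */
    boolean acceptMutation(String variantClassification) {
        if (acceptAll) {
            // Same effect as hardcoding "return true" in the filter.
            return true;
        }
        return !DEFAULT_REJECTED.contains(variantClassification.toLowerCase(Locale.ROOT));
    }
}
```

With `new ConfigurableMutationFilter(false)` a "Silent" mutation is rejected and a "Missense_Mutation" is kept; with `true` everything is kept, so the option only changes behaviour when someone asks for it.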

Andreea-Bican commented 8 years ago

Hello! I'm willing to help with this one. Could you explain it in more detail?

j-hudecek commented 8 years ago

Think of it like this: when you sequence a bunch of DNA, at each position you get one of three outcomes:

  1. You read a letter that is different from the reference: there is a mutation.
  2. You read a letter that is the same as the reference.
  3. It doesn't work out, and you still don't know what letter is there.
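The three cases above can be written down as a tiny data type. This is only an illustration; the names `CallOutcome` and `PositionCaller` are hypothetical, and treating a missing base or the IUPAC ambiguity code 'N' as "don't know" is an assumption.

```java
/** The three possible outcomes at one sequenced position (illustrative names). */
enum CallOutcome {
    MUTATED,            // 1. the base differs from the reference
    MATCHES_REFERENCE,  // 2. the base is the same as the reference
    NO_CALL             // 3. we still don't know which base is there
}

final class PositionCaller {
    private PositionCaller() {}

    /**
     * Classifies a single position. A null read base, or the IUPAC
     * ambiguity code 'N', is treated as "no call" (an assumption).
     */
    static CallOutcome call(Character readBase, char referenceBase) {
        if (readBase == null) {
            return CallOutcome.NO_CALL;
        }
        char read = Character.toUpperCase(readBase.charValue());
        if (read == 'N') {
            return CallOutcome.NO_CALL;
        }
        return read == Character.toUpperCase(referenceBase)
                ? CallOutcome.MATCHES_REFERENCE
                : CallOutcome.MUTATED;
    }
}
```

Keeping MATCHES_REFERENCE and NO_CALL as separate values is what makes it possible to tell "sequenced and not mutated" apart from "not known".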

Currently, cBioPortal only stores information about case 1, but information about case 2 could also be interesting. Moreover, cBioPortal looks at the position of the mutation with respect to the genes (it only stores mutations in the protein-coding part of a gene) and at its effect on the protein that is produced when the cell machinery transcribes and translates the mutated gene (it only stores mutations that actually have an effect on the protein). So cBioPortal only stores mutations that are sure to cause a difference in the protein.

However, the other mutations are also interesting. Nowadays people are looking more and more into gene regulation, and they believe it is driven by the DNA surrounding the protein-coding parts of the gene. Mutations that occur in the protein-coding parts of the gene but don't change the protein's composition can still affect mRNA structure or interactions. In conclusion, it would be useful to store in cBioPortal information about a) the absence of mutations and b) mutations with a less severe impact.

Currently, when there is a mutation in a gene, the gene is marked as mutated in OncoPrint. Obviously, if we stored all mutations we would have to distinguish between mutations that we are sure have an effect on the protein and ones that only might (there would be many more of the latter, perhaps in every gene, so drawing them in OncoPrint would be quite useless). A solution could be to change the MutationFilter class to classify mutations instead of filtering them, store this classification with each mutation in the mutations table, and only consider a gene mutated if it has a mutation with a more severe classification.

Hope it's clearer!
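A minimal sketch of the "classify instead of filter" idea, assuming MAF-style Variant_Classification labels as input. The enum `MutationClass`, the class `MutationClassifier`, the three severity levels and the label-to-class mapping are all hypothetical; in particular, counting Splice_Site as protein-altering and defaulting unknown labels to the least severe class are assumptions, not cBioPortal behaviour.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical severity classes, ordered from most to least severe. */
enum MutationClass {
    PROTEIN_ALTERING,   // sure to change the protein (roughly what is imported today)
    CODING_SILENT,      // in the coding sequence, but no amino-acid change
    NON_CODING          // intron, UTR, flanking, intergenic, ...
}

final class MutationClassifier {
    private MutationClassifier() {}

    // Assumed mapping from lower-cased MAF Variant_Classification to severity.
    private static final Map<String, MutationClass> BY_VARIANT_CLASSIFICATION = Map.ofEntries(
            Map.entry("missense_mutation", MutationClass.PROTEIN_ALTERING),
            Map.entry("nonsense_mutation", MutationClass.PROTEIN_ALTERING),
            Map.entry("frame_shift_del", MutationClass.PROTEIN_ALTERING),
            Map.entry("frame_shift_ins", MutationClass.PROTEIN_ALTERING),
            Map.entry("in_frame_del", MutationClass.PROTEIN_ALTERING),
            Map.entry("in_frame_ins", MutationClass.PROTEIN_ALTERING),
            Map.entry("nonstop_mutation", MutationClass.PROTEIN_ALTERING),
            Map.entry("translation_start_site", MutationClass.PROTEIN_ALTERING),
            Map.entry("splice_site", MutationClass.PROTEIN_ALTERING), // assumption
            Map.entry("silent", MutationClass.CODING_SILENT));

    /** Classifies instead of filtering; unknown labels fall back to the least severe class. */
    static MutationClass classify(String variantClassification) {
        return BY_VARIANT_CLASSIFICATION.getOrDefault(
                variantClassification.toLowerCase(Locale.ROOT), MutationClass.NON_CODING);
    }

    /**
     * A gene counts as mutated (e.g. for OncoPrint) only if at least one of its
     * mutations is as severe as, or more severe than, the threshold.
     */
    static boolean isGeneMutated(List<String> variantClassifications, MutationClass threshold) {
        for (String vc : variantClassifications) {
            // Lower ordinal means more severe, so compareTo <= 0 means "at least as severe".
            if (classify(vc).compareTo(threshold) <= 0) {
                return true;
            }
        }
        return false;
    }
}
```

The classification would be stored alongside each row in the mutations table, and OncoPrint would call something like `isGeneMutated(..., MutationClass.PROTEIN_ALTERING)`, so a gene with only silent or intronic mutations is still drawn as unmutated while those mutations remain queryable.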