AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Include or establish a 'blacklist' of likely false positive genes? #231

Open jaclyn-taroni opened 4 years ago

jaclyn-taroni commented 4 years ago

Some genes might show up as highly mutated, but just happen to be very large (e.g., TTN) and are unlikely to be informative. Does a list of genes of this nature already exist out there somewhere in the literature?

If so, including that list somewhere could 1) help contributors with less specific domain expertise filter and interpret their results and 2) allow analysts to use the same list for consistency.

jharenza commented 4 years ago

There is this paper and correction describing FLAGS. Table S4 in the supplement may be a good list to use, and maftools actually emits a warning when these genes appear in your dataset, to make users aware. The caveat with omitting FLAGS is that some of these could be real (e.g., I saw ROS1, a proto-oncogene, in this list, and while I am not sure whether it is flagged because a certain site is a FLAG, other mutated sites, expression changes, or fusions could be important). I would suggest using this list, and if any of its genes are found in the oncogene/TSG list from @kgaonkar6's fusion analysis, then we don't remove them. In the past, I had instead selected specific genes of interest (also not ideal, because it introduces bias and would miss novelty, but it was needed for landscape-type work to validate the tumor genomics).
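
The suggestion above (drop FLAGS genes unless they also appear in an oncogene/TSG list) could be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `filter_flags` and all three gene sets are hypothetical placeholders, and a real analysis would load the full FLAGS table and the fusion analysis's oncogene/TSG list instead.

```python
def filter_flags(mutated_genes, flags_genes, protected_genes):
    """Return mutated_genes without FLAGS genes, except those that are protected.

    A gene is kept if it is not in flags_genes, or if it is in
    protected_genes (e.g., an oncogene/TSG list). Input order is preserved.
    """
    flags = set(flags_genes)
    protected = set(protected_genes)
    return [g for g in mutated_genes if g not in flags or g in protected]


# Hypothetical, abbreviated inputs for illustration only.
example_flags = {"TTN", "ROS1"}        # stand-in for the FLAGS list
example_oncogene_tsg = {"ROS1"}        # stand-in for the oncogene/TSG list
example_mutated = ["TTN", "TP53", "ROS1", "BRAF"]

filtered = filter_flags(example_mutated, example_flags, example_oncogene_tsg)
print(filtered)
```

With these placeholder inputs, TTN is removed as a FLAGS gene, while ROS1 is retained because it also appears in the protected list.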

jaclyn-taroni commented 4 years ago

Context for this comment is https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/595#issuecomment-594773187.

The FLAGS list can be obtained via this link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5706417/bin/12920_2017_309_MOESM3_ESM.txt

Here is the (abbreviated) list used in maftools: https://github.com/PoisonAlien/maftools/blob/acef26fa99e2619c4051d7792025e7a16c13b43b/R/summarizeMaf.R#L38