This project compiles a list of recurrently mutated and clinically relevant genes in which variants may be missed by automated variant callers because of low sequence coverage and/or low variant allele frequency (VAF). Variants in these genes should be manually reviewed when supporting sequence alignment data exist.
Our initial target was a short list of pediatric cancer significantly mutated genes (SMGs) from the following sources:
In addition, medically actionable genes from the following sources should be considered:
PeCAN (notes): summarized SMGs and hotspots are not available or reproducible from the provided downloads. However, several published studies of the major datasets used by PeCAN are listed on its site:
As a first pass, then, we can collect gene lists from these datasets at the time of publication in lieu of recomputing and calling our own.
After initial evaluation, this leaves us with three sources of genes. Refining the SMGs to those that appear in both landscape studies and then merging CIViC genes with this list gives us an initial list of 40 genes. Of these, 20 are not reported as adult SMGs in the Kandoth et al. evaluation of TCGA adult cancers. (notes)
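The refinement described above amounts to a few set operations, sketched below in Python. This is only an illustration of the logic, not the analysis code used in this project: the function name `build_review_gene_list` and the `GENE_*` symbols are hypothetical placeholders, and the real inputs would be the gene lists from the two landscape studies, CIViC, and Kandoth et al.

```python
def build_review_gene_list(landscape_a, landscape_b, actionable, adult_smgs):
    """Combine gene lists as described in the text (hypothetical helper).

    Returns a tuple of sorted lists:
      1. SMGs found in both landscape studies, merged with actionable genes
      2. the subset of (1) not reported as adult SMGs
    """
    # Keep only SMGs that appear in both landscape studies.
    shared_smgs = set(landscape_a) & set(landscape_b)
    # Merge in the medically actionable (e.g. CIViC) genes.
    merged = shared_smgs | set(actionable)
    # Genes in the merged list that are absent from the adult SMG list.
    not_adult = merged - set(adult_smgs)
    return sorted(merged), sorted(not_adult)


# Placeholder symbols only; these are not real study results.
landscape_a = {"GENE_A", "GENE_B", "GENE_C"}
landscape_b = {"GENE_B", "GENE_C", "GENE_D"}
actionable = {"GENE_C", "GENE_E"}
adult_smgs = {"GENE_B", "GENE_E"}

merged, pediatric_only = build_review_gene_list(
    landscape_a, landscape_b, actionable, adult_smgs
)
print(merged)          # ['GENE_B', 'GENE_C', 'GENE_E']
print(pediatric_only)  # ['GENE_C']
```

Sets are used here so that duplicate symbols across sources collapse automatically; in practice gene symbols may need normalizing to a common nomenclature before comparison.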
To aid in reproducibility, all data generated are stored in the data/ directory. Analyses (notes) are individually linked in this document, but also findable in the analyses/ directory.