Open arq5x opened 11 years ago
Awesome, I had this on my list of things to do. Dataset S2 has the gene names and Residual Variation Intolerance Score (RVIS) percentiles, which could be extracted into a BED file after retrieving coordinates. Generalizing the gene-to-coordinate mapping would be nice, but it is a hard problem.
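A rough sketch of that extraction step, assuming the gene coordinates have already been retrieved into a dictionary. The function name `rvis_to_bed`, the input shapes, and the placeholder gene names are hypothetical and are not part of GEMINI or Dataset S2:

```python
def rvis_to_bed(rvis_rows, gene_coords):
    """Join (gene, percentile) rows to coordinates and emit BED-style lines.

    rvis_rows   -- iterable of (gene_name, rvis_percentile) pairs (assumed shape)
    gene_coords -- dict: gene_name -> (chrom, start, end), 0-based half-open
    Returns (bed_lines, unmapped_genes).
    """
    records = []
    unmapped = []
    for gene, percentile in rvis_rows:
        coords = gene_coords.get(gene)
        if coords is None:
            # Gene name did not resolve to coordinates; report it, don't guess.
            unmapped.append(gene)
            continue
        chrom, start, end = coords
        records.append((chrom, start, end, gene, percentile))

    # Sort by chromosome name (plain string order), then by start position.
    records.sort(key=lambda r: (r[0], r[1]))
    bed_lines = ["\t".join(str(field) for field in r) for r in records]
    return bed_lines, unmapped


if __name__ == "__main__":
    # Placeholder data only; these are not real genes or scores.
    rows = [("GENE_A", 12.5), ("GENE_B", 80.0), ("GENE_C", 5.0)]
    coords = {"GENE_A": ("chr1", 100, 200), "GENE_B": ("chr1", 50, 80)}
    lines, missing = rvis_to_bed(rows, coords)
    print("\n".join(lines))
    print("unmapped:", missing)
```

Returning the unmapped genes separately makes the naming problem visible instead of silently dropping rows.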
Yep, S2 is what we are using currently for another project. Biggest issue is the darn gene naming problem. GENCODE gene names may not always map to their CCDS names (which I think use HGNC). We will need to "explode" Dataset S2 to include gene synonyms.
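One way the "explode" step might look, as a minimal sketch. It assumes a synonym table is already available as a dictionary; `explode_synonyms`, its inputs, and the placeholder names are hypothetical. Primary names keep their own score when an alias collides with them:

```python
def explode_synonyms(rvis_by_gene, synonyms):
    """Return a copy of rvis_by_gene with one extra entry per known synonym.

    rvis_by_gene -- dict: primary gene name -> RVIS percentile
    synonyms     -- dict: primary gene name -> iterable of alias names (assumed)
    """
    # Start from the primary names so they always win over aliases.
    exploded = dict(rvis_by_gene)
    for gene, score in rvis_by_gene.items():
        for alias in synonyms.get(gene, ()):
            # setdefault leaves an existing entry untouched, so an alias that
            # clashes with another gene's primary name is not overwritten.
            exploded.setdefault(alias, score)
    return exploded


if __name__ == "__main__":
    # Placeholder data only; these are not real genes or scores.
    scores = {"GENE_A": 12.5, "GENE_B": 80.0}
    aliases = {"GENE_A": ["ALIAS_A1", "GENE_B"]}
    print(explode_synonyms(scores, aliases))
```

Ambiguous aliases (one alias shared by two genes) resolve to whichever gene is seen first here; a real pipeline would probably want to flag those instead.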
I was told that you can handle the mapping issue with the BridgeDb web service (http://bridgedb.org/wiki/BridgeWebservice) by standardizing everything to one gene name. Hopefully that makes the gene-to-coordinate issue easier to resolve.
Paul
On Tue, Sep 3, 2013 at 11:53 AM, Aaron Quinlan wrote:
> Yep, S2 is what we are using currently for another project. Biggest issue is the darn gene naming problem. GENCODE gene names may not always map to their CCDS names (which I think use HGNC). We will need to "explode" Dataset S2 to include gene synonyms.
Hi
We think RVIS is pretty good too. It would be a great addition to GEMINI, along with the PROVEAN score, if possible! See the following link: http://provean.jcvi.org/genome_submit.php
thanks
Tony
For example, the RVIS measure from Petrovski et al [1] assesses a gene's intolerance to genetic variation based on the ESP project. I also recall a poster from Mark Daly's lab at Biology of Genomes that introduces a similar statistic. These may be useful in interpreting / prioritizing genes based on observed variants.
[1] http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003709