NCATS-Tangerine / ncats-ingest

Management of ingestion of sources for NCATS-translator

Pull genes with similar phenotypes to FA #25

Open mellybelly opened 7 years ago

mellybelly commented 7 years ago

We need to pull a set of data to inform which genes we should search for variants.

There is a Google Doc here for reference: https://docs.google.com/spreadsheets/d/1yX-5sfrC3vrahf4_k7-5rl4Oqzm853ollIMmUo1PTc0/edit#gid=1185309083

This relates to Set-10.

We need a gene set based on phenotypic similarity to our primary genes (some may have alternate primary symbols) and their orthologs: FANCA, FANCB, FANCC, FANCE, FANCF, FANCG, FANCL, FANCM, FANCD2, FANCI, UBE2T, FANCD1 (BRCA2), FANCJ, FANCN, FANCO, FANCP, FANCQ, FANCR, FANCS, FANCV, FANCU, FAAP100, FAAP24, FAAP20, FAAP16 (MHF1), FAAP10 (MHF2)

The two steps are:

1) Perform phenotype similarity matching to FA to identify human genes, based on semantic similarity to human and model organism phenotype data, returning candidate genes within some stringent cutoff.
2) Take the set of genes above and their associated phenotype profiles, and compare them against human and model organism data to identify additional new genes.
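To make the two steps above concrete, here is a rough sketch of what the matching could look like. It is not the pipeline we would actually run: it assumes each gene or disease is described by a set of phenotype terms, and it swaps in a simple Jaccard score over ancestor-expanded term sets in place of whatever semantic similarity measure we end up using. The ontology, term IDs, gene names, and the 0.5 cutoff are all made-up placeholders, as are the function names `candidates` and `expand_from_seeds`.

```python
from typing import Dict, List, Set, Tuple

# Toy is-a hierarchy (child -> parents). Placeholder IDs, not real ontology terms.
TOY_PARENTS: Dict[str, Set[str]] = {
    "T:root": set(),
    "T:blood": {"T:root"},
    "T:skeletal": {"T:root"},
    "T:anemia": {"T:blood"},
    "T:pancytopenia": {"T:blood"},
    "T:thumb": {"T:skeletal"},
    "T:radius": {"T:skeletal"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, set()))
    return seen


def closure(profile: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Expand a phenotype profile with the ancestors of every term in it."""
    expanded: Set[str] = set()
    for term in profile:
        expanded |= ancestors(term, parents)
    return expanded


def similarity(a: Set[str], b: Set[str], parents: Dict[str, Set[str]]) -> float:
    """Jaccard overlap of the two ancestor-expanded profiles (assumed measure)."""
    ca, cb = closure(a, parents), closure(b, parents)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def candidates(
    query: Set[str],
    gene_profiles: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
    cutoff: float,
) -> List[Tuple[str, float]]:
    """Step 1: genes whose profile scores at or above the cutoff, best first."""
    scored = (
        (gene, similarity(query, profile, parents))
        for gene, profile in gene_profiles.items()
    )
    kept = [(gene, score) for gene, score in scored if score >= cutoff]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


def expand_from_seeds(
    seed_profiles: Dict[str, Set[str]],
    gene_profiles: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
    cutoff: float,
) -> Dict[str, float]:
    """Step 2: query with each seed gene's profile; keep the best score per new gene."""
    best: Dict[str, float] = {}
    for seed_profile in seed_profiles.values():
        for gene, score in candidates(seed_profile, gene_profiles, parents, cutoff):
            if gene not in seed_profiles:
                best[gene] = max(score, best.get(gene, 0.0))
    return best


# Hypothetical inputs: a disease profile, one seed gene, and three candidate genes.
disease_profile = {"T:pancytopenia", "T:thumb"}
gene_profiles = {
    "GENE_X": {"T:anemia", "T:thumb"},
    "GENE_Y": {"T:radius"},
    "GENE_Z": {"T:anemia"},
}
seed_profiles = {"SEED_1": {"T:radius", "T:thumb"}}

step1 = candidates(disease_profile, gene_profiles, TOY_PARENTS, cutoff=0.5)
step2 = expand_from_seeds(seed_profiles, gene_profiles, TOY_PARENTS, cutoff=0.5)
```

In practice the profiles would come from real annotations for human genes and model organism orthologs, and the cutoff would need tuning against the score distribution rather than being fixed up front.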

Assigning to @kshefchek but maybe can reassign to @pnrobinson or @drseb as a fun exercise ;-)

@dnahotline will advise as needed