malachig opened 3 years ago
I also think this data would be super useful, so I looked into it a bit. Just recording what I found...
Most of our NCBI data comes from https://ftp.ncbi.nlm.nih.gov/gene/DATA/. It looks like those phenotypes come from `mim2gene_medgen`. If I search for @malachig's example gene ID `673`, I get the following records:
```
$ awk '$2==673' mim2gene_medgen
115150 673 phenotype GeneMap CN029449 -
163950 673 phenotype GeneReviews C4551602 -
164757 673 gene - - -
211980 673 phenotype GeneMap C0684249 -
613706 673 phenotype GeneMap C3150970 -
613707 673 phenotype GeneMap C3150971 -
```
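To pull these records programmatically rather than with `awk`, a minimal Python sketch could group the phenotype rows by GeneID. It assumes the file is tab-separated with a `#`-prefixed header and the column order MIM number, GeneID, type, Source, MedGenCUI, Comment; the function name `phenotypes_by_gene` and the `SAMPLE` text (the gene 673 records above, re-typed with tabs) are illustrative only.

```python
import csv
from collections import defaultdict

# ASSUMPTION: mim2gene_medgen is tab-separated with a '#'-prefixed header and
# columns: MIM number, GeneID, type, Source, MedGenCUI, Comment.
def phenotypes_by_gene(lines):
    """Return {GeneID: [(MIM number, MedGen CUI), ...]} for 'phenotype' rows."""
    by_gene = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 5 or row[0].startswith("#"):
            continue  # skip blank/short lines and the header line
        mim, gene_id, rec_type, _source, medgen_cui = row[:5]
        if rec_type != "phenotype" or medgen_cui == "-":
            continue  # 'gene' rows and rows without a MedGen CUI carry no condition
        by_gene[gene_id].append((mim, medgen_cui))
    return dict(by_gene)


# Hypothetical sample: the gene 673 records shown above, re-typed with tabs.
SAMPLE = "\n".join([
    "#MIM number\tGeneID\ttype\tSource\tMedGenCUI\tComment",
    "115150\t673\tphenotype\tGeneMap\tCN029449\t-",
    "163950\t673\tphenotype\tGeneReviews\tC4551602\t-",
    "164757\t673\tgene\t-\t-\t-",
    "211980\t673\tphenotype\tGeneMap\tC0684249\t-",
    "613706\t673\tphenotype\tGeneMap\tC3150970\t-",
    "613707\t673\tphenotype\tGeneMap\tC3150971\t-",
])

if __name__ == "__main__":
    for mim, cui in phenotypes_by_gene(SAMPLE.splitlines())["673"]:
        print(mim, cui)
```

Against the downloaded file, `with open("mim2gene_medgen") as fh: phenotypes_by_gene(fh)` should yield the same five (MIM, CUI) pairs under `"673"`, plus entries for every other gene in the file.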
It looks like `mim2gene_medgen` has five of the seven phenotypes listed on https://www.ncbi.nlm.nih.gov/gene/673.
The two MedGen IDs that are missing for gene `673` do not appear anywhere in the `mim2gene_medgen` file:
```
$ grep -c CN239586 mim2gene_medgen.txt
0
$ grep -c CN239577 mim2gene_medgen.txt
0
```
Hmm, not sure what the source is for those two missing ones...
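`grep -c` matches a CUI anywhere on a line. A slightly stricter check, sketched below in Python under the same assumed tab-separated column layout, looks only at the MedGenCUI (fifth) column and reports which of a list of expected CUIs never appear there; `cuis_in_file` and `missing_cuis` are hypothetical helper names.

```python
import csv

def cuis_in_file(lines):
    """Collect every MedGen CUI from the 5th column of mim2gene_medgen-style lines."""
    cuis = set()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip the '#' header, short lines, and '-' placeholders.
        if len(row) >= 5 and not row[0].startswith("#") and row[4] != "-":
            cuis.add(row[4])
    return cuis

def missing_cuis(expected, lines):
    """Return (sorted) the expected CUIs that never appear in the MedGenCUI column."""
    return sorted(set(expected) - cuis_in_file(lines))
```

For example, `missing_cuis(["CN239586", "CN239577"], open("mim2gene_medgen.txt"))` would be expected to return both IDs, consistent with the zero counts from `grep`.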
I also checked the MedGen downloads and didn't find such a file there.
Unless someone can provide a link to the full file, we will probably just go with mim2gene_medgen.txt.
FYI, we already have a BioThings API that links genes to diseases/phenotypes: the EBIGene2Phenotype API. For example, you can query it by HGNC ID to get associated conditions: https://biothings.ncats.io/ebigene2phenotype/gene/1097
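A minimal Python sketch of that lookup, built only from the example URL above and the standard library. It assumes the endpoint takes a bare numeric HGNC ID (as in `.../gene/1097`) and returns JSON; because the response fields are not described in this thread, `fetch_gene` (a hypothetical helper) returns the decoded JSON unchanged rather than extracting specific keys.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the example link in this thread.
BASE_URL = "https://biothings.ncats.io/ebigene2phenotype/gene/"

def gene_url(hgnc_id):
    """Build the EBIGene2Phenotype lookup URL for a numeric HGNC ID, e.g. 1097."""
    return BASE_URL + quote(str(hgnc_id))

def fetch_gene(hgnc_id, timeout=10.0):
    """Fetch and decode the JSON record for one HGNC ID (makes a network call).

    ASSUMPTION: the endpoint responds with JSON; the schema is not documented
    here, so the parsed object is returned as-is for inspection.
    """
    with urlopen(gene_url(hgnc_id), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_gene(1097)` and printing the result is a quick way to see which condition fields the API exposes before wiring it into anything else.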
It would be great to be able to pull associated condition information from Entrez via mygene.info. For example, for BRAF (https://www.ncbi.nlm.nih.gov/gene/673):
Under Phenotypes, they list conditions from the Genetic Testing Registry, such as:
- Cardiofaciocutaneous syndrome 1
- Dabrafenib response
- ...
- Vemurafenib response
We would really like to pull such information into CIViC along with other critical gene info we already obtain from myvariant.info (e.g. https://civicdb.org/events/genes/5/summary/variants/2826/summary)