clulab / bioresources

Data resources from the biomedical domain
Apache License 2.0

Extend Gene_or_gene_product groundings with HGNC #25

Closed bgyori closed 4 years ago

bgyori commented 4 years ago

Currently, bioresources draws only on UniProt for the names and synonyms of Gene_or_gene_product entities. However, at least for human genes/proteins, HGNC provides many essential synonyms that UniProt lacks. Ion channels are one class of proteins where this gap is critical: SCN10A, for example, is most commonly referred to as Nav1.8 in the literature (other examples include Nav1.5, Nav1.7, etc.), and NER and grounding fail on all of these names because the synonyms are missing. Since HGNC has these synonyms, it would make sense to add it as a second source. I could implement this so that synonyms redundant with those from UniProt are resolved.
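
For illustration, the redundancy resolution could look roughly like the sketch below. It assumes both sources have already been parsed into dicts mapping a gene symbol to a list of synonyms. The function name `merge_synonyms`, the input dicts, and the case-insensitive matching rule are all hypothetical and not part of bioresources.

```python
def merge_synonyms(uniprot, hgnc):
    """Merge per-gene synonym lists from two sources.

    Assumption: each argument maps a gene symbol to a list of synonym
    strings. Duplicates are detected case-insensitively; the first
    spelling seen (UniProt before HGNC) is the one that is kept.
    """
    merged = {}
    for source in (uniprot, hgnc):
        for gene, names in source.items():
            # Maps lowercased synonym -> original spelling, in insertion order.
            seen = merged.setdefault(gene, {})
            for name in names:
                cleaned = name.strip()
                key = cleaned.lower()
                if key and key not in seen:
                    seen[key] = cleaned
    return {gene: list(names.values()) for gene, names in merged.items()}


# Illustrative input only, not real extracts from either database.
uniprot_synonyms = {
    "SCN10A": ["SCN10A", "Sodium channel protein type 10 subunit alpha"],
}
hgnc_synonyms = {
    "SCN10A": ["scn10a", "Nav1.8"],
}

combined = merge_synonyms(uniprot_synonyms, hgnc_synonyms)
print(combined["SCN10A"])
```

With this input, the HGNC-only synonym Nav1.8 is appended to the UniProt names, while the redundant `scn10a` entry is dropped.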

MihaiSurdeanu commented 4 years ago

This would be great. Thanks @bgyori !