clulab / bioresources

Data resources from the biomedical domain
Apache License 2.0

Integrate HGNC names and synonyms in addition to UniProt #26

Closed bgyori closed 4 years ago

bgyori commented 4 years ago

This PR adds a new script that downloads HGNC human gene names and synonyms. The script applies two constraints: (1) only genes that map to a single, unique UniProt ID are considered, and (2) only synonyms that UniProt does not already provide are considered. With these constraints, the new hgnc.tsv file uses UniProt IDs and adopts the same format as uniprot-proteins.tsv. In other words, this is a fairly clean patch that extends Gene_or_gene_product synonyms beyond UniProt.
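The two constraints above can be illustrated with a minimal sketch. This is not the script in this PR: the function name `select_hgnc_synonyms`, the in-memory input shapes, and the two-field `(synonym, UniProt ID)` output are assumptions made for illustration, and the download step and the exact column layout of uniprot-proteins.tsv are left out. The sketch also reads "single, unique UniProt ID" as "exactly one UniProt ID listed for the gene".

```python
def select_hgnc_synonyms(hgnc_entries, uniprot_synonyms):
    """Pick HGNC names/synonyms that are new relative to UniProt.

    hgnc_entries: iterable of (names, uniprot_ids) pairs, where `names`
        lists the HGNC symbol and its synonyms for one gene and
        `uniprot_ids` lists the UniProt IDs mapped to that gene.
        (Hypothetical input shape, assumed for this sketch.)
    uniprot_synonyms: dict mapping a UniProt ID to the set of names
        UniProt already provides for it. (Also an assumed shape.)

    Returns a sorted list of (synonym, uniprot_id) pairs.
    """
    selected = set()
    for names, uniprot_ids in hgnc_entries:
        # Constraint 1: keep only genes with exactly one UniProt ID.
        if len(uniprot_ids) != 1:
            continue
        uniprot_id = uniprot_ids[0]
        already_known = uniprot_synonyms.get(uniprot_id, set())
        for name in names:
            # Constraint 2: skip names UniProt already provides.
            if name and name not in already_known:
                selected.add((name, uniprot_id))
    return sorted(selected)
```

Genes with zero or several UniProt IDs are dropped entirely, so every remaining row can be keyed by a UniProt ID in the same way as the existing UniProt entries.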

Next, I will test these changes to check whether they break any Reach tests.

Fixes #25.