This PR adds a new script to download HGNC human gene names and synonyms. The script is implemented such that (1) only genes that have a single, unique corresponding UniProt ID are considered and (2) only synonyms that are not already provided by UniProt are considered. With these constraints, the new hgnc.tsv file uses UniProt IDs and adopts the same format as uniprot-proteins.tsv. In other words, this is a fairly clean patch to extend Gene_or_gene_product synonyms beyond UniProt.
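A minimal sketch of the two filtering rules, assuming each HGNC record has already been parsed into a hypothetical `(symbol, synonyms, uniprot_ids)` tuple, and that the existing UniProt names are available as a hypothetical dict from UniProt ID to a set of names. The function names `hgnc_synonym_rows` and `to_tsv` are illustrative and are not taken from the actual script. The comparison against UniProt names is case-insensitive, which is an assumption. The two-column `text<TAB>UniProt ID` output layout is also an assumption about what matching uniprot-proteins.tsv means.

```python
import csv
import io
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical parsed HGNC record: (approved_symbol, synonyms, uniprot_ids)
HgncRecord = Tuple[str, List[str], List[str]]


def hgnc_synonym_rows(
    records: Iterable[HgncRecord],
    uniprot_synonyms: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (name, uniprot_id) rows that pass both filtering rules."""
    rows: List[Tuple[str, str]] = []
    seen: Set[Tuple[str, str]] = set()
    for symbol, synonyms, uniprot_ids in records:
        # Rule 1: keep only genes mapped to exactly one distinct UniProt ID.
        unique_ids = set(uniprot_ids)
        if len(unique_ids) != 1:
            continue
        (uniprot_id,) = unique_ids
        # Names UniProt already provides for this ID (assumed case-insensitive match).
        known = {name.lower() for name in uniprot_synonyms.get(uniprot_id, set())}
        for name in [symbol, *synonyms]:
            key = (name.lower(), uniprot_id)
            # Rule 2: skip names UniProt already has, and repeated HGNC names.
            if key[0] in known or key in seen:
                continue
            seen.add(key)
            rows.append((name, uniprot_id))
    return rows


def to_tsv(rows: Iterable[Tuple[str, str]]) -> str:
    """Serialize rows as tab-separated text (assumed hgnc.tsv layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

Because every row is keyed by a single UniProt ID, the HGNC entries can sit next to the UniProt ones without introducing ambiguous groundings. The price is that genes with zero or several UniProt mappings are dropped entirely.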
I will next test these changes to see if any Reach tests are broken.
Fixes #25.