clulab / bioresources

Data resources from the biomedical domain
Apache License 2.0

Support generating extra organism labels for proteins #52

Closed bgyori closed 3 years ago

bgyori commented 3 years ago

This PR doesn't change any of the resource files, but it adds user-configurable support for generating extra organism labels for protein entries. For instance, passing 10239 (the taxonomy ID for "Viruses") to the update_uniprot_proteins.py script adds Viruses as an extra organism label for all viral proteins. Viruses can then be added to ner_kb.config to include all viral proteins in NER.
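
To illustrate the idea, here is a minimal sketch of how an extra label could be derived from a protein's organism taxonomy ID. It is not the code in update_uniprot_proteins.py: the `PARENTS` table, `lineage`, and `extra_organism_labels` are hypothetical names, and the lineage data is a simplified toy subset rather than the real NCBI Taxonomy tree.

```python
from typing import Dict, List

# Toy parent table (taxonomy ID -> parent taxonomy ID). The links are
# simplified for illustration and skip intermediate ranks.
PARENTS: Dict[int, int] = {
    2697049: 694009,  # SARS-CoV-2 -> SARS-related coronavirus
    694009: 10239,    # simplified: placed directly under Viruses
    10239: 1,         # Viruses -> root
    9606: 9605,       # Homo sapiens -> Homo
    9605: 1,          # simplified: placed directly under root
}


def lineage(tax_id: int, parents: Dict[int, int]) -> List[int]:
    """Return tax_id followed by its ancestors, walking up the parent table."""
    path: List[int] = []
    current = tax_id
    # Stop at an ID with no known parent, and guard against cycles.
    while current is not None and current not in path:
        path.append(current)
        current = parents.get(current)
    return path


def extra_organism_labels(
    tax_id: int, extra: Dict[int, str], parents: Dict[int, int]
) -> List[str]:
    """Return the extra labels whose taxonomy ID appears in the lineage."""
    ancestors = set(lineage(tax_id, parents))
    return [label for tid, label in extra.items() if tid in ancestors]


# Hypothetical user configuration: label everything under taxonomy ID 10239.
EXTRA = {10239: "Viruses"}

viral = extra_organism_labels(2697049, EXTRA, PARENTS)
human = extra_organism_labels(9606, EXTRA, PARENTS)
```

In this sketch a protein from a viral organism picks up the extra Viruses label, while a human protein gets none, which matches the behavior described above.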

I am not changing the actual resource files because including organism-specific synonyms is use-case specific, so I don't think the "official" release of bioresources should make any such additions. These features are still useful for custom local builds.