clulab / bioresources

Data resources from the biomedical domain
Apache License 2.0

Remove amino acid acronyms #36

Closed. bgyori closed this pull request 3 years ago.

bgyori commented 3 years ago

This PR removes from the PubChem resource file all synonyms that represent a pair of amino acids as a two-letter acronym (259 in total). These two-letter combinations appear very commonly in text but virtually never refer to a pair of amino acids, so they produced many incorrect groundings.
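For illustration, the kind of filter described above could be sketched as follows. This is not the script used in the PR. It assumes a hypothetical tab-separated layout of `synonym<TAB>identifier`, and the helper names `is_amino_acid_pair` and `filter_pair_synonyms` are made up. It flags a synonym when it consists of exactly two uppercase standard one-letter amino acid codes. A real cleanup would also confirm that the identifier actually points to a dipeptide and not to some other entry.

```python
import csv
import io

# The 20 standard one-letter amino acid codes.
AMINO_ACID_CODES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_amino_acid_pair(synonym: str) -> bool:
    """Return True if `synonym` is exactly two standard one-letter codes, e.g. "AG"."""
    return len(synonym) == 2 and all(ch in AMINO_ACID_CODES for ch in synonym)


def filter_pair_synonyms(tsv_text: str) -> tuple[list[list[str]], int]:
    """Drop rows whose first column looks like a two-letter amino acid pair.

    Hypothetical input format (assumption): one `synonym<TAB>identifier` row per line.
    Returns the kept rows and the number of rows removed. Blank lines are skipped.
    """
    kept: list[list[str]] = []
    removed = 0
    # QUOTE_NONE keeps literal quote characters inside synonyms untouched.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        if is_amino_acid_pair(row[0]):
            removed += 1
        else:
            kept.append(row)
    return kept, removed


if __name__ == "__main__":
    # Placeholder identifiers, not real PubChem IDs.
    sample = "WT\tID1\nglycine\tID2\nATP\tID3\n"
    rows, n_removed = filter_pair_synonyms(sample)
    print(n_removed, rows)
```

In the usage example, "WT" (a common abbreviation for "wild type") is dropped because W and T are both amino acid codes. This is exactly the kind of false grounding the PR targets. The match is case-sensitive, so lowercase words are never flagged.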

MihaiSurdeanu commented 3 years ago

Thanks @bgyori! Can you please update the CHANGES file so we have a log of the modifications? Also, let me know when I should release.

bgyori commented 3 years ago

Thanks, I'm planning to make a couple more small changes and will update the CHANGES file along with those. A release isn't strictly necessary or urgent for now, since we can build Reach against unreleased versions of bioresources.