clulab / bioresources

Data resources from the biomedical domain
Apache License 2.0

Proonto #41

Closed: JakeWolfe closed this 3 years ago

JakeWolfe commented 3 years ago

Integration of protein-fragment entities from the Protein Ontology.

MihaiSurdeanu commented 3 years ago

@bgyori: we added protein fragments from the PO here. Does this PR look OK to you? There is another parallel PR for this in Reach, and I will mention you there as well. Thanks!

bgyori commented 3 years ago

Thanks @MihaiSurdeanu and @JakeWolfe. I'll try to get to this by tomorrow!

bgyori commented 3 years ago

I made a few changes: I added another category of synonyms we had previously missed, and I removed some synonyms that would not actually be used in text. I still have some concerns. Unlike the other protein resources, this one does not use species information, so every entry defaults to human. As a result, SARS-CoV-2 protein fragments and some other viral proteins are now labeled as human. This could cause issues downstream of Reach, though it isn't very serious, since the grounding itself matters more than what Reach reports about the organism. I am also generally a bit disappointed with the Protein Ontology (this is my first time working with it). It seems to be missing some human protein entries I would have expected (e.g., bradykinin), and it has few viral proteins other than those of SARS-CoV-2. I think we can use these synonyms, but we may also want to include UniProt fragments, which, as far as I can see, have better coverage and are more "synonym-like" (i.e., they are found in actual text).
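To make the species concern concrete, here is a minimal sketch of the difference between defaulting every entry to human and carrying a per-entry taxon through to the output rows. Everything here is hypothetical: the `FragmentEntry` type, the `PR:EXAMPLE_*` placeholder IDs, the three-column synonym/ID/species row layout, and the taxon-to-label mapping are illustrative assumptions, not the actual bioresources file format or the code in this PR. Only the NCBI Taxonomy IDs for human (9606) and SARS-CoV-2 (2697049) are real.

```python
# Hypothetical sketch: attaching species to protein-fragment synonyms
# instead of defaulting every entry to human. Not the bioresources format.
from dataclasses import dataclass
from typing import List, Optional

HUMAN_TAXON = "NCBITaxon:9606"
SARS_COV_2_TAXON = "NCBITaxon:2697049"

# Assumed mapping from taxon IDs to the species labels written out.
TAXON_LABELS = {
    HUMAN_TAXON: "Human",
    SARS_COV_2_TAXON: "SARS-CoV-2",
}


@dataclass
class FragmentEntry:
    synonym: str
    pro_id: str                   # placeholder ID, not a real PRO accession
    taxon: Optional[str] = None   # species taxon, if the source provides one


def species_label(entry: FragmentEntry, default_to_human: bool) -> str:
    """Return the species label for one entry."""
    if default_to_human:
        # Behaviour described in the thread: species info is ignored.
        return "Human"
    if entry.taxon is None:
        return "Unknown"
    # Fall back to the raw taxon ID when no label is known.
    return TAXON_LABELS.get(entry.taxon, entry.taxon)


def to_kb_rows(entries: List[FragmentEntry],
               default_to_human: bool = False) -> List[str]:
    """Format entries as tab-separated synonym / ID / species lines."""
    return [
        f"{e.synonym}\t{e.pro_id}\t{species_label(e, default_to_human)}"
        for e in entries
    ]
```

With one human and one viral placeholder entry, `to_kb_rows(entries, default_to_human=True)` labels both rows `Human`, which reproduces the mislabeling described above. `to_kb_rows(entries)` instead labels the viral row `SARS-CoV-2`.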

MihaiSurdeanu commented 3 years ago

Thanks @bgyori !