clulab / bioresources

Data resources from the biomedical domain
Apache License 2.0

Correct labeling of amino acids has been lost #3

hickst closed this issue 8 years ago

hickst commented 8 years ago

Amino acids are being labeled as Sites instead of as simple chemicals, which causes Reach grounding to fail to identify them. All amino acids are being correctly written to the NER Simple_chemical.tsv.gz file.
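To check the claim that the amino acids are present in the knowledge base, a lookup table can be built from the gzipped TSV. The sketch below assumes each line is `name<TAB>id`; that layout is an assumption, and the real Simple_chemical.tsv.gz may use different columns. The function name load_kb is hypothetical.

```python
import gzip
import io


def load_kb(fileobj):
    """Build a lowercase-name -> ID map from a gzipped, tab-separated KB.

    ASSUMPTION: each non-blank, non-comment line is `name<TAB>id`;
    the actual bioresources file layout may differ.
    """
    kb = {}
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.rstrip("\n")
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            parts = line.split("\t")
            # Ignore malformed lines that lack an ID column.
            if len(parts) < 2:
                continue
            kb[parts[0].lower()] = parts[1]
    return kb


# Usage with a tiny in-memory sample (the ID is taken from the log in this thread).
sample = io.BytesIO(gzip.compress(b"tyrosine\tUAZ00001\n"))
kb = load_kb(sample)
print(kb.get("Tyrosine".lower()))
```

If the lookup succeeds, a grounding failure would point to the label assigned to the mention upstream rather than to a missing KB entry.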

MihaiSurdeanu commented 8 years ago

This must come from the Reach Site rules. @myedibleenso? The CRF does not recognize sites.

myedibleenso commented 8 years ago

I'll need more information. We'll need tests, so the more examples you can give, the better. These can be hard to tell apart, I think.

Also, can you tell me which rules are to blame, @hickst?

hickst commented 8 years ago

mention text: tyrosine List(Site)

Rule => site_long
Type => CorefTextBoundMention
------------------------------
Site|List(Site) => tyrosine
grounding: KBResolution(tyrosine, uaz, UAZ00001, )
------------------------------
myedibleenso commented 8 years ago

Often when an amino acid is mentioned, it is used to reference an underspecified site (e.g., "tyrosine phosphorylation"). Even though this isn't an exact location, it still tells us something about where a reaction takes place.
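The distinction described here can be illustrated with a simple context heuristic: an amino acid immediately followed by a modification word is treated as an underspecified Site, and otherwise as a Simple_chemical. This is a minimal sketch only; it is not Reach's site_long rule, and the word lists and the function label_amino_acids are hypothetical.

```python
import re

# HYPOTHETICAL word lists for illustration; Reach's actual rules are not reproduced here.
AMINO_ACIDS = {"tyrosine", "serine", "threonine", "lysine"}
MODIFICATIONS = {"phosphorylation", "ubiquitination", "acetylation", "methylation"}

TOKEN = re.compile(r"[A-Za-z0-9]+")


def label_amino_acids(sentence):
    """Return (token, label) pairs for amino-acid tokens in `sentence`.

    An amino acid immediately followed by a modification word is labeled
    "Site" (e.g. "tyrosine phosphorylation"); otherwise "Simple_chemical".
    """
    tokens = [t.lower() for t in TOKEN.findall(sentence)]
    labels = []
    for i, tok in enumerate(tokens):
        if tok not in AMINO_ACIDS:
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        label = "Site" if nxt in MODIFICATIONS else "Simple_chemical"
        labels.append((tok, label))
    return labels


print(label_amino_acids("EGFR induces tyrosine phosphorylation of STAT3."))
print(label_amino_acids("Cells were grown in medium lacking tyrosine."))
```

A one-token lookahead keeps the example short; it would miss phrasings such as "phosphorylated tyrosine", which is one reason these cases are hard to tell apart without tests.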

MihaiSurdeanu commented 8 years ago

Let's discuss tomorrow. I think the only place where we use amino acids is to recognize sites, so we may already be doing the right thing.

(In reply to Gustave Hahn-Powell, Tue, Mar 1, 2016 at 2:12 PM.)

hickst commented 8 years ago

For our purposes, we are interested only in amino acids as sites, so this is not a problem.