hickst closed this issue 8 years ago
This must come from the Reach Site rules. @myedibleenso? The CRF does not recognize sites.
I'll need more information. We'll need tests, so the more examples you can give, the better. I think these cases can be hard to tell apart.
Also, can you tell me which rules are to blame, @hickst?
Rule => site_long
Type => CorefTextBoundMention
------------------------------
Site|List(Site) => tyrosine
grounding: KBResolution(tyrosine, uaz, UAZ00001, )
------------------------------
Often when an amino acid is mentioned, it is used to refer to an underspecified site (e.g., "tyrosine phosphorylation"). Even though this isn't an exact location, it still gives us some information about where a reaction is taking place.
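To make the distinction concrete for test cases, here is a minimal sketch of the heuristic described above: an amino acid name directly followed by a modification cue is treated as an underspecified Site, and otherwise as a Simple_chemical. This is not the actual `site_long` rule or any Reach code; the word lists and the function name `label_amino_acids` are hypothetical and only for illustration.

```python
import re

# Hypothetical, abbreviated word lists for illustration only;
# these are NOT the lexicons used by Reach or bioresources.
AMINO_ACIDS = {"tyrosine", "serine", "threonine", "lysine"}
MODIFICATION_CUES = {"phosphorylation", "acetylation", "ubiquitination",
                     "residue", "residues"}

def label_amino_acids(sentence):
    """Return (amino_acid, label) pairs for each amino acid mention.

    An amino acid immediately followed by a modification cue is labeled
    "Site"; any other amino acid mention is labeled "Simple_chemical".
    """
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    labels = []
    for i, tok in enumerate(tokens):
        if tok in AMINO_ACIDS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            label = "Site" if nxt in MODIFICATION_CUES else "Simple_chemical"
            labels.append((tok, label))
    return labels

print(label_amino_acids("Tyrosine phosphorylation of the receptor increased."))
print(label_amino_acids("Cells were starved of serine."))
```

Sentence pairs like these two could serve as a starting point for the tests requested above, with one expected Site and one expected Simple_chemical per amino acid.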
Let's discuss tomorrow. I think the only place where we use amino acids is to recognize sites, so we might already be doing the right thing.
(Sent by email in reply to Gustave Hahn-Powell's comment of Tue, Mar 1, 2016, 2:12 PM: https://github.com/clulab/bioresources/issues/3#issuecomment-190907638)
For our purposes, we are interested in amino acids only as sites, so this is not a problem.
Amino acids are being labeled as Sites instead of simple chemicals. This causes Reach grounding to fail to identify them. All amino acids are being correctly generated into the NER Simple_chemical.tsv.gz file.