Phosphorylation picking up residue information from from wrong protein

johnbachman commented 6 years ago

Hi guys, I've included a set of 44 sentences where REACH makes the same systematic error each time. REACH is extracting (essentially) that Insulin phosphorylates AKT on tyrosine, when in fact the sentences are referring to a tyrosine residue on another protein (either IR or IRS1) in each case. Here are three examples:

"Furthermore, IGF-1, but not insulin, induced tyrosine phosphorylation of IRS-1 and pSer 473 Akt in platelets lacking the IR."

"When SOCS3 was overexpressed in primary adipocytes or adipose tissue, insulin stimulated IRS-1 tyrosine phosphorylation, PI3 kinase activation of p85 subunit, and Akt phosphorylation decreased resulting in increased insulin resistance [XREF_BIBR, XREF_BIBR]."

"Insulin stimulated rapid tyrosine phosphorylation of IR, phosphorylation of the PI3K effector Akt at Ser473, and tyrosine phosphorylation of IRS-1 and IRS-2, and their association with the regulatory PI3K subunit p85."

I've attached a file containing INDRA Statements in JSON format (Github made me set the extension to .txt) containing all 44 sentences.

reach_ins_akt_phos_tyrosine_errors.txt

MihaiSurdeanu commented 6 years ago

Thanks John! We'll take a look at these soon.

MihaiSurdeanu commented 6 years ago

@enoriega, @maxaalexeeva: something more for you. To be discussed tomorrow.

MihaiSurdeanu commented 6 years ago

@enoriega: please check if this is caused by broken syntax or something else. If it's syntax, I propose to postpone this until we merge UD in master. Please update on this thread.

maxaalexeeva commented 6 years ago

@johnbachman: I am relatively new on the project, so I apologize if my question does not make much sense. Below is the output we get from the system*--tyrosine is extracted as 'site' and 'IRS-1' as 'theme'. What would your desired output look like? Did you mean that 'IRS-1' should be extracted as 'site' instead of 'tyrosine', that 'tyrosine' should come as modification to a protein, or maybe some other option? Thank you in advance.

*just the relevant part of one of your examples to make the relations in question easier to see

johnbachman commented 6 years ago

@maxaalexeeva Actually, now that you point it out, that particular sentence is fairly ambiguous (and poorly written). "Tyrosine phosphorylation of IRS1 and pSer 473 AKT" could, in principle, indicate "tyrosine phosphorylation of IRS1" and "tyrosine phosphorylation of pSer 473 AKT". However, "pSer 473 AKT" refers (in abbreviated form) to "AKT phosphorylated at serine 473" (i.e., not tyrosine). So the interpretation should actually be that IGF-1 induces two distinct things: phosphorylation of IRS-1 at tyrosine and phosphorylation of AKT at serine 473.

Looking at the output, I can see that there is a phosphorylation event with IRS-1 as the theme and tyrosine as the site, which is correct. However, there is also a phosphorylation event stemming from the same "phosphorylation" word with "AKT" as the theme, which (in this particular case) is not correct. However, I can definitely see now that this sentence is linguistically quite ambiguous!

The third sentence is more clear:

Insulin stimulated rapid tyrosine phosphorylation of IR, phosphorylation of the PI3K effector Akt at Ser473...

IR (insulin receptor) is phosphorylated at tyrosine, whereas AKT is phosphorylated at Serine 473.

Here's another simpler example from the set of 44 sentences:

Insulin stimulated tyrosine phosphorylation of IRS-1, PI 3-kinase activity, and Akt phosphorylation.

The "tyrosine" applies only to the phosphorylation of IRS-1 here, not to the phosphorylation of AKT.

Btw I just realized that the JSON txt file I attached above was not pretty printed, which makes it really difficult to manually scan for sentences. I've attached a new version so it's easier to review.

reach_ins_akt_phos_tyrosine_errors_pretty_print.txt

MihaiSurdeanu commented 6 years ago

Thanks @johnbachman!

@maxaalexeeva: can you please paste here the outputs for the latter 2 sentences sent by John? Also, can you please book a 1-hour slot for next week (separate from the regular ASDF meeting), so we can plan how to address these issues? For this meeting, I would like @maxaalexeeva, @zhengzhongliang, and ideally @enoriega (if you can join remotely) to participate. Please find a slot that works for the 3 of you. Thanks!

@johnbachman: it is possible that there are 2 issues at play here:

We don't do a good job solving conflicting sites. This should be fixed.
Bad syntax. If this is driving some of the errors, I would wait on these issues until we switch to the new dependency representation and the new parser, and revisit them. This should happen in ~1 week.

maxaalexeeva commented 6 years ago

Here they are (run on master):

MihaiSurdeanu commented 5 years ago

Thanks @maxaalexeeva!

Let's look at the second figure, it's simpler. Further, let's call the first instance of "phosphorylation" "p1", and the second instance "p2". Then, the following relations are incorrect: p1-PI, p1-Akt, p2-IRS-1, and p2-PI

Can you please see which rules create these extractions? These rules should be constrained to not cross commas in enumerations of events.

maxaalexeeva commented 5 years ago

@MihaiSurdeanu those rules are syntax rules (syntax_1a_noun and syntax_4_noun).

Here's the dep. parse for the second figure:

My thinking was to limit what words hops can go through, e.g., we could prevent p1 relating to Akt through putting a constraint along these lines: (?! nmod_of [word=phosphorylation] )

Your suggestion makes more sense, but I haven't found a way to write that constraint yet.

Also, maybe this can be fixed through some sort of mention filter? E.g., if the entity has been previously found as a theme for some other simple event, filter this event out?

maxaalexeeva commented 5 years ago

Does this look more like what we want?

That is from trying to solve it the first way I suggested (still haven't found a way to put a constraint on commas). Still need to try to make it more generalizable by not limiting the constraint to just the word 'phosphorylation' and see how it influenced the other tests. For now, all the changes are local and on my branch.

MihaiSurdeanu commented 5 years ago

@maxaalexeeva: this is much better! What exactly did you do?

maxaalexeeva commented 5 years ago

@MihaiSurdeanu, in the syntax rules I mentioned, I basically disallowed hops if they involve a relation of this kind: 'phosphorylation of phosphorylation' (?! <nmod_of [word=phosphorylation]).

I also adjusted a token rule to avoid 'phosphorylation of IRS-1 and Akt' to be extracted as an event in a hypothetical scenario like this:

Here are all the changes I made so far:

diff --git a/main/src/main/resources/org/clulab/reach/biogrammar/events/simple-event_template.yml b/main/src/main/resources/org/clulab/reach/biogrammar/events/simple-event_template.yml
index 66f1a12..01c6384 100644
--- a/main/src/main/resources/org/clulab/reach/biogrammar/events/simple-event_template.yml
+++ b/main/src/main/resources/org/clulab/reach/biogrammar/events/simple-event_template.yml
@@ -186,7 +186,7 @@
   pattern: |
     trigger = [lemma=/${ nominalTriggerLemma }/ & ${triggerPrefix}]
     cause:BioChemicalEntity? = ((<dobj|<'nmod:npmod')? (nmod_by|nmod_agent|"nsubj:xsubj") | compound | nmod_of nmod_by) /compound|conj_(and|or|nor)/{,2} | "nmod:poss"
-    theme:BioChemicalEntity = nmod_of appos? /compound|conj_(and|or|nor)/{,2}
+    theme:BioChemicalEntity = (?! nmod_of [word=phosphorylation]) nmod_of appos? /compound|conj_(and|or|nor)/{,2}
     site:Site? = (/nmod_/? compound{,2}){1,2}

@@ -243,7 +243,7 @@
   action: ${ actionFlow }
   pattern: |
     trigger = [lemma=/${ nominalTriggerLemma }/ & ${triggerPrefix} & !outgoing=/nmod_(by|of)/]
-    theme:BioChemicalEntity = </conj_(and|or|nor)/? /conj_(and|or|nor)|compound|nmod_of/{,2}
+    theme:BioChemicalEntity = (?! <nmod_of [word=phosphorylation]) </conj_(and|or|nor)/? /conj_(and|or|nor)|compound|nmod_of/{,2}
     site:Site? = compound | <dobj? /nmod_(at|on)/ nummod?

@@ -303,7 +303,7 @@
   label: ${ label }
   action: ${ actionFlow }
   pattern: |
-    (?<trigger> [lemma=/${ nominalTriggerLemma }/ & ${triggerPrefix}]) /of/ [!tag=/^(V|JJ)/]{,2}? @theme:BioChemicalEntity (/^(on|at)$/ @site:Site)?
+    @site:Site? (?<trigger> [lemma=/${ nominalTriggerLemma }/ & ${triggerPrefix}]) /of/ [!tag=/^(V|JJ)/]{,2}? @theme:BioChemicalEntity (/^(on|at)$/ @site:Site)?

 # verbose, but nec. to handle coordination
@@ -314,7 +314,7 @@
   label: ${ label }
   action: ${ actionFlow }
   pattern: |
-    (?<trigger> [lemma=/${ nominalTriggerLemma }/ & ${triggerPrefix}]) /of/ [!tag=/^(V|JJ)/]{,2}? @BioChemicalEntity [tag="CC"] @theme:BioChemicalEntity
+    (?<trigger> [lemma=/${ nominalTriggerLemma }/ & ${triggerPrefix}]) /of/ [!tag=/^(V|JJ)/]{,2}? @BioChemicalEntity [tag="CC"] @theme:BioChemicalEntity (?! [lemma=/${ nominalTriggerLemma }/])

I have rerun all the event tests, and there are no new tests failing. If the above is okay with you, similar changes will need to be made in other rules--now that these rules are fixed, incorrect events are extracted with different rules in other sentences.

MihaiSurdeanu commented 5 years ago

@maxaalexeeva: this is a great intuition. I think we'll have to generalize this into a mention filter that is executed after the simple event grammar. This filter will disable any simple events that contain the trigger of another simple event in the syntactic dependency path between its own trigger and the theme. This has multiple advantaged:

It works on any syntactic dependencies, not just nmod_of
It is implemented in one place that applies over all simple events. If we do this at rule level, we would have to modify all grammar rules...
It applies to all combination of simple events not just "phosphorylation of phosphorylation". For example, "phosphorylation of ubiquitination" is also incorrect.

What do you think? Is this clear? Do you agree?

MihaiSurdeanu commented 5 years ago

@maxaalexeeva agreed to work on a mention filter for this issue.

maxaalexeeva commented 5 years ago

I need help writing the filter. Will anyone be available Monday between 11 and 4:30? @MihaiSurdeanu @marcovzla @enoriega.

Alternatively, does anyone have tips on how to approach this? I've been trying to use .paths (like these: https://github.com/clulab/reach/blob/afff6776cf2449c6de1c16c79776a499e1f06dc6/main/src/main/scala/org/clulab/reach/mentions/CorefMention.scala#L118), but I don't know how to filter mentions based on the information that is so deep in a nested map. And this might not be the best way.

MihaiSurdeanu commented 5 years ago

If @marcovzla and @enoriega can't make it, can you pick a 30 min slot Monday from my meetings?

enoriega commented 5 years ago

I won't be at campus but we can zoom at 3:00 for this. Does that work for you @maxaalexeeva ?

maxaalexeeva commented 5 years ago

@enoriega it does! Thank you!

maxaalexeeva commented 5 years ago

A solution suggested by @myedibleenso to avoid the check being too expensive (the steps below are a direct quote):

Find any intersecting args + triggers for competing events.

Grab SynPaths corresponding to any intersecting args (the .paths attribute).

Check if any SynPaths (note: these are not necessarily shortest paths) pass through one of the trigger tokens for any competing events.

MihaiSurdeanu commented 5 years ago

This makes sense!

maxaalexeeva commented 5 years ago

@MihaiSurdeanu, that's a lot of information below, but I could use some feedback.

@bsharpataz wrote the bulk of the filter for me. See here: https://github.com/clulab/reach/blob/e5e11dc1a8534e5eb0d018759ce66ad9265c7e95/main/src/main/scala/org/clulab/reach/darpa/MentionFilter.scala#L27

I finally managed to get it to run without breaking any tests. I call it here: https://github.com/clulab/reach/blob/4b59961dccbffaabb9023aec0dd91db6b30664ed/main/src/main/scala/org/clulab/reach/ReachSystem.scala#L106

The issue with that version of the filter was that it only filtered out mentions based on the shortest path and that is not sufficient to filter out all the erroneous mentions. In 1a and 1b, you can see that it filtered out the relation between P1 (the first instance of phosphorylation) and Akt (compare with the original 2a and 2b; the dependencies are shown in 4a and 4b).

I discussed the possibility of checking the whole path with @myedibleenso, and he seemed to think (please, correct me if I am wrong) that checking the whole path is expensive. I tried a few other things and the one that looked promising was checking the incoming relations to the overlapping argument --- filter out the mentions in which there is an incoming edge connecting the argument in question to the trigger of another mention (here: https://github.com/clulab/reach/blob/cfbb46316b856040977985181a55d732c4c96afc/main/src/main/scala/org/clulab/reach/darpa/MentionFilter.scala#L72). You can see the result of applying that filter in 3a and 3b.

Several apparent issues with these filters:

some relations are still not taken care of (e.g., PI in 1/2/3a -- in this example, activity is not a trigger, so the filter is not applied; there are probably more examples like this)
(only for the second version of the filter) three tests got broken: two translocation and one transcription (see 5 and 6; a - broken, b - original). I could try applying some constraint on the filter, e.g., excluding appositives or translocation and transcription events, etc?
checking only the closest dependency neighbors of the argument is still not enough (see the dependencies for 3b in 4b, P2--insulin receptor: insulin receptor is not connected to P2 with an incoming node).

Actual questions:

Do these filters make sense? Since all these checks look expensive, should we revisit just putting constraints on the rules?
The first one eliminates at least some incorrect relations; do we want it in master?
Does putting constraints on the second one to prevent the three tests breaking make sense? (described above) Any other suggestions?
Should I try to check all the paths, not only the shortest? (might break more test and expensive, but might be worth checking?)

The results of applying the filter that only checks the shortest path---it only filters out P1 -- Akt: a. b.
The relations produced by the system in the master branch: a.

b.
The results after applying the filter that checks if there is an incoming edge connecting the argument in question to the trigger of another mention---it filters out P2 -- IRS in both cases: a. b.
Dependencies for examples a and b in 1-3:

a. b.
Broken and correct (based on master) transcription test:

a.

b.
Broken and correct (based on master) translocation test: a. b.

MihaiSurdeanu commented 5 years ago

Thanks @maxaalexeeva and @bsharpataz!

the code looks nice! But can you please comment it better? Minimally, it would be nice to add an example of extractions that filterOverlappingMentions fixes. Also, this line:

https://github.com/clulab/reach/blob/e5e11dc1a8534e5eb0d018759ce66ad9265c7e95/main/src/main/scala/org/clulab/reach/darpa/MentionFilter.scala#L33

should be optimized. For example, it's simpler to just pass ms to getOverlappingMentions, and, in getOverlappingMentions add to the filter that m != _ (to skip itself)

I think this filter will not be perfect but it is already useful!
I agree with adding some constraints to the filter, e.g., ignoring translocation, transcription, amount events makes sense (unless the original event is a translocation, transcription, amount; that is, never allow the overlap of events of same label). This would fix tests 5 and 6. Not sure about appositives... Do you have an example?
This filter is still much better and more generic than modifying rules, so I would say to try to improve this as much as possible and not modify the rules.
I think you should check all the paths. But you should be careful about it, to keep it efficient. That is, I recommend you build a single HashSet from the positions covered by all paths, and check that the other trigger is in there. But wait, isn't this what you're doing in this line:

https://github.com/clulab/reach/blob/e5e11dc1a8534e5eb0d018759ce66ad9265c7e95/main/src/main/scala/org/clulab/reach/darpa/MentionFilter.scala#L69

If so, I am not sure what the question is. Are you saying that m.paths("theme").get(theme) gets only the shortest paths? If so, I think we should leave it as is. Searching for all paths is expensive...

maxaalexeeva commented 5 years ago

But can you please comment it better? - will do.

Line 33 -- done. I realized that this line I added for something else already makes sure a mention is not compared to itself: https://github.com/clulab/reach/blob/f520fc2d9efb154b6a02e4e4db0909883094f0ab/main/src/main/scala/org/clulab/reach/darpa/MentionFilter.scala#L53

I agree with adding some constraints to the filter - I will try that.

Not sure about appositives... Do you have an example? - in example 6, protein is part of an appositive to ASPP2 (see the dependency in 6a), i.e., it's part of a phrase that provides additional identifying information. In the second version of the filter, I check if the argument is connected to the trigger of a different event with an incoming edge and filter the mention out if that's true (just a heuristic). However, with appositives, the argument in question (e.g., ASPP2 in 6) will in many cases be correctly connected to two triggers: its own trigger and the one that the appositive (protein) gets since they refer to the same entity (ASPP2 = protein); this will trigger the filter unnecessarily. Does that make sense? I will want to look if this holds on more examples and if it will make sense to add a constraint based on that.

Are you saying that m.paths("theme").get(theme) gets only the shortest paths? If so, I think we should leave it as is. It looks like it only gets the shortest path. I'll check again. It might be originating here? https://github.com/clulab/reach/blob/afff6776cf2449c6de1c16c79776a499e1f06dc6/assembly/src/main/scala/org/clulab/reach/assembly/relations/classifier/PathFinder.scala#L47

MihaiSurdeanu commented 5 years ago

On Jan 2, 2019, at 9:10 PM, Maria (Masha) Alexeeva notifications@github.com wrote:

But can you please comment it better? - will do.

Line 33 -- done. I realized that this line I added for something else already makes sure a mention is not compared to itself: https://github.com/clulab/reach/blob/f520fc2d9efb154b6a02e4e4db0909883094f0ab/main/src/main/scala/org/clulab/reach/darpa/MentionFilter.scala#L53

Ok.

I agree with adding some constraints to the filter - I will try that.

Not sure about appositives... Do you have an example? - in example 6, protein is part of an appositive to ASPP2 (see the dependency in 6a), i.e., it's part of a phrase that provides additional identifying information. In the second version of the filter, I check if the argument is connected to the trigger of a different event with an incoming edge and filter the mention out if that's true (just a heuristic). However, with appositives, the argument in question (e.g., ASPP2 in 6) will in many cases be correctly connected to two triggers: its own trigger and the one that the appositive (protein) gets since they refer to the same entity (ASPP2 = protein); this will trigger the filter unnecessarily. Does that make sense? I will want to look if this holds on more examples and if it will make sense to add a constraint based on that.

I agree. Good catch.

Are you saying that m.paths("theme").get(theme) gets only the shortest paths? If so, I think we should leave it as is. It looks like it only gets the shortest path. I'll check again. It might be originating here? https://github.com/clulab/reach/blob/afff6776cf2449c6de1c16c79776a499e1f06dc6/assembly/src/main/scala/org/clulab/reach/assembly/relations/classifier/PathFinder.scala#L47

Yep. I propose to leave it as is.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

maxaalexeeva commented 5 years ago

@MihaiSurdeanu, I am running the incorrect mention examples that John provided. It seems that in many cases, the filter is doing a reasonable job---it does not remove all of the incorrect relations, but it does some. I can't always be sure the result what it should be, but there is this one example that is clearly wrong (the parse is wrong and that interferes with the logic of the filter). Any suggestions? Maybe some sort of constraint?

Also, above John said this:

Actually, now that you point it out, that particular sentence is fairly ambiguous (and poorly written). "Tyrosine phosphorylation of IRS1 and pSer 473 AKT" could, in principle, indicate "tyrosine phosphorylation of IRS1" and "tyrosine phosphorylation of pSer 473 AKT". However, "pSer 473 AKT" refers (in abbreviated form) to "AKT phosphorylated at serine 473" (i.e., not tyrosine).

Is there some place in the system where pSer 473 AKT can be de-abbreviated? If it is stays in the abbreviated form, the rules won't handle it correctly, will they?

MihaiSurdeanu commented 5 years ago

Thanks @maxaalexeeva!

My brain can't handle that many regulations at once... Which ones do you think are incorrect? I think most of those are fine. IL-1alpha indeed controls the two phosphorylations mentioned later in the sentence, which would lead to many regulations.

Also, John's comments does not seem to be about this text, no?

maxaalexeeva commented 5 years ago

@MihaiSurdeanu, sorry, I should have been more specific. The incorrect rel that I meant was IRS-1 being attached to phosphorylation2, but not phosphorylation1. The parser basically does what I tried to avoid with the filter -- it attaches an argument (IRS-1) by going through a different phosphorylation.

Maybe we can have some sort of token rule/action for 'of' (only for cases when it is right after the trigger to avoid other rels being broken) to take precedence over what parse does?

The example comes from the set of 44 examples that John attached in one of his previous messages on this thread.

MihaiSurdeanu commented 5 years ago

Hmm... This clearly a broken parse. I wonder if we need to add a surface rule like: SITE simple_event_trigger of PROTEIN? Can you please look into this?

In any case, this comment is independent of your filter, which I think is good and should be merged. Anything stopping us from merging this in master?

maxaalexeeva commented 5 years ago

@MihaiSurdeanu the tests seem to be passing locally with the filter. I do not think the filter should cause any major issues, but we will hear from the end users if it does and can take it out, right? I will start a pull request.

I will look into the surface rule.

maxaalexeeva commented 5 years ago

@MihaiSurdeanu, I looked into the possibility of using a surface rule to help with the broken parse (we discussed that a few messages up, but here is what the wrong output looks like---notice how IRS-1 is the theme to phosphorylation2 here instead of to phosphorylation1):

This token rule that I wrote based on your suggestion takes care of the incorrect parse, but only if its priority is higher than the priority of all the other simple events in simple-events_template.yml:

to produce this output (IRS-1 is attached as a theme to phosphorylation1):

If it's okay to use this rule, what would be the best way to change the priority? I see three options (all of which don't sound too good to me but I can't think of anything else):

1) hardcode the priority in the rule 2) create another variable (something like the ${ edgeCasePriority } in the rule screenshot above) and add it to the list of variables in events_master.yml like this: 3) put this rule in a separate file; that will mean that simple event entries in events_master.yml will have to have duplicates to import rules from two different files and with different priorities, right? Here's an example of what I think that would look like:

https://github.com/clulab/reach/blob/4b59961dccbffaabb9023aec0dd91db6b30664ed/main/src/main/resources/org/clulab/reach/biogrammar/events_master.yml#L54

Is there a better way to change priority of one rule?

BeckySharp commented 5 years ago

Hey -- why would the priority matter here? i.e., I completely believe you about what's happening, but why? It could be the case that the fix lies elsewhere...?

On Mon, Feb 4, 2019 at 11:08 PM Maria (Masha) Alexeeva < notifications@github.com> wrote:

@MihaiSurdeanu https://github.com/MihaiSurdeanu, I looked into the possibility of using a surface rule to help with the broken parse (we discussed that a few messages up, but here is what the wrong output looks like---notice how IRS-1 is the theme to phosphorylation2 here instead of to phosphorylation1):

[image: image] https://user-images.githubusercontent.com/31713912/52253598-48b57d00-28c6-11e9-88cd-440e61bc38e6.png

This token rule that I wrote based on your suggestion takes care of the incorrect parse, but only if its priority is higher than the priority of all the other simple events in simple-events_template.yml: [image: image] https://user-images.githubusercontent.com/31713912/52253670-a0ec7f00-28c6-11e9-862c-175bc9fccc7d.png

to produce this output (IRS-1 is attached as a theme to phosphorylation1):

[image: image] https://user-images.githubusercontent.com/31713912/52253679-b8c40300-28c6-11e9-9685-cab0b3b3dbee.png

If it's okay to use this rule, what would be the best way to change the priority? I see three options (all of which don't sound too good to me but I can't think of anything else):

hardcode the priority in the rule

create another variable (something like the ${ edgeCasePriority } in the rule screenshot above) and add it to the list of variables in events_master.yml like this: [image: image] https://user-images.githubusercontent.com/31713912/52254207-a9928480-28c9-11e9-9dea-3e76c49d868b.png

put this rule in a separate file; that will mean that simple event entries in events_master.yml will have to have duplicates to import rules from two different files and with different priorities, right? Here's an example of what I think that would look like:

https://github.com/clulab/reach/blob/4b59961dccbffaabb9023aec0dd91db6b30664ed/main/src/main/resources/org/clulab/reach/biogrammar/events_master.yml#L54

[image: image] https://user-images.githubusercontent.com/31713912/52255780-27a65980-28d1-11e9-9d31-07d8078c83f4.png

Is there a better way to change priority of one rule?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/clulab/reach/issues/605#issuecomment-460524628, or mute the thread https://github.com/notifications/unsubscribe-auth/AFIniUPICaL9hF2xINNS2EeQduWkCTOOks5vKR_RgaJpZM4YW84U .

MihaiSurdeanu commented 5 years ago

@bsharpataz: I suspect the priority is needed because we do not allows overlapping events. In the regular priority layer, parts of this correct event are captured for other things, which prevent the correct rule to create the right event.

@maxaalexeeva: my preference is to create a new grammar file with edge cases for simple events. Say edge_cases_simple_events.yml. This one will have only edge-case rules, all of them running with the higher priority. These rules would apply to all simple events, so you can reuse the same variables from the master file.

maxaalexeeva commented 5 years ago

@bsharpataz, @MihaiSurdeanu, thank you both for the responses. I will deal with this in the next couple of days.

BeckySharp commented 5 years ago

@mihai -- I know we don't allow overlapping mentions, but priority wouldn't affect that, depending on how you filter (i.e. are you using a keepLongest / keep most complete type of method?). That would be overriding if it were simply an overlap issue, right? (I could be misunderstanding the situation)

On Tue, Feb 5, 2019 at 10:03 AM Maria (Masha) Alexeeva < notifications@github.com> wrote:

@bsharpataz https://github.com/bsharpataz, @MihaiSurdeanu https://github.com/MihaiSurdeanu, thank you both for the responses. I will deal with this in the next couple of days.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/clulab/reach/issues/605#issuecomment-460718496, or mute the thread https://github.com/notifications/unsubscribe-auth/AFIniWeB4Mdz3Hh1a0pCJfSynM2O5nJpks5vKblagaJpZM4YW84U .

MihaiSurdeanu commented 5 years ago

Yeah, I remember that the order in which the mentions were created matters. Something along the lines of allow only certain overlaps with existing mentions.

I think the keepLongest heuristic applies for conflicting mentions generated in the same layer.

clulab / reach

Phosphorylation picking up residue information from from wrong protein #605