clulab / reach

Reach Biomedical Information Extraction

Modifier after entity name missed in activation rule #325

Open bgyori opened 8 years ago

bgyori commented 8 years ago

For the sentence "The consequences of increased AR function might then increase docetaxel resistance via increasing p21 expression.", REACH extracts an event stating that AR positively activates docetaxel. The key issue here is that the object of the statement is docetaxel resistance and not docetaxel. To simplify the example, I tried the sentence "AR increases docetaxel resistance.", which yields the same extracted event.

bgyori commented 8 years ago

Along similar lines, when a molecular entity is mentioned in relation to a cell type, it is often extracted as the actual subject/object of an event. For instance, consider the synthetic example "BRAF inhibition in NF1 mutant cells causes apoptosis.", from which "BRAF negatively activates NF1" is extracted. Here the statement is about NF1 mutant cells and not NF1 itself.

An actual example from a paper is "Combination of FGFR and AKT inhibition in an FGFR2 mutated endometrial cancer xenograft model [...]", which yields two negative activation events: "FGFR negatively activates FGFR2" and "AKT negatively activates FGFR2".

MihaiSurdeanu commented 8 years ago

@bgyori: so, on the first example, should we extract a negative activation between AR and docetaxel?

bgyori commented 8 years ago

No, the simple solution would be to not extract anything at all. Figuring out what docetaxel resistance is would be beyond the capabilities of the system, so it's better to leave it out. For the NF1 mutant cells example, in principle, NF1 mutant cells could be extracted as context for the actual process that the sentence talks about, but again that might be hard to do, and it would be better not to extract anything for NF1.

johnbachman commented 8 years ago

Actually, for the first sentence, a legitimate extraction would be "AR increases p21 expression."

bgyori commented 8 years ago

Yes, John is right.

herongrove commented 8 years ago

Possible approaches:

  1. Add contextual restrictions to the entity rules so we don't produce mentions for them, e.g. with negative lookaheads for the word 'resistance'. Potentially hurts coreference resolution.
  2. Edit the event rules by providing either positive (allowed) words or negative (disallowed) words when the syntactic path includes an nn link. A disadvantage of this is that we'd have to make this edit everywhere -- in all rules that have an optional nn before the theme, which is most of them.
  3. After the mentions are generated, inspect the arguments of all event mentions recursively to look for these constructions. This is probably not a good solution.
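
A minimal sketch of approach 1, written with Python's `re` module rather than REACH's actual rule language. The entity list and the set of blocking words are hypothetical; they only illustrate how a negative lookahead could keep "docetaxel resistance" from producing a docetaxel mention.

```python
import re

# Hypothetical toy lexicon; the real entity grounding in REACH is far richer.
ENTITY_NAMES = ["docetaxel", "AR", "p21", "NF1", "BRAF"]

# Assumed list of words that, directly after an entity name, mean the
# phrase no longer denotes the entity itself.
BLOCKING_WORDS = ["resistance", "resistant"]

# Match an entity name only if it is NOT followed by a blocking word.
ENTITY_PATTERN = re.compile(
    r"\b(?:%s)\b(?!\s+(?:%s)\b)"
    % (
        "|".join(map(re.escape, ENTITY_NAMES)),
        "|".join(map(re.escape, BLOCKING_WORDS)),
    )
)


def find_entity_mentions(sentence):
    """Return the entity names that survive the lookahead restriction."""
    return [m.group(0) for m in ENTITY_PATTERN.finditer(sentence)]
```

With these assumptions, `find_entity_mentions("AR increases docetaxel resistance.")` returns `["AR"]`, so no event can take docetaxel as its theme. The same restriction also removes the mention as a candidate antecedent, which is the coreference risk noted in item 1.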

myedibleenso commented 8 years ago

I like (1), but there is some small risk it could hamper the coref module's hunt for antecedents. Generally, though, I don't think we care about entity mentions unless they happen to be participating in events.

(2) could be handled as a variable defining a lookaround constraint on certain args of events, but this will complicate the templates a bit...
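
To make the idea in (2) concrete, here is a rough Python sketch of a single shared lookaround constraint applied to an event argument. It is not REACH or Odin code: the `Arg` class, the `DISALLOWED_NEXT` set, and `arg_is_allowed` are all hypothetical names, and the word list is an assumption.

```python
from dataclasses import dataclass

# Assumed, shared "variable": words that must not directly follow an argument.
DISALLOWED_NEXT = frozenset({"resistance", "resistant", "mutant", "deficient"})


@dataclass
class Arg:
    start: int  # index of the first token of the argument
    end: int    # index one past the last token of the argument


def arg_is_allowed(tokens, arg, disallowed=DISALLOWED_NEXT):
    """Return False when the token right after the argument is disallowed."""
    if arg.end < len(tokens):
        return tokens[arg.end].lower() not in disallowed
    return True
```

For the tokens of "AR increases docetaxel resistance .", the theme `Arg(2, 3)` is rejected while the controller `Arg(0, 1)` is kept. Keeping the word list in one place is what would let every template refer to the same constraint instead of repeating it per rule.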

bgyori commented 8 years ago

Some more examples that result in incorrect negative activation events:

MihaiSurdeanu commented 7 years ago

@myedibleenso, @danebell: it looks like the actionable item here is to add 1 (or 2) grammar rules to capture "NF1 mutant cells", "Brca1 deficient cells", and "BRAF V600E -positive melanoma cells" as CELL_TYPE, thus disabling events on the proteins included in these entities. @myedibleenso, @danebell: please decide between the two of you who should add these rules.
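
A rough Python sketch of the effect such rules would have, assuming a simple regular expression in place of real grammar rules. The pattern, the modifier list, and both function names are illustrative assumptions; the pattern is only meant to cover the three phrases quoted above.

```python
import re

# Assumed modifier words that turn "<gene> <modifier> ... cells" into a cell type.
MODIFIERS = ("mutant", "deficient", "positive")

# <name> [optional mutation like V600E] [optional hyphen]<modifier> [one word] cell(s)
CELL_TYPE = re.compile(
    r"\b\w+(?: [A-Z]\d+[A-Z])? ?-?(?:%s)(?: \w+)? cells?\b" % "|".join(MODIFIERS)
)


def cell_type_spans(sentence):
    """Character spans of phrases treated as CELL_TYPE in this sketch."""
    return [m.span() for m in CELL_TYPE.finditer(sentence)]


def drop_mentions_inside_cell_types(sentence, mentions):
    """Keep only protein mention spans that are not inside a CELL_TYPE span."""
    spans = cell_type_spans(sentence)
    return [
        (start, end)
        for (start, end) in mentions
        if not any(cs <= start and end <= ce for cs, ce in spans)
    ]
```

On "BRAF inhibition in NF1 mutant cells causes apoptosis.", the NF1 span falls inside "NF1 mutant cells" and is dropped, while the BRAF span is kept, so no event can be built with NF1 as an argument.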

bgyori commented 7 years ago

Found another one: "Treatment of AI resistant cell lines with LBH589 repressed NF-kappaB1 mRNA and protein expression". Here AI shouldn't be extracted as negatively activating NF-kappaB1.

bgyori commented 7 years ago

I found another one that is not about cell types but has a similar pattern in which a modifier is missed after the entity name: "a new mechanism of oxygen independent activation of HIF-1 has been identified", from which "oxygen activates HIF-1" was extracted and "independent" was ignored.

MihaiSurdeanu commented 7 years ago

@bgyori: syntactically, this looks similar but it has to be handled differently in our system. This fits under code that detects the polarity of activations/regulations. Except, in this case, if the activation is independent of the controller, we should discard the interactions. I will start a separate issue for this.

@myedibleenso: please add rules for "* cell"?

bgyori commented 7 years ago

In the evaluation results we bump into incorrect explanations due to a frequently occurring error related to this issue. Consider these examples:

19244107 This is supported by a previous study showing that haplodeficiency of Akt1 dramatically inhibits prostate tumor development in Pten +/- mice (XREF_BIBR).

23786676 Akt1 and 2 deficiency is sufficient to markedly reduce the incidence of tumors in Pten (+/-) mice [XREF_BIBR] and Myc also cooperates with Akt1 in promoting prostate tumorigenesis [XREF_BIBR].

20231902 Moreover, knockdown of Akt1 induces MST2 activation and enhances doxorubicin activated MST2 and apoptosis in PTEN mutated MDA-MB-468 cells (XREF_FIG).

19369943 PI3K and Akt signalling is also essential for oncogenic ErbB-2-induced transformation (XREF_BIBR), and Akt1 deficiency sufficiently suppresses tumour development in PTEN +/- mice (XREF_BIBR).

In each case "AKT1 inhibits PTEN" was extracted even though the sentences talk about a more complex process (e.g. tumor development) and only mention PTEN as a contextual variable.

MihaiSurdeanu commented 7 years ago

Thanks @bgyori! Unfortunately, these are multiple issues showing for the same interaction... We are discussing these.

myedibleenso commented 7 years ago

The consensus at this point seems to be to try @danebell's possible approach 1 (modifying the entity rules).