clulab / reach

Reach Biomedical Information Extraction

binding_17 extractions of "associated with" #335

Open · bgyori opened this issue 8 years ago

bgyori commented 8 years ago

The binding_17 rule extracts complex formation events from sentences of the form "A is associated with B". However, "is associated with" usually seems to be used to draw an observational, statistical link between "distant" molecular players or higher-level processes, as in "Smoking is associated with lung cancer". The "associated with" pattern accounts for a large share of binding_17 extractions (~50%), and in the samples we looked at, these sentences overwhelmingly describe associations that are unrelated to complex formation.
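
A minimal sketch of how that share could be measured, assuming extractions have been exported as (rule name, trigger text) pairs. The record format and the `trigger_share` helper are hypothetical illustrations, not part of REACH's output, and the sample records are toy data rather than real extractions.

```python
from typing import Iterable, Tuple


def trigger_share(extractions: Iterable[Tuple[str, str]], rule: str, phrase: str) -> float:
    """Fraction of a rule's extractions whose trigger text contains `phrase`.

    `extractions` is a hypothetical export of (rule_name, trigger_text) pairs.
    """
    total = 0
    matches = 0
    for rule_name, trigger in extractions:
        if rule_name != rule:
            continue  # only count extractions produced by the rule of interest
        total += 1
        if phrase in trigger.lower():
            matches += 1
    return matches / total if total else 0.0


# Toy records for illustration only, not real REACH output.
sample = [
    ("binding_17", "is associated with"),
    ("binding_17", "binds"),
    ("binding_17", "was associated with"),
    ("binding_17", "interacts with"),
    ("binding_1", "associated with"),
]
print(trigger_share(sample, "binding_17", "associated with"))  # → 0.5
```
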

Here are some examples of failure modes:

Finally, an example where "associated with" actually means complex formation (at least as far as I understand):

MihaiSurdeanu commented 8 years ago

@bgyori: we know "associated with" is ambiguous, but we're not sure how to disambiguate it. Any simple ideas?

bgyori commented 8 years ago

I have some ideas:

  1. In a statistical sense, simply removing this case from the rule would be a net improvement.
  2. I would classify "A associates with B" as likely to refer to a physical interaction between two molecular entities, and "A is/was associated with B" as unlikely to refer to one.
  3. Can named entity recognition and grounding influence what is ultimately extracted? If A and B are recognized as molecular entities (that is, proteins, protein families, complexes, or chemicals), then extracting the complex formation event is warranted; otherwise, the event should not be extracted.

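
A rough sketch of idea 2, assuming only the surface text around the trigger is available (a real rule would more likely inspect tokens and syntactic dependencies). Active "associates with" is treated as a likely physical interaction, while a form of "be" before "associated with" marks the statistical use. The function name and both regexes are illustrative assumptions, not REACH code.

```python
import re

# Hypothetical heuristic: a form of "be", optionally followed by an -ly adverb,
# right before "associated with" suggests an observational/statistical claim.
PASSIVE = re.compile(
    r"\b(?:is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?associated\s+with\b",
    re.IGNORECASE,
)
# Active verb forms: "associate(s) with", "associated with", "associating with".
ACTIVE = re.compile(r"\bassociat(?:es|e|ed|ing)\s+with\b", re.IGNORECASE)


def likely_physical_association(text: str) -> bool:
    """True for active 'A associates with B', False for 'A is/was associated with B'."""
    if PASSIVE.search(text):
        return False
    return bool(ACTIVE.search(text))


print(likely_physical_association("MEK associates with ERK"))  # → True
print(likely_physical_association("Smoking is associated with lung cancer"))  # → False
```

The passive check runs first because the active pattern also matches the "associated with" inside "is associated with".
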
MihaiSurdeanu commented 8 years ago

Idea 3 is a very good catch! We should do this! @myedibleenso @marcovzla: can you please make sure that Binding rules operate only on proteins, families, or chemicals?
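
A sketch of that restriction as a post-filter on Binding events, assuming each argument carries an entity label. The label names below are placeholders taken from the types listed in this thread (complexes are included per idea 3); REACH's actual entity labels and event classes may differ.

```python
from dataclasses import dataclass
from typing import List

# Placeholder label names; REACH's real entity labels may differ.
MOLECULAR_TYPES = frozenset({"Protein", "Family", "Complex", "Chemical"})


@dataclass
class Mention:
    text: str
    label: str


@dataclass
class BindingEvent:
    rule: str
    args: List[Mention]


def keep_binding(event: BindingEvent) -> bool:
    """Keep a Binding event only if it has arguments and all are molecular entities."""
    return bool(event.args) and all(arg.label in MOLECULAR_TYPES for arg in event.args)


events = [
    BindingEvent("binding_17", [Mention("MEK", "Protein"), Mention("ERK", "Protein")]),
    BindingEvent("binding_17", [Mention("smoking", "BioProcess"), Mention("lung cancer", "Disease")]),
]
print([keep_binding(e) for e in events])  # → [True, False]
```
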

MihaiSurdeanu commented 8 years ago

@myedibleenso, @marcovzla: if the above idea doesn't work, let's take the brute-force route and simply remove "associate" from the list of triggers. It seems too noisy.
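
As a sketch of the fallback, assume the Binding trigger is matched as an alternation over lemmas. The lemma list below is hypothetical and not copied from REACH's grammar files; dropping "associate" then amounts to rebuilding the alternation without it.

```python
import re
from typing import Iterable


def build_trigger_regex(lemmas: Iterable[str]) -> "re.Pattern[str]":
    """Compile an anchored alternation over trigger lemmas."""
    alternation = "|".join(sorted(map(re.escape, lemmas)))
    return re.compile(rf"^(?:{alternation})$")


# Hypothetical trigger lemmas for illustration only.
BINDING_LEMMAS = {"bind", "associate", "interact", "complex"}
NOISY = {"associate"}

trigger = build_trigger_regex(BINDING_LEMMAS - NOISY)
print(bool(trigger.match("bind")), bool(trigger.match("associate")))  # → True False
```
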

MihaiSurdeanu commented 7 years ago

Another odd example of "associated with": it yields a binding event between TGF-beta and PAK2, even though the sentence doesn't support that:

[Evidence(reach, PMID22880011, {'cell_type': [u'cl:CL:0008019'], 'organ': None, 'tissue': None, 'location': None, 'cell_line': None, 'species': [u'taxonomy:4932']}, Pak2 is specifically activated by TGF-beta only in mesenchymal cells, as the result of phosphatidylinositol 3-kinase (PI3K) activation and may be associated with TGF-beta activation of Ras XREF_BIBR, XREF_BIBR, XREF_BIBR.)]
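
Note that an entity-type restriction alone would not reject this case, since Pak2 and TGF-beta are both proteins. A surface check in the spirit of idea 2 would, because the trigger here is "may be associated with". A minimal sketch; the regex and function name are illustrative assumptions, not REACH code.

```python
import re

# Hypothetical check: a form of "be" (optionally followed by an -ly adverb)
# before "associated with" flags an observational rather than physical claim.
PASSIVE_ASSOC = re.compile(
    r"\b(?:is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?associated\s+with\b",
    re.IGNORECASE,
)


def flags_statistical_use(sentence: str) -> bool:
    """True if the sentence uses 'associated with' in the copular/passive form."""
    return bool(PASSIVE_ASSOC.search(sentence))


sentence = (
    "Pak2 is specifically activated by TGF-beta only in mesenchymal cells, "
    "as the result of phosphatidylinositol 3-kinase (PI3K) activation and "
    "may be associated with TGF-beta activation of Ras"
)
print(flags_statistical_use(sentence))  # → True
```
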

MihaiSurdeanu commented 7 years ago

@MihaiSurdeanu will try to clean binding_17 next.