clulab / reach

Reach Biomedical Information Extraction

Differences in processing between Arizona/Odinweb and local installation #766

Closed: zebulon2 closed this issue 2 years ago

zebulon2 commented 2 years ago

Hi, I noticed that I get different results from the web API service at http://agathon.sista.arizona.edu:8080/odinweb/api/text and from a local installation of REACH. I am wondering whether this is due to a difference in code or in settings. For example, I used this test sentence: "The ribosomal S6 protein phosphorylation."

On the Odinweb REACH service API, I get this response:

```json
{
  "events" : {
    "frames" : [ {
      "frame-id" : "evem-api946-UAZ-r1-Reach-0-6975",
      "text" : "S6 protein phosphorylation",
      "arguments" : [ {
        "text" : "S6",
        "argument-type" : "entity",
        "type" : "theme",
        "object-type" : "argument",
        "index" : 0,
        "arg" : "ment-api946-UAZ-r1-Reach-0-45668"
      } ],
      "type" : "protein-modification",
      "frame-type" : "event-mention",
      "subtype" : "phosphorylation",
      "is-direct" : false,
      "end-pos" : { "reference" : "pass-api946-UAZ-r1-Reach", "offset" : 40, "object-type" : "relative-pos" },
      "trigger" : "phosphorylation",
      "object-type" : "frame",
      "start-pos" : { "reference" : "pass-api946-UAZ-r1-Reach", "offset" : 14, "object-type" : "relative-pos" },
      "sentence" : "sent-api946-UAZ-r1-Reach-0",
      "found-by" : "Phosphorylation_syntax_4_noun",
      "verbose-text" : "The ribosomal S6 protein phosphorylation."
    } ],
    "object-type" : "frame-collection",
    "object-meta" : { "processing-end" : "2022-02-09T09:49:30Z", "doc-id" : "api946", "component-type" : "machine", "component" : "Reach", "processing-start" : "2022-02-09T09:49:30Z", "object-type" : "meta-info", "organization" : "UAZ" }
  },
  "entities" : {
    "frames" : [ {
      "text" : "S6",
      "frame-id" : "ment-api946-UAZ-r1-Reach-0-45668",
      "type" : "family",
      "frame-type" : "entity-mention",
      "end-pos" : { "reference" : "pass-api946-UAZ-r1-Reach", "offset" : 16, "object-type" : "relative-pos" },
      "xrefs" : [ { "namespace" : "interpro", "species" : "human", "object-type" : "db-reference", "id" : "IPR000529" } ],
      "object-type" : "frame",
      "start-pos" : { "reference" : "pass-api946-UAZ-r1-Reach", "offset" : 14, "object-type" : "relative-pos" },
      "sentence" : "sent-api946-UAZ-r1-Reach-0",
      "alt-xrefs" : [
        { "namespace" : "interpro", "species" : "human", "object-type" : "db-reference", "id" : "IPR000529" },
        { "namespace" : "interpro", "species" : "caenorhabditis elegans", "object-type" : "db-reference", "id" : "IPR000529" },
        { "namespace" : "interpro", "species" : "fruitfly", "object-type" : "db-reference", "id" : "IPR000529" },
        { "namespace" : "interpro", "species" : "mouse", "object-type" : "db-reference", "id" : "IPR000529" },
        { "namespace" : "interpro", "species" : "saccharomyces cerevisiae", "object-type" : "db-reference", "id" : "IPR000529" }
      ]
    } ],
    "object-type" : "frame-collection",
    "object-meta" : { "processing-end" : "2022-02-09T09:49:30Z", "doc-id" : "api946", "component-type" : "machine", "component" : "Reach", "processing-start" : "2022-02-09T09:49:30Z", "object-type" : "meta-info", "organization" : "UAZ" }
  },
  "sentences" : {
    "frames" : [ {
      "text" : "The ribosomal S6 protein phosphorylation.",
      "frame-id" : "pass-api946-UAZ-r1-Reach",
      "section-id" : "NoSection",
      "frame-type" : "passage",
      "is-title" : false,
      "section-name" : "NoSection",
      "object-type" : "frame",
      "index" : "Reach",
      "object-meta" : { "component" : "nxml2fries", "object-type" : "meta-info" }
    }, {
      "text" : "The ribosomal S6 protein phosphorylation .",
      "frame-id" : "sent-api946-UAZ-r1-Reach-0",
      "passage" : "pass-api946-UAZ-r1-Reach",
      "frame-type" : "sentence",
      "end-pos" : { "reference" : "pass-api946-UAZ-r1-Reach", "offset" : 41, "object-type" : "relative-pos" },
      "object-type" : "frame",
      "start-pos" : { "reference" : "pass-api946-UAZ-r1-Reach", "offset" : 0, "object-type" : "relative-pos" },
      "object-meta" : { "component" : "BioNLPProcessor", "object-type" : "meta-info" }
    } ],
    "object-type" : "frame-collection",
    "object-meta" : { "processing-end" : "2022-02-09T09:49:30Z", "doc-id" : "api946", "component-type" : "machine", "component" : "Reach", "processing-start" : "2022-02-09T09:49:30Z", "object-type" : "meta-info", "organization" : "UAZ" }
  }
}
```

With a local REACH server (installed from the Git repo), however, I get this answer:

```json
{
  "events" : {
    "frames" : [ ],
    "object-type" : "frame-collection",
    "object-meta" : { "processing-end" : "2022-02-09T09:51:52Z", "doc-id" : "api24", "component-type" : "machine", "component" : "Reach", "processing-start" : "2022-02-09T09:51:52Z", "object-type" : "meta-info", "organization" : "UAZ" }
  },
  "entities" : {
    "frames" : [ {
      "text" : "S6",
      "frame-id" : "ment-api24-UAZ-r1-Reach-0-63",
      "type" : "site",
      "frame-type" : "entity-mention",
      "end-pos" : { "reference" : "pass-api24-UAZ-r1-Reach", "offset" : 16, "object-type" : "relative-pos" },
      "xrefs" : [ { "namespace" : "uaz", "object-type" : "db-reference", "id" : "UAZ5336" } ],
      "object-type" : "frame",
      "start-pos" : { "reference" : "pass-api24-UAZ-r1-Reach", "offset" : 14, "object-type" : "relative-pos" },
      "sentence" : "sent-api24-UAZ-r1-Reach-0"
    }, {
      "text" : "protein phosphorylation",
      "frame-id" : "ment-api24-UAZ-r1-Reach-0-64",
      "type" : "bioprocess",
      "frame-type" : "entity-mention",
      "end-pos" : { "reference" : "pass-api24-UAZ-r1-Reach", "offset" : 40, "object-type" : "relative-pos" },
      "xrefs" : [ { "namespace" : "go", "species" : "human", "object-type" : "db-reference", "id" : "GO:0006468" } ],
      "object-type" : "frame",
      "start-pos" : { "reference" : "pass-api24-UAZ-r1-Reach", "offset" : 17, "object-type" : "relative-pos" },
      "sentence" : "sent-api24-UAZ-r1-Reach-0",
      "alt-xrefs" : [ { "namespace" : "go", "species" : "human", "object-type" : "db-reference", "id" : "GO:0006468" } ]
    } ],
    "object-type" : "frame-collection",
    "object-meta" : { "processing-end" : "2022-02-09T09:51:52Z", "doc-id" : "api24", "component-type" : "machine", "component" : "Reach", "processing-start" : "2022-02-09T09:51:52Z", "object-type" : "meta-info", "organization" : "UAZ" }
  },
  "sentences" : {
    "frames" : [ {
      "text" : "The ribosomal S6 protein phosphorylation.",
      "frame-id" : "pass-api24-UAZ-r1-Reach",
      "section-id" : "NoSection",
      "frame-type" : "passage",
      "is-title" : false,
      "section-name" : "NoSection",
      "object-type" : "frame",
      "index" : "Reach",
      "object-meta" : { "component" : "nxml2fries", "object-type" : "meta-info" }
    }, {
      "text" : "The ribosomal S6 protein phosphorylation .",
      "frame-id" : "sent-api24-UAZ-r1-Reach-0",
      "passage" : "pass-api24-UAZ-r1-Reach",
      "frame-type" : "sentence",
      "end-pos" : { "reference" : "pass-api24-UAZ-r1-Reach", "offset" : 41, "object-type" : "relative-pos" },
      "object-type" : "frame",
      "start-pos" : { "reference" : "pass-api24-UAZ-r1-Reach", "offset" : 0, "object-type" : "relative-pos" },
      "object-meta" : { "component" : "BioNLPProcessor", "object-type" : "meta-info" }
    } ],
    "object-type" : "frame-collection",
    "object-meta" : { "processing-end" : "2022-02-09T09:51:52Z", "doc-id" : "api24", "component-type" : "machine", "component" : "Reach", "processing-start" : "2022-02-09T09:51:52Z", "object-type" : "meta-info", "organization" : "UAZ" }
  }
}
```

The part I would like to point out is the interpretation of the word "S6". The Odinweb service correctly recognizes it as an entity (type "family", grounded to InterPro IPR000529) and extracts a phosphorylation event with S6 as its theme. The local server instead labels it as a "site" (grounded to UAZ5336) rather than as a protein and extracts no event; "protein phosphorylation" is only recognized as a "bioprocess" entity. On the local server I need to use "pS6" instead of "S6" to have it recognized as a protein. Am I doing anything wrong, or am I missing some files? Thanks a lot in advance for your suggestions.
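To pin down exactly where the two outputs diverge, the FRIES JSON can be compared mechanically instead of by eye. The following is a minimal Python sketch that works on trimmed excerpts of the two responses quoted above (only the fields it reads are kept); the helper names `entity_types`, `event_signatures`, and `diff` are hypothetical and not part of Reach.

```python
import json

# Trimmed excerpts of the two responses quoted above, keeping only the
# fields this comparison reads.
ODINWEB = json.loads("""
{"events": {"frames": [{"type": "protein-modification",
                        "subtype": "phosphorylation",
                        "text": "S6 protein phosphorylation",
                        "arguments": [{"text": "S6", "type": "theme"}]}]},
 "entities": {"frames": [{"text": "S6", "type": "family"}]}}
""")

LOCAL = json.loads("""
{"events": {"frames": []},
 "entities": {"frames": [{"text": "S6", "type": "site"},
                         {"text": "protein phosphorylation", "type": "bioprocess"}]}}
""")


def entity_types(fries):
    """Map each entity mention's text to the set of types assigned to it."""
    types = {}
    for frame in fries["entities"]["frames"]:
        types.setdefault(frame["text"], set()).add(frame["type"])
    return types


def event_signatures(fries):
    """Summarize each event as (type, subtype, sorted argument texts)."""
    sigs = set()
    for frame in fries["events"]["frames"]:
        args = tuple(sorted(a["text"] for a in frame.get("arguments", [])))
        sigs.add((frame["type"], frame.get("subtype", ""), args))
    return sigs


def diff(a, b):
    """Print entity-type and event differences between two FRIES outputs."""
    ta, tb = entity_types(a), entity_types(b)
    for text in sorted(ta.keys() | tb.keys()):
        if ta.get(text) != tb.get(text):
            print(f"entity {text!r}: {sorted(ta.get(text, set()))} vs {sorted(tb.get(text, set()))}")
    ea, eb = event_signatures(a), event_signatures(b)
    for sig in sorted(ea - eb):
        print("only in first: ", sig)
    for sig in sorted(eb - ea):
        print("only in second:", sig)


if __name__ == "__main__":
    diff(ODINWEB, LOCAL)
```

Against these excerpts it reports that S6 is typed `family` in one output and `site` in the other, that `protein phosphorylation` appears only locally (as a `bioprocess`), and that the phosphorylation event exists only in the Odinweb output.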

MihaiSurdeanu commented 2 years ago

Hi @zebulon2, thanks for the report! Unfortunately, the web demo runs an old version of Reach, considerably older than the master branch. Since then, we have changed many things, including extending the knowledge bases used. One undesired side effect of this is increased ambiguity for tokens with multiple types, which is what you observed here.
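A toy model can make this side effect concrete: if every source of labels may tag a token and a fixed precedence picks a single label, then adding a source that also covers "S6" can change which label wins. The lexicons, labels, and precedence below are hypothetical and only mirror the two labels seen in this thread; this is a sketch of the general effect, not how Reach resolves conflicts internally.

```python
# Hypothetical toy lexicons (token -> label), one dict per label source.
# These are NOT Reach's actual knowledge bases.
OLDER_KBS = [
    {"s6": "Family"},
]
NEWER_KBS = OLDER_KBS + [
    {"s6": "Site", "t308": "Site"},
]

# Hypothetical precedence used to break ties: earlier labels win.
PRECEDENCE = ["Site", "Family"]


def candidates(token, kbs):
    """All distinct labels any source assigns to the (case-insensitive) token."""
    key = token.lower()
    return sorted({kb[key] for kb in kbs if key in kb})


def resolve(token, kbs, precedence=PRECEDENCE):
    """Pick one label: the first entry in the precedence list that is a candidate."""
    found = candidates(token, kbs)
    return next((label for label in precedence if label in found), None)


if __name__ == "__main__":
    for name, kbs in [("older", OLDER_KBS), ("newer", NEWER_KBS)]:
        print(name, candidates("S6", kbs), "->", resolve("S6", kbs))
```

With only the older source, "S6" has a single candidate; once the second source is added it has two, and the precedence silently decides the outcome.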

Just to be extra safe, @enoriega: can you please double check this example using your local Reach?

enoriega commented 2 years ago

@MihaiSurdeanu @zebulon2

I replicated the extraction. Indeed, S6 is detected as a site:

```
MENTION TEXT:  S6
LABELS:        List(Site)
DISPLAY LABEL: Site
    ------------------------------
    RULE => site_1letter_a
    TYPE => CorefTextBoundMention
    ------------------------------
    GROUNDING: <KBResolution: S6, uaz, UAZ5336, , <IMKBMetaInfo: uaz, , , , sp=false, f=false, p=false>>

CONTEXT: NONE
    ------------------------------
```
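The rule name `site_1letter_a` suggests that "S6" is read as a one-letter amino-acid code followed by a sequence position (serine at position 6). A minimal Python sketch of that kind of shape test, assuming a simple regular expression; the pattern, the restriction to S/T/Y, and `as_site` are hypothetical and are not Reach's actual rule.

```python
import re

# Hypothetical approximation of a "one-letter site" shape; Reach's actual
# site_1letter_a rule is defined in its grammar and may differ. Residues are
# restricted to S/T/Y (serine, threonine, tyrosine) purely for illustration.
SITE_1LETTER = re.compile(r"(?P<residue>[STY])(?P<position>\d+)")


def as_site(token):
    """Return (residue, position) if the whole token looks like a residue site, else None."""
    match = SITE_1LETTER.fullmatch(token)
    if match is None:
        return None
    return match.group("residue"), int(match.group("position"))


if __name__ == "__main__":
    for token in ["S6", "T308", "pS6", "S6K1", "ERK"]:
        print(token, as_site(token))
```

In this sketch "S6" matches the site shape while "pS6" does not, which is consistent with the observation that "pS6" escapes the site label locally; the real behavior still depends on Reach's grammar and lexicons.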

Thanks for the report. We hope to extend the NER to handle ambiguities like this better soon.