clulab / reach

Reach Biomedical Information Extraction

Context metadata in FRIES output #757

Closed by enoriega 3 years ago

enoriega commented 3 years ago

Edited the FriesOutput class to add frequency metadata to the context frames.

An example of how it is represented in JSON:

{
    "frame-id" : "cntx-physiology-UAZ-r1-8164046-149-38",
    "frame-type" : "context",
    "scope" : "sent-physiology-UAZ-r1-8164046-149",
    "object-type" : "frame",
    "facets" : {
      "organism" : [ "taxonomy:10090" ],
      "location" : [ "uniprot:SL-0162" ],
      "cell-type" : [ "cl:CL:0000233" ],
      "object-type" : "facet-set",
      "freqs" : {
        "taxonomy:10090" : "default",
        "cl:CL:0000233" : {
          "2" : "1",
          "1" : "1",
          "0" : "1"
        },
        "uniprot:SL-0162" : {
          "2" : "1",
          "3" : "1"
        }
      }
    }
}

A new JSON object, "freqs", is added to the "facets" object of the context frame. When the default species was used, I put "default" in place of the frequency counts.
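A consumer of this output has to handle the two value shapes in "freqs": the string "default" or a nested object of counts. Below is a minimal Python sketch of reading them. It assumes, as in the example above, that sentence distances and counts are serialized as strings; `parse_freqs` is a hypothetical helper, not part of Reach (which is written in Scala).

```python
import json
from typing import Dict, Union

# Trimmed copy of the context frame shown above (only the fields used here).
FRAME_JSON = """
{
  "frame-id": "cntx-physiology-UAZ-r1-8164046-149-38",
  "frame-type": "context",
  "facets": {
    "organism": ["taxonomy:10090"],
    "location": ["uniprot:SL-0162"],
    "cell-type": ["cl:CL:0000233"],
    "object-type": "facet-set",
    "freqs": {
      "taxonomy:10090": "default",
      "cl:CL:0000233": {"2": "1", "1": "1", "0": "1"},
      "uniprot:SL-0162": {"2": "1", "3": "1"}
    }
  }
}
"""

# An entry is either the "default" marker or {sentence distance: count}.
FreqEntry = Union[str, Dict[int, int]]


def parse_freqs(frame: dict) -> Dict[str, FreqEntry]:
    """Hypothetical helper: read facets.freqs and turn string keys/values into ints."""
    raw = frame.get("facets", {}).get("freqs", {})
    parsed: Dict[str, FreqEntry] = {}
    for context_id, value in raw.items():
        if value == "default":
            # Assigned from the paper's default species, so there are no counts.
            parsed[context_id] = "default"
        else:
            parsed[context_id] = {int(dist): int(n) for dist, n in value.items()}
    return parsed


freqs = parse_freqs(json.loads(FRAME_JSON))
for context_id, entry in freqs.items():
    print(context_id, entry)
```

Converting to integers up front makes it easy to ask, for instance, for the smallest distance at which a context was seen.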

MihaiSurdeanu commented 3 years ago

Can you please explain the information in this JSON block?

enoriega commented 3 years ago

@MihaiSurdeanu Sorry, I didn't follow.

MihaiSurdeanu commented 3 years ago

I don't understand the fields in the JSON example you're showing. Can you explain the new fields you added?

enoriega commented 3 years ago

The block as a whole is the context frame that the FRIES output generates with context information. What I added is the freqs dictionary. In it, each key is a context ID (e.g., cl:CL:0000233), and each value is another dictionary whose keys are sentence distances and whose values are the number of times that context appears at that distance.

For example, cl:CL:0000233 appears once in the same sentence (key 0), once one sentence away (key 1), and once two sentences away (key 2). There was no species context in the neighborhood of the mention, so the mention was assigned the paper's default species, taxonomy:10090, and the value of that entry is "default" instead of a frequency counter.
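The counting described in the two paragraphs above can be sketched in Python as follows. This is an illustration of the data layout, not the FriesOutput code itself: the input format (a list of `(context_id, sentence_distance)` pairs plus a set of default IDs) and the name `build_freqs` are hypothetical, and counts are stringified only to match the JSON example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple, Union


def build_freqs(
    occurrences: Iterable[Tuple[str, int]],
    default_ids: Set[str],
) -> Dict[str, Union[str, Dict[str, str]]]:
    """Hypothetical sketch of the freqs layout described above.

    occurrences: (context_id, sentence_distance) pairs; distance 0 = same sentence.
    default_ids: context IDs assigned from the paper-level default.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for context_id, distance in occurrences:
        counts[context_id][distance] += 1

    freqs: Dict[str, Union[str, Dict[str, str]]] = {}
    for context_id, by_distance in counts.items():
        # Stringify distances and counts to match the JSON example, e.g. {"0": "1"}.
        freqs[context_id] = {str(d): str(n) for d, n in sorted(by_distance.items())}
    for context_id in default_ids:
        # A default context has no nearby mentions, so it gets the marker instead.
        if context_id not in counts:
            freqs[context_id] = "default"
    return freqs


# The occurrences from the example: cl:CL:0000233 at distances 0, 1 and 2,
# uniprot:SL-0162 at distances 2 and 3, and no nearby species mention.
example = build_freqs(
    [
        ("cl:CL:0000233", 0),
        ("cl:CL:0000233", 1),
        ("cl:CL:0000233", 2),
        ("uniprot:SL-0162", 2),
        ("uniprot:SL-0162", 3),
    ],
    default_ids={"taxonomy:10090"},
)
print(example)
```

Using a `Counter` per context ID keeps the "number of times at each distance" bookkeeping to a single increment per occurrence.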

In the end, every context assigned to a mention will have one entry in the freqs dictionary.
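Because freqs is a JSON object keyed by context ID, an ID can appear in it at most once, so the property stated above reduces to the freqs keys matching the IDs listed in the facets. A small Python check, assuming (not confirmed here) that every list-valued facet holds context IDs and that "object-type" and "freqs" are the only non-context keys; both function names are hypothetical:

```python
from typing import List, Set

# Facets from the example context frame.
FACETS = {
    "organism": ["taxonomy:10090"],
    "location": ["uniprot:SL-0162"],
    "cell-type": ["cl:CL:0000233"],
    "object-type": "facet-set",
    "freqs": {
        "taxonomy:10090": "default",
        "cl:CL:0000233": {"2": "1", "1": "1", "0": "1"},
        "uniprot:SL-0162": {"2": "1", "3": "1"},
    },
}


def facet_context_ids(facets: dict) -> Set[str]:
    """Collect the context IDs listed in list-valued facets (organism, location, ...)."""
    ids: Set[str] = set()
    for name, value in facets.items():
        if name in ("object-type", "freqs"):
            continue  # metadata, not a context facet
        if isinstance(value, list):
            ids.update(value)
    return ids


def check_freqs_coverage(facets: dict) -> List[str]:
    """Hypothetical check: every assigned context ID has a freqs entry, and no extras."""
    ids = facet_context_ids(facets)
    freq_ids = set(facets.get("freqs", {}))
    problems = [f"missing freqs entry for {i}" for i in sorted(ids - freq_ids)]
    problems += [f"freqs entry with no matching facet: {i}" for i in sorted(freq_ids - ids)]
    return problems


print(check_freqs_coverage(FACETS))
```

An empty list means the frame satisfies the one-entry-per-context property.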

MihaiSurdeanu commented 3 years ago

Oh, I get it now. And the code looks good. Merging next.