clulab / reach

Reach Biomedical Information Extraction
Other
97 stars 39 forks source link

Context Engine Improvements #754

Closed enoriega closed 3 years ago

enoriega commented 3 years ago

The context engine is now able to keep track of metadata of the context assignments. A new field on the Context trait called contextMetaData contains a map from context types to a map of frequencies by distance.

I.e. if a specific context type is mentioned once in the current sentence and twice on the previous sentence, the map will be Map(0 -> 1, 1 -> 2)

There is a unit test to make sure this functionality works appropriately.

MihaiSurdeanu commented 3 years ago

@enoriega : apologies for the delay. This is almost perfect. I would change Context.context and Context.contextMetaData from var to def in the trait. And implement them as val or var in the actual policy class. This way you don't have to initialize them to None in the trait, which is a bit ugly. Note that in Scala, you can switch from def in the superclass (or trait) to val/var in the subclass.

Also, just for counting occurrences, we have the Counter class, which maybe can be used? https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/struct/Counter.scala

@kwalcock : what do you think?

MihaiSurdeanu commented 3 years ago

Also, some tests are failing.

enoriega commented 3 years ago

@MihaiSurdeanu I noticed one test is not passing because, for some reason, we are expecting the context engine to assign only one context type per context class, e.g. it will only provide one species context and one organ context.

With the changes in this PR, now there may be more than one context type of the same class, e.g. human and mice being declared context for the same event.

This is breaking the test. If there is a good reason to just have one context type per class, then I will see how to adapt it, otherwise I will disable that test.

cl4yton commented 3 years ago

Just to pipe in on this @enoriega and @MihaiSurdeanu : I definitely think it should be possible to have multiple context instances of the same type (e.g., multiple species) associated with the same event.

enoriega commented 3 years ago

I agree with @cl4yton. Just want to be sure I won't be overriding something that was meant to be for a reason. Although I think this was most likely an arbitrary decision taken back in the day

MihaiSurdeanu commented 3 years ago

I agree. Multiple context instances are possible!

On Sun, Jun 6, 2021 at 03:52 Enrique Noriega @.***> wrote:

I agree with @cl4yton https://github.com/cl4yton. Just want to be sure I won't be overriding something that was meant to be for a reason. Although I think this was most likely an arbitrary decision taken back in the day

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/clulab/reach/pull/754#issuecomment-855317721, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI75TRYOZWJVAM6BKKTPCDTRLBFRANCNFSM457K5PGA .

enoriega commented 3 years ago

@kwalcock Can you take a look to see everything here looks fine? (Mihai agrees with this petition)

Thanks!

kwalcock commented 3 years ago

I haven't forgotten. Hopefully tomorrow.

enoriega commented 3 years ago

Thanks!

kwalcock commented 3 years ago

His Majesty's Ship still wins in the frequency race against Harvard Medical School, so I'm imagining a boat called Context :-)

enoriega commented 3 years ago

Ahoy

kwalcock commented 3 years ago

I was having too much fun with Policies.scala so added some changes to a branch with a PR. All the code here looks plenty functional to me, so the PR is at your discretion, of course.

MihaiSurdeanu commented 3 years ago

Thanks @kwalcock and @enoriega!

I will merge this as soon as I get the confirmation that tests pass.

MihaiSurdeanu commented 3 years ago

@kwalcock : can you please release this version of Reach?

@enoriega: can you please email HMS with a summary of updates on: entity configuration in the lexicon, and grounding?