fmarten / JoSimText

A system for word sense induction and disambiguation based on the JoBimText approach

Support of multiword expressions and named entities #7

Open alexanderpanchenko opened 7 years ago

alexanderpanchenko commented 7 years ago

Motivation

The current implementation of lefex generates features for both single-word and multiword terms, whereas the current JoSimText implementation (https://github.com/uhh-lt/josimtext/blob/master/src/main/scala/de/uhh/lt/jst/dt/CoNLL2DepTermContext.scala) generates features for single words only.

Implementation

The idea behind generating features for multiword expressions (MWEs) is illustrated in the figure below:

image

Here, the features of "mickey mouse" are all dependencies of "mickey" plus all dependencies of "mouse", minus the dependencies between "mickey" and "mouse".

Here is an example of a named entity from our data: "Lower Johnson".

image

It should be represented by the feature "pobj(@,to)" (i.e., line number 18). Note that "nn(@,Johnson)" (i.e., line number 17) is not a feature of this entity.

Another example:

image

Here, "New York City" should have the feature "prep_in(@,hotel)".
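
To make the rule concrete, here is a minimal sketch in Python (not the Scala of CoNLL2DepTermContext, and not the lefex code). It assumes dependencies are given as (relation, governor index, dependent index) triples. The names `Dep` and `mwe_features` and the two short token sequences are hypothetical. When the term is the dependent, the feature is written as `rel(@,governor)`, as in the examples above; writing `rel(dependent,@)` when the term is the governor is my assumption, since the examples do not show that case.

```python
from typing import Iterable, List, NamedTuple, Sequence, Set


class Dep(NamedTuple):
    relation: str   # dependency label, e.g. "pobj"
    governor: int   # token index of the governor (head)
    dependent: int  # token index of the dependent


def mwe_features(tokens: Sequence[str], deps: Iterable[Dep], span: Set[int]) -> List[str]:
    """Features of the term covering the token indices in `span`.

    Keeps every dependency that links a token of the term to a token
    outside it, and drops dependencies between two tokens of the term.
    """
    features = []
    for rel, gov, dep in deps:
        if gov in span and dep in span:
            continue  # internal dependency, e.g. nn(Johnson, Lower)
        if dep in span:
            features.append(f"{rel}(@,{tokens[gov]})")
        elif gov in span:
            # Assumed format for the governor case (not shown in the issue).
            features.append(f"{rel}({tokens[dep]},@)")
    return features


# Hypothetical fragment "... to Lower Johnson"
tokens_a = ["went", "to", "Lower", "Johnson"]
deps_a = [Dep("prep", 0, 1), Dep("pobj", 1, 3), Dep("nn", 3, 2)]
print(mwe_features(tokens_a, deps_a, {2, 3}))     # ['pobj(@,to)']

# Hypothetical fragment "hotel in New York City" with collapsed prepositions
tokens_b = ["hotel", "in", "New", "York", "City"]
deps_b = [Dep("prep_in", 0, 4), Dep("nn", 4, 2), Dep("nn", 4, 3)]
print(mwe_features(tokens_b, deps_b, {2, 3, 4}))  # ['prep_in(@,hotel)']
```

With a single-token span this reduces to the current single-word behaviour, so the same function could in principle serve both cases.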

fmarten commented 7 years ago

Can you provide me the text from the screenshots above?

alexanderpanchenko commented 7 years ago

I will try to, but I do not remember exactly which file these were taken from (in principle, ANY sentence with a B-Person AND an I-Person tag will do).
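
For picking such sentences and locating the entity tokens, a rough sketch of grouping B-/I- tags into spans could look like this. It assumes the tags of one sentence have already been read into a list (the column layout of the data file is not assumed here), and `bio_spans` is a hypothetical helper name.

```python
from typing import List, Optional, Sequence, Tuple


def bio_spans(tags: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, label) for every B-X (I-X)* run."""
    spans = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its label matches.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


tags = ["O", "O", "B-Person", "I-Person", "O"]
print(bio_spans(tags))  # [(2, 4, 'Person')]
```

Each returned span gives the token indices of one multiword entity; a stray I- tag without a preceding B- tag is ignored in this sketch.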

On Thu, Aug 24, 2017 at 6:48 PM, Fide Marten notifications@github.com wrote:

Can you provide me the text from the screenshots above?


alexanderpanchenko commented 7 years ago

Here you are (the closest thing I was able to find...)

http://panchenko.me/data/joint/corpora/mwe-conll-sample.csv