amir-zeldes / xrenner

eXternally configurable REference and Non Named Entity Recognizer

Mechanism for date-time awareness #18

Open amir-zeldes opened 8 years ago

amir-zeldes commented 8 years ago

Some indexical expressions such as 'this year' or 'today' can be coreferent with an actual lexical NP in the text, especially in news texts where the article is dated. A mechanism should be devised to recognize likely indications of the text's current date-time in order to capture these.

As a proof-of-concept prototype we can try to find entire utterances that contain only a date/time. If a text contains a pattern matching one of the typical date/time patterns, some global variables should be set and modeled in a new object representing the entire document:

If these are not fully known, we can still specify some partial date/time information, which should always be available even if the above are known (as convenience functions):
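
A minimal sketch of the proof-of-concept detection, assuming a single "Month D, YYYY" pattern with an optional leading weekday. `DocumentDate`, `detect_document_date` and the field names are hypothetical and are not existing xrenner names; fields left as `None` stand for partial date information.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Assumed pattern: the whole utterance is a date such as
# "Thursday , May 7, 2015" (tokenized) or "May 7, 2015".
DATE_ONLY = re.compile(
    r"^\s*(?:(?P<weekday>[A-Za-z]+day)\s*,\s*)?"
    r"(?P<month>[A-Za-z]+)\s+(?P<day>\d{1,2})\s*,\s*(?P<year>\d{4})\s*$"
)


@dataclass
class DocumentDate:
    """Hypothetical document-level date object; None means 'not known'."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    weekday: Optional[str] = None


def detect_document_date(sentences: Iterable[str]) -> DocumentDate:
    """Return the date of the first utterance that consists only of a date."""
    for sentence in sentences:
        match = DATE_ONLY.match(sentence)
        if match and match.group("month").lower() in MONTHS:
            return DocumentDate(
                year=int(match.group("year")),
                month=MONTHS.index(match.group("month").lower()) + 1,
                day=int(match.group("day")),
                weekday=match.group("weekday"),
            )
    # Nothing matched: every field stays unknown.
    return DocumentDate()
```

A real implementation would presumably need several patterns per language (and time-of-day handling); this only covers one English date format.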

When processing documents, common noun markables can be matched against configurable patterns (case insensitive), which map to certain document properties:

this year -> document.year
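
The mapping could be sketched roughly as follows. `PATTERN_TO_PROPERTY` and `resolve_indexical` are hypothetical names, and the document object is stood in for by a `SimpleNamespace`; only the `this year -> document.year` mapping comes from the proposal above.

```python
from types import SimpleNamespace

# Lowercased pattern -> name of the document property it refers to.
PATTERN_TO_PROPERTY = {
    "this year": "year",
}


def resolve_indexical(markable_text, document):
    """Return the document property a markable refers to, or None.

    Matching is case insensitive and ignores extra whitespace between tokens.
    """
    key = " ".join(markable_text.lower().split())
    prop = PATTERN_TO_PROPERTY.get(key)
    if prop is None:
        return None
    # The property may itself be unknown (None) if no date was detected.
    return getattr(document, prop, None)


document = SimpleNamespace(year=2015)
print(resolve_indexical("This year", document))
```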

As soon as a suspected date pattern is encountered, it will be added to the LexData object's coref.tab dictionary. The workflow is:

The list of patterns should be a semicolon-separated entry in the config.ini for the language, e.g.:

year_ref=this year;the current year
day_ref=today
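
Reading such entries could look roughly like this, using the standard-library `configparser`. The `[main]` section name, the `load_date_patterns` helper, and the convention that `<property>_ref` maps to a document property called `<property>` are assumptions for illustration, not the actual layout of xrenner's config.ini.

```python
import configparser

# Sample settings; the section header is assumed for this sketch.
SAMPLE_INI = """
[main]
year_ref=this year;the current year
day_ref=today
"""


def load_date_patterns(ini_text, section="main"):
    """Map each lowercased pattern to the document property it refers to."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    patterns = {}
    for key, value in parser[section].items():
        if not key.endswith("_ref"):
            continue
        prop = key[: -len("_ref")]  # "year_ref" -> "year"
        for pattern in value.split(";"):
            pattern = pattern.strip().lower()
            if pattern:
                patterns[pattern] = prop
    return patterns


print(load_date_patterns(SAMPLE_INI))
```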

amir-zeldes commented 8 years ago

A good test case would be in GUM_news_flag, since its second sentence is:

Thursday , May 7, [2015]

And later we have:

... on Waitangi Day - February 6 - [this year]