Mechanism for date-time awareness

Some indexical expressions such as 'this year' or 'today' can be coreferent with an actual lexical NP in the text, especially in news texts where the article is dated. A mechanism should be devised to recognize likely indications of the text's current date-time in order to capture these.

As a proof-of-concept prototype, we can try to find entire utterances that contain only a date/time. If an utterance matches one of the typical date/time patterns, the detected values should be stored as properties of a new object representing the entire document:

document.date
document.time

If these are not fully known, we can still specify some partial date/time information. The partial properties should always be available, even when the full date and time are known (as convenience functions):

document.weekday
document.year
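A minimal sketch of such a document object, assuming Python dataclasses. The class name `Document` and the fields `partial_year` and `partial_weekday` are hypothetical and not part of the existing code base; only the `date`, `time`, `weekday` and `year` properties come from the proposal above.

```python
import datetime
from dataclasses import dataclass
from typing import Optional

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


@dataclass
class Document:
    """Hypothetical container for document-level date/time information."""
    date: Optional[datetime.date] = None
    time: Optional[datetime.time] = None
    # Partial information, used only when the full date is unknown (assumed names).
    partial_year: Optional[int] = None
    partial_weekday: Optional[str] = None

    @property
    def year(self) -> Optional[int]:
        # Derive the year from the full date when available.
        if self.date is not None:
            return self.date.year
        return self.partial_year

    @property
    def weekday(self) -> Optional[str]:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        if self.date is not None:
            return WEEKDAYS[self.date.weekday()]
        return self.partial_weekday
```

With this layout, `document.year` and `document.weekday` behave the same whether the full date was found or only a fragment of it, so downstream matching does not need to check which case applies.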

When processing documents, common noun markables can be matched against configurable patterns (case insensitive), which map to certain document properties:

this year -> document.year
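The mapping could be sketched as a simple case-insensitive lookup. The function name `resolve_indexical` and the table `PATTERN_TO_PROPERTY` are hypothetical, and mapping 'today' to `document.date` is an assumption, since the proposal only spells out the 'this year' case.

```python
from types import SimpleNamespace

# Pattern text (lowercase) -> name of the document property it refers to.
# Only "this year" -> year is given in the proposal; the rest is assumed.
PATTERN_TO_PROPERTY = {
    "this year": "year",
    "the current year": "year",
    "today": "date",
}


def resolve_indexical(markable_text, document):
    """Return the document property value a markable refers to, or None."""
    prop = PATTERN_TO_PROPERTY.get(markable_text.strip().lower())
    if prop is None:
        return None
    return getattr(document, prop, None)


# Stand-in for the document object, for illustration only.
doc = SimpleNamespace(year=2016, date=None)
print(resolve_indexical("This Year", doc))
```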

As soon as a suspected date pattern is encountered, the corresponding entries will be added to the LexData object's coref.tab dictionary. The workflow is:

1. Encounter a sentence consisting solely of a date, based on some predefined set of date formats
2. Update document object properties (document.date)
3. Add entries to lex.coref to anticipate this year -> 2016 (once we know it's 2016)
4. Now normal matching should catch this year -> 2016
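The workflow above might look roughly like the following sketch. The date formats, the function names, and the use of a plain dict in place of the real LexData coreference table are all assumptions made for illustration. Stopping at the first date-only sentence is likewise an assumed policy.

```python
import datetime

# Assumed set of date formats; a real list would be language-specific.
DATE_FORMATS = ["%Y-%m-%d", "%B %d, %Y", "%d %B %Y"]


def parse_date_only(sentence):
    """Return a date if the sentence consists solely of a date, else None."""
    text = sentence.strip().rstrip(".")
    for fmt in DATE_FORMATS:
        try:
            # strptime raises ValueError unless the whole string matches.
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def register_document_date(sentences, year_refs, coref):
    """Find the first date-only sentence and add year entries to coref."""
    for sentence in sentences:
        found = parse_date_only(sentence)
        if found is None:
            continue
        # Anticipate e.g. "this year" -> "2016" for normal matching later.
        for phrase in year_refs:
            coref[phrase.lower()] = str(found.year)
        return found
    return None


coref_table = {}  # stand-in for the real coreference dictionary
doc_date = register_document_date(
    ["March 14, 2016", "Sales rose this year."], ["this year"], coref_table
)
print(doc_date, coref_table)
```

A sentence that merely contains a date, such as "Sales rose on March 14, 2016", is not matched here, which keeps the prototype conservative.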

The list of patterns should be a semicolon-separated entry in the language's config.ini, e.g.:

year_ref=this year;the current year
day_ref=today
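Reading these entries could be sketched with Python's standard `configparser`. The section name `[main]` and the function name `load_ref_patterns` are assumptions, since the layout of the actual config.ini is not shown here. `configparser` requires some section header to be present.

```python
import configparser

# Sample config text; the [main] section header is an assumption.
SAMPLE_INI = """
[main]
year_ref=this year;the current year
day_ref=today
"""


def load_ref_patterns(ini_text, section="main"):
    """Return {key: [patterns]} for every *_ref entry in the section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    patterns = {}
    for key, value in parser.items(section):
        if key.endswith("_ref"):
            # Split on semicolons; lowercase for case-insensitive matching.
            patterns[key] = [p.strip().lower() for p in value.split(";") if p.strip()]
    return patterns


print(load_ref_patterns(SAMPLE_INI))
```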

amir-zeldes / xrenner

Mechanism for date-time awareness #18