Some indexical expressions such as 'this year' or 'today' can be coreferent with an actual lexical NP in the text, especially in news texts where the article is dated. A mechanism should be devised to recognize likely indications of the text's current date-time in order to capture these.
As a proof-of-concept prototype we can try to find entire utterances that contain only a date/time. If a text contains an utterance matching one of the typical date/time patterns, the corresponding values should be stored as properties of a new object representing the entire document:
document.date
document.time
If the full date/time is not known, we can still record partial date/time information. These partial properties should also be available when the full date/time is known, as convenience functions:
document.weekday
document.year
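The document object described above could be sketched roughly as follows. This is only an illustration: the class name Document and the fields partial_year and partial_weekday are hypothetical, chosen to show how year and weekday can be derived from a full date when one is known and fall back to partial information otherwise.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional

# Fixed English names so the result does not depend on the system locale
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


@dataclass
class Document:
    """Hypothetical container for document-level date/time information."""
    date: Optional[dt.date] = None         # full date, if one was found
    time: Optional[dt.time] = None         # time of day, if one was found
    partial_year: Optional[int] = None     # assumed fallback when only a year is known
    partial_weekday: Optional[str] = None  # assumed fallback when only a weekday is known

    @property
    def year(self) -> Optional[int]:
        # Prefer the full date; otherwise use whatever partial value exists
        return self.date.year if self.date is not None else self.partial_year

    @property
    def weekday(self) -> Optional[str]:
        if self.date is not None:
            return WEEKDAY_NAMES[self.date.weekday()]  # Monday is index 0
        return self.partial_weekday
```

With this layout, document.year and document.weekday can be read whether or not the full date is known.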
When processing documents, common noun markables can be matched against configurable patterns (case insensitive), which map to certain document properties:
this year -> document.year
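A minimal sketch of the case-insensitive lookup, assuming the patterns are held in a plain dictionary from phrase to property name. The function name resolve_indexical and the 'today' entry are illustrative assumptions; only the 'this year' mapping comes from the note above.

```python
from types import SimpleNamespace

# Phrase -> document property name; 'today' -> 'date' is an assumed extra entry
INDEXICAL_PATTERNS = {"this year": "year", "today": "date"}


def resolve_indexical(markable_text, document, patterns=INDEXICAL_PATTERNS):
    """Return the document property a markable maps to, or None if no match."""
    # Lowercase and collapse whitespace so 'This  Year' matches 'this year'
    key = " ".join(markable_text.lower().split())
    prop = patterns.get(key)
    if prop is None:
        return None
    # A property that is missing or still unknown yields None
    return getattr(document, prop, None)


# Stand-in for the document object, with only the year known
document = SimpleNamespace(year=2016, date=None)
resolved = resolve_indexical("This Year", document)
```

A None result covers both cases: the markable is not an indexical, or the property it maps to has not been set yet.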
As soon as a suspected date pattern is encountered, corresponding entries will be added to the LexData object's coref.tab dictionary. The workflow is:
Encounter a sentence consisting solely of a date, based on some predefined set of date formats
Update document object properties (document.date)
Add entries to lex.coref to anticipate this year -> 2016 (once we know it's 2016)
Now normal matching should catch this year -> 2016
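The workflow steps above can be sketched as follows. Plain dictionaries stand in for the document object and for lex.coref, and the list of date formats, the function names and the shape of the coref entry are all assumptions made for illustration.

```python
import datetime as dt

# Assumed date formats (strptime syntax); the real set would be predefined elsewhere
DATE_FORMATS = ["%Y-%m-%d", "%B %d, %Y", "%d %B %Y"]


def parse_date_only_sentence(sentence):
    """Return a date if the whole sentence is a date in a known format, else None."""
    text = sentence.strip().rstrip(".")
    for fmt in DATE_FORMATS:
        try:
            # strptime only succeeds when the entire string matches the format
            return dt.datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def register_document_date(sentences, document, coref):
    """Set document properties and anticipate 'this year' from the first date-only sentence."""
    for sentence in sentences:
        found = parse_date_only_sentence(sentence)
        if found is not None:
            document["date"] = found
            document["year"] = found.year
            # Anticipate 'this year' -> '2016' so that normal matching can catch it
            coref["this year"] = str(found.year)
            return found
    return None


document, coref = {}, {}
sentences = ["Senate passes bill", "March 14, 2016", "The vote came this year."]
found = register_document_date(sentences, document, coref)
```

Because the whole sentence must match a format, a date mentioned inside a longer sentence does not trigger the update.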
The list of patterns should be a semicolon-separated entry in the config.ini for the language, e.g.:
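The exact name and syntax of the entry are not fixed here. As a purely hypothetical illustration, the sketch below assumes an option called indexical_patterns in a [main] section, with each item written as phrase->property, and shows how such a semicolon-separated value could be read with Python's configparser.

```python
import configparser

# Hypothetical config.ini content; the section, option name and '->' syntax are assumptions
SAMPLE_INI = """
[main]
indexical_patterns=this year->year;today->date
"""


def load_indexical_patterns(ini_text, section="main", option="indexical_patterns"):
    """Parse a semicolon-separated 'phrase->property' list into a dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get(section, option, fallback="")
    patterns = {}
    for entry in raw.split(";"):
        if "->" not in entry:
            continue  # skip empty or malformed items
        phrase, prop = entry.split("->", 1)
        # Lowercase the phrase, since matching is case insensitive
        patterns[phrase.strip().lower()] = prop.strip()
    return patterns


patterns = load_indexical_patterns(SAMPLE_INI)
```

Keeping the list in config.ini lets each language define its own indexical phrases without code changes.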