Open kno10 opened 7 years ago
I agree with the "second half" issue (the same holds for quarters etc.)
Regarding the minute expressions, we keep following TimeML: these are durations, which can be anchored. The value remains a duration. Independent of whether you agree or not: it is very unlikely that these expressions would be normalized correctly. You would need the start time of the match and further very specific knowledge, e.g., that "46 minutes" is not 1 minute later than "45 minutes" but about 16 minutes later...
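To make the half-time point concrete, here is a minimal sketch of how a game minute would map to elapsed wall-clock minutes since kickoff. The function name, the fixed 15-minute break, and ignoring stoppage time are all assumptions for illustration; nothing like this exists in the tagger.

```python
def elapsed_wall_minutes(game_minute, halftime_break=15):
    """Approximate wall-clock minutes since kickoff for a football game minute.

    Assumes a 45-minute first half, no stoppage time, and a fixed break
    (halftime_break is a hypothetical parameter, not a known constant).
    """
    if game_minute < 1:
        raise ValueError("game minutes start at 1")
    if game_minute <= 45:
        # First half: game clock and wall clock run in step.
        return game_minute
    # Second half: the break has to be added on top of the game clock.
    return game_minute + halftime_break


# Gap between the 45th and 46th game minute under these assumptions.
gap = elapsed_wall_minutes(46) - elapsed_wall_minutes(45)
print(gap)
```

Even this toy version needs domain knowledge (sport, half length, break length) that a general-purpose temporal tagger does not have.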
But in the same text, "5 minutes later" was mapped differently, into UNDEF-REF-minute-PLUS-5, because of the existing date_r20c rule.
I agree that we usually won't be able to translate them into absolute time points; in particular with game time.
I never said that the annotation standard is perfect, just that we try to follow it. And that's how we interpreted it... But in general, it might be worth thinking about "trying to anchor everything" independent of the TIMEX3 annotation standard. But this would probably have to result in a new standard with new kinds of problems.
Should UNDEF-REF-minute-PLUS-5 then be translated into PT5M with modifier AFTER?
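A minimal sketch of what such a translation could look like, as a post-processing step outside the rule files. The function name, the unit tables, and mapping MINUS to BEFORE are assumptions for illustration; only the UNDEF-REF-minute-PLUS-5 to PT5M/AFTER case comes from this thread.

```python
import re

# Hypothetical unit tables: ISO 8601 durations need a "T" before time units,
# which also keeps "M" for minutes distinct from "M" for months.
DATE_UNITS = {"year": "Y", "month": "M", "week": "W", "day": "D"}
TIME_UNITS = {"hour": "H", "minute": "M", "second": "S"}

# Matches values of the form UNDEF-REF-<unit>-<PLUS|MINUS>-<amount>.
PATTERN = re.compile(r"^UNDEF-REF-([a-z]+)-(PLUS|MINUS)-(\d+)$")


def to_anchored_duration(value):
    """Return (duration, mod) for a relative value, or None if not applicable."""
    match = PATTERN.match(value)
    if match is None:
        return None
    unit, direction, amount = match.group(1), match.group(2), int(match.group(3))
    if unit in TIME_UNITS:
        duration = f"PT{amount}{TIME_UNITS[unit]}"
    elif unit in DATE_UNITS:
        duration = f"P{amount}{DATE_UNITS[unit]}"
    else:
        return None
    # Assumption: PLUS maps to mod AFTER and MINUS to mod BEFORE.
    mod = "AFTER" if direction == "PLUS" else "BEFORE"
    return duration, mod


print(to_anchored_duration("UNDEF-REF-minute-PLUS-5"))
```

Values that do not follow the relative pattern, such as 1958-H2, are left alone by returning None.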
Well, not following the annotation standard, things such as "later" (two days later) or "ago" (three weeks ago) are handled as part of the temporal expressions. Using your examples:
--> type=duration; "on" is not part of the expression as it's a preposition preceding the temporal expression
It's due to different linguistic realizations, so I would not bet that everything is fully consistent - in particular not in the system's output, but probably not even in the annotation guidelines.
In particular, it occurs frequently in sports (and thus, in Wikipedia, news articles, books, ...).
E.g. https://en.wikipedia.org/wiki/1958_FIFA_World_Cup_Final
Here, "second half" will be interpreted as 1958-H2. I suggest disabling rule date_r10b because of these false positives. Also, "after only 4 minutes", "on 32 minutes", and "10 minutes into" should probably be relative time references rather than durations.
Also, e.g. in Wikipedia "Karl May":
Suggest rule updates for the time references: