My TODO list, and now this issue tracker, are becoming cluttered with bugs and missing functionality in the datetime parsers, and I'm beginning to wonder if we'd be better served by rewriting them to recurse over anchors.
Right now, the parser works through the input string one word at a time. This catches most cases in English, but I've just found a not-so-edge case: "day after tomorrow" is great, but "three days after tomorrow" returns... tomorrow.
"December 3rd at 3pm" returns correctly. So does "3pm on December 3rd". However, "December 3rd at three pm" returns December 3rd with a remainder of "three pm", and "December third" throws that formatting error.
The parser is turning into whack-a-case.
I'd like to open a discussion about recursively pulling keywords and offsets from the input string. For example, if the string contains "tomorrow," the rest of our parsing is relative to tomorrow's date. We divide the string at that word, and then check the strings before and after for more relative information.
"three o'clock tomorrow afternoon and again at five the next morning" --->
["three o'clock", tomorrow_date, "afternoon and again at five the next morning"]
"three o'clock" + tomorrow_date --->
[tomorrow_datetime(3am), "afternoon and again at five the next morning"]
[tomorrow_datetime(3am), "afternoon", "and again at five the next morning"]
tomorrow_datetime(3am) + "afternoon" --->
[tomorrow_datetime(3pm), "and again at five the next morning"]
In the above pseudocode, the hardest bit would be recognizing that the remainder, though it contains datetime-related information, describes a second datetime. It should not be handled by extract_datetime(), but by whatever is calling it. In our case, that would likely be an extract_datetimes() function.
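To make the split-and-digest idea concrete, here is a rough Python sketch of the walkthrough above. It is not existing parser code: the `_sketch` functions, the lookup tables, and the choice to treat "the next" as an anchor are all assumptions made for illustration, and it only looks for an hour before the anchor and a day part immediately after it.

```python
import re
from datetime import date, datetime, time, timedelta

# Hypothetical lookup tables, for illustration only.
ANCHORS = {"yesterday": -1, "today": 0, "tomorrow": 1, "the next": 1}
NUMBER_WORDS = {word: value for value, word in enumerate(
    "one two three four five six seven eight nine ten eleven twelve".split(),
    start=1)}
DAY_PARTS = {"morning": 0, "afternoon": 12, "evening": 12}

ANCHOR_RE = re.compile(r"\b(" + "|".join(ANCHORS) + r")\b", re.IGNORECASE)
HOUR_RE = re.compile(r"\b(\d{1,2}|" + "|".join(NUMBER_WORDS) + r")\b",
                     re.IGNORECASE)


def parse_hour(fragment):
    """Read an hour such as "three o'clock" or "3"; return None if absent."""
    match = HOUR_RE.search(fragment)
    if match is None:
        return None
    token = match.group(1).lower()
    value = int(token) if token.isdigit() else NUMBER_WORDS[token]
    return value if value < 24 else None


def extract_datetime_sketch(text, reference):
    """Split at the first anchor; return (datetime or None, remainder)."""
    match = ANCHOR_RE.search(text)
    if match is None:
        return None, text
    before = text[:match.start()].strip()
    after = text[match.end():].strip()
    day = reference + timedelta(days=ANCHORS[match.group(1).lower()])

    hour = parse_hour(before) or 0  # bare "three o'clock" defaults to 3am
    # Only a day part directly after the anchor is consumed here; the rest
    # is handed back so the caller can decide whether it is another datetime.
    first, _, rest = after.partition(" ")
    if first.lower() in DAY_PARTS:
        if hour < 12:
            hour += DAY_PARTS[first.lower()]
        after = rest
    return datetime.combine(day, time(hour)), after


def extract_datetimes_sketch(text, today):
    """Keep digesting the remainder, relative to the previous result."""
    results, reference, remainder = [], today, text
    while True:
        found, remainder = extract_datetime_sketch(remainder, reference)
        if found is None:
            return results, remainder
        results.append(found)
        reference = found.date()


results, leftover = extract_datetimes_sketch(
    "three o'clock tomorrow afternoon and again at five the next morning",
    date(2024, 1, 1),
)
# results -> [datetime(2024, 1, 2, 15, 0), datetime(2024, 1, 3, 5, 0)]
```

The point of the sketch is the division of labor: the single-datetime function stops as soon as it has consumed what belongs to its anchor, and the plural function decides what to do with the leftover text. A real version would need a much better rule for where one datetime ends and the next begins.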
If we can nail down logic like that, which continually digests the input rather than going word by word, it should be easier to code around edge cases. The end goal would be something like:
[offset: ["three days after", 3pm], entity: [scheduled_event: "the superbowl"]] ---> datetime(three days after superbowl sunday at 3pm)
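One possible shape for that last step, again only as a sketch: once the string has been digested into an offset, a time, and a named entity, resolving it is plain date arithmetic. `ParsedRequest`, `resolve_sketch`, and `event_calendar` are hypothetical names, and the calendar entry is an example value that a real system would have to look up from somewhere.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass
class ParsedRequest:
    """Hypothetical result of digesting the input string."""
    offset_days: int       # 3 for "three days after"
    at: time               # time(15) for "3pm"
    scheduled_event: str   # "the superbowl"


def resolve_sketch(request, event_calendar):
    """Anchor on the event's date, then apply the offset and the time."""
    anchor_day = event_calendar[request.scheduled_event]
    target_day = anchor_day + timedelta(days=request.offset_days)
    return datetime.combine(target_day, request.at)


# Example calendar entry, supplied by hand for illustration.
event_calendar = {"the superbowl": date(2024, 2, 11)}
request = ParsedRequest(offset_days=3, at=time(15), scheduled_event="the superbowl")
resolved = resolve_sketch(request, event_calendar)
# resolved -> datetime(2024, 2, 14, 15, 0)
```

Keeping the entity lookup separate from the string parsing means the parser never needs to know when an event actually happens; it only needs to recognize that the phrase names one.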