fluffle / sp0rkle

sp0rkle is dead, long live sp0rkle
irc://irc.pl0rt.org/#ed

The date string "2nd october 10am" doesn't parse. #46

Closed by fluffle 9 years ago

gundalow commented 9 years ago

Thanks

fluffle commented 9 years ago

[20:36] sloop: interestingly 2nd october at 10am works fine
[20:39] this is the downside of a simple lalr parser with such an ambiguous grammer
[20:39] grammar, even
[20:41] aha got it, it's a clash between:
[20:42] 1) T_INTEGER o_dayqual o_of T_MONTHNAME // 2(nd) (of) October
[20:42] 2) T_INTEGER o_dayqual o_of T_MONTHNAME o_comma T_INTEGER // 2(nd) (of) October(,) 2015
[20:43] and more explicitly it gets confused because it is seeing 10am and not expecting the am because it's interpreting the integer as the year

So, yeah. 2nd October 10am breaks down into the following tokens:

2 T_INTEGER
"nd" T_DAYQUAL
"October" T_MONTHNAME
10 T_INTEGER
"A" "A"
"M" "M" // M is so ambiguous I have to special case it basically everywhere.

This means the leftmost-longest match is (2), not (1): the 10 gets consumed as the year. The parse then fails because that leaves just "AM" on the token stack, which can't be parsed on its own. Putting the "at" in there makes (1) the longest match, since it inserts a T_IGNORE after the T_MONTHNAME, and thus everything parses fine.
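To make the failure concrete, here is a minimal sketch in Go of the greedy behaviour described above. It is a hand-rolled matcher over token type names, not the actual LALR parser or sp0rkle's grammar: the `accept`, `matchDate`, `matchTime` and `Parses` helpers are hypothetical, `T_COMMA` and the `"P"` token are assumed names, and the optional "of" is left out for brevity.

```go
package main

import "fmt"

// accept returns i+1 if toks[i] has the wanted type, otherwise i.
func accept(toks []string, i int, want string) int {
	if i < len(toks) && toks[i] == want {
		return i + 1
	}
	return i
}

// matchDate consumes T_INTEGER o_dayqual T_MONTHNAME o_comma [T_INTEGER]
// from the start of toks. It returns the index just past the date, or -1.
func matchDate(toks []string) int {
	i := accept(toks, 0, "T_INTEGER")
	if i == 0 {
		return -1
	}
	i = accept(toks, i, "T_DAYQUAL")
	j := accept(toks, i, "T_MONTHNAME")
	if j == i {
		return -1
	}
	j = accept(toks, j, "T_COMMA") // assumed token name for the optional comma
	// Greedy step: an integer directly after the month is taken as the year,
	// which is the longest-match preference for rule (2) over rule (1).
	return accept(toks, j, "T_INTEGER")
}

// matchTime consumes T_INTEGER (A|P) M starting at index i.
// It returns the index just past the time, or -1.
func matchTime(toks []string, i int) int {
	j := accept(toks, i, "T_INTEGER")
	if j == i {
		return -1
	}
	k := accept(toks, j, "A")
	if k == j {
		k = accept(toks, j, "P") // "P" is assumed by analogy with "A"
	}
	m := accept(toks, k, "M")
	if k == j || m == k {
		return -1
	}
	return m
}

// Parses reports whether toks is a date, optionally followed by a time.
func Parses(toks []string) bool {
	i := matchDate(toks)
	if i < 0 {
		return false
	}
	i = accept(toks, i, "T_IGNORE") // e.g. the "at" in "2nd october at 10am"
	return i == len(toks) || matchTime(toks, i) == len(toks)
}

func main() {
	bare := []string{"T_INTEGER", "T_DAYQUAL", "T_MONTHNAME", "T_INTEGER", "A", "M"}
	withAt := []string{"T_INTEGER", "T_DAYQUAL", "T_MONTHNAME", "T_IGNORE", "T_INTEGER", "A", "M"}
	year := []string{"T_INTEGER", "T_DAYQUAL", "T_MONTHNAME", "T_INTEGER"}
	fmt.Println(Parses(bare), Parses(withAt), Parses(year)) // false true true
}
```

With the bare input the 10 is swallowed as a year and the time matcher is left looking at "A" "M"; the T_IGNORE in the second input stops the date match before the integer, so the time rule gets to see it.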

I could make the comma non-optional to disambiguate better, but I think that "2nd October 2015" is likely to be more common than "2nd October 10am". Realistically, LALR parsing is bad for this kind of grammar. It's possible that someone will have built a GLR parser in Go sometime in the last 4 years or so; maybe I'll investigate sometime. Otherwise, meh. Suggestions for alternative ways to disambiguate on a postcard, please :-)