Open Xeverous opened 5 years ago
The problem appears when the grammar is wrongly designed which causes it to be defective by design.
Those grammars are not defective, just left-recursive (as you note further down), and Spirit, being a recursive descent parser, cannot parse such grammars: https://en.wikipedia.org/wiki/Left_recursion#Accommodating_left_recursion_in_top-down_parsing.
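To make the point concrete, here is a minimal hand-written recursive descent sketch (plain C++, not Spirit) for a subscript-style expression. The toy grammar, the `Parser` struct and the `parses` helper are all assumptions made up for illustration; the grammar from the issue is not shown. A rule of the form `subscript = expr '[' expr ']'` would call `expr()` again without consuming input and recurse forever, so the sketch rewrites it as a primary followed by a loop over suffixes.

```cpp
#include <cassert>   // used by the checks mentioned in the usage note
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy grammar, with left recursion already eliminated:
//   expr    = postfix
//   postfix = primary ( '[' expr ']' | '(' ')' )*
//   primary = identifier | integer | '[' expr ( ',' expr )* ']'
struct Parser {
    std::string input;
    std::size_t pos = 0;

    explicit Parser(std::string text) : input(std::move(text)) {}

    void skip_spaces() {
        while (pos < input.size() &&
               std::isspace(static_cast<unsigned char>(input[pos])))
            ++pos;
    }

    // Consume the character c (after optional whitespace) if it is next.
    bool eat(char c) {
        skip_spaces();
        if (pos < input.size() && input[pos] == c) { ++pos; return true; }
        return false;
    }

    bool primary() {
        skip_spaces();
        const std::size_t start = pos;
        // Identifier or integer: a run of alphanumerics or underscores.
        while (pos < input.size() &&
               (std::isalnum(static_cast<unsigned char>(input[pos])) ||
                input[pos] == '_'))
            ++pos;
        if (pos > start) return true;
        // Array literal: '[' expr ( ',' expr )* ']'
        if (!eat('[')) return false;
        if (!expr()) return false;
        while (eat(',')) {
            if (!expr()) return false;
        }
        return eat(']');
    }

    // The loop replaces the left-recursive rule: every iteration consumes
    // at least one character, so the parser always makes progress.
    bool postfix() {
        if (!primary()) return false;
        for (;;) {
            if (eat('[')) {
                if (!expr() || !eat(']')) return false;
            } else if (eat('(')) {
                if (!eat(')')) return false;
            } else {
                return true;
            }
        }
    }

    bool expr() { return postfix(); }
};

// True if the whole text is a valid expression of the toy grammar.
bool parses(const std::string& text) {
    Parser p(text);
    if (!p.expr()) return false;
    p.skip_spaces();
    return p.pos == p.input.size();
}
```

With this sketch, `assert(parses("foo[0]"))`, `assert(parses("func()[0]"))` and `assert(parses("[0, 1, 2, 3][i]"))` all hold, while `parses("foo[0")` returns `false`.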
Would it be possible to detect such "malformed" grammars at compile-time?
Detection is possible for context-free grammars, and it is also possible to rewrite them automatically, though that would certainly break at least some existing usages.
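As a rough illustration of what such detection involves, the sketch below checks an explicit context-free grammar for left recursion at run time. It is not Spirit code and says nothing about how a compile-time check over Spirit's expression templates would be written; the `Grammar` representation and all function names are assumptions invented for this example. The idea is to compute which nonterminals can match empty input, build a "can appear first" relation, and look for cycles in it.

```cpp
#include <cassert>   // used by the checks mentioned in the usage note
#include <map>
#include <set>
#include <string>
#include <vector>

using Symbol = std::string;
using Production = std::vector<Symbol>;  // an empty vector means epsilon
// Keys are nonterminals; any symbol that is not a key is a terminal.
using Grammar = std::map<Symbol, std::vector<Production>>;

// Nonterminals that can derive the empty string (fixed-point iteration).
std::set<Symbol> nullable_symbols(const Grammar& g) {
    std::set<Symbol> nullable;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& rule : g) {
            if (nullable.count(rule.first)) continue;
            for (const auto& prod : rule.second) {
                bool all_nullable = true;
                for (const auto& sym : prod)
                    if (!nullable.count(sym)) { all_nullable = false; break; }
                if (all_nullable) {
                    nullable.insert(rule.first);
                    changed = true;
                    break;
                }
            }
        }
    }
    return nullable;
}

// Edge A -> B when B can be the first symbol A tries without consuming input.
std::map<Symbol, std::set<Symbol>> left_corners(const Grammar& g) {
    const std::set<Symbol> nullable = nullable_symbols(g);
    std::map<Symbol, std::set<Symbol>> edges;
    for (const auto& rule : g)
        for (const auto& prod : rule.second)
            for (const auto& sym : prod) {
                if (g.count(sym)) edges[rule.first].insert(sym);
                if (!nullable.count(sym)) break;  // later symbols are not "first"
            }
    return edges;
}

// Depth-first search: can `target` be reached from `from` via at least one edge?
bool reaches(const std::map<Symbol, std::set<Symbol>>& edges,
             const Symbol& from, const Symbol& target,
             std::set<Symbol>& visited) {
    const auto it = edges.find(from);
    if (it == edges.end()) return false;
    for (const auto& next : it->second) {
        if (next == target) return true;
        if (visited.insert(next).second && reaches(edges, next, target, visited))
            return true;
    }
    return false;
}

// Nonterminals that can reach themselves in the left-corner graph.
std::set<Symbol> left_recursive_nonterminals(const Grammar& g) {
    const auto edges = left_corners(g);
    std::set<Symbol> result;
    for (const auto& rule : g) {
        std::set<Symbol> visited;
        if (reaches(edges, rule.first, rule.first, visited))
            result.insert(rule.first);
    }
    return result;
}
```

For a hypothetical grammar with `expr -> subscript | array | id`, `subscript -> expr [ expr ]` and `array -> [ expr ]`, the function reports `expr` and `subscript`. The nullable set matters because a rule such as `A -> B A x` with `B -> epsilon` is left-recursive even though `A` is not its first symbol.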
Should Spirit have some guidance in grammar/language design on its documentation?
I do not think copying Wikipedia or any other source is a good idea, not only because of the likely legal problems, but also because there are several different left recursion elimination algorithms, each with its own optimizations.
The Spirit.Classic documentation has an FAQ entry about left recursion: https://www.boost.org/doc/libs/master/libs/spirit/classic/doc/faq.html#left_recursion. I am not sure how useful it is, but you are welcome to improve it.
I think it is useful; it shows examples of how to deal with common left recursion problems. It could be updated and added to the X3 docs.
I welcome any PR for improving the docs.
I have encountered the same problem multiple times.
The problem appears when the grammar is wrongly designed which causes it to be defective by design. One such example is this:
The grammar for my `array_expression` accepts something like `[0, 1, 2, 3]`. Subscript will accept any expression followed by a bracket-enclosed expression, like `foo[0]`, `func()[0]`, `[0, 1, 2, 3][i]`.
But this grammar does not actually work. When the parser encounters `foo[0]`, it does this:

Obviously this is a bug in the design/implementation which can be fixed by a different order of alternatives, but I'm interested if:
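- Would it be possible to detect such "malformed" grammars at compile-time?
- Should Spirit have some guidance in grammar/language design on its documentation (e.g. replace `x | (x >> y >> z)` with `x >> -(y >> z)`, avoid left recursion that does not consume any characters, etc.)?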
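The difference between those two forms comes from ordered choice: Spirit tries alternatives in order and takes the first one that succeeds. The sketch below imitates that behaviour with a few hand-rolled combinators in plain C++, so it is only a model of the semantics, not Spirit itself. The names `lit`, `seq`, `alt` and `opt` are invented for the example, standing in for literals, `>>`, `|` and unary `-`.

```cpp
#include <cassert>   // used by the checks mentioned in the usage note
#include <cstddef>
#include <functional>
#include <string>

// A parser maps (input, start position) to the end position of the match,
// or to `fail` when it does not match.
using Parser = std::function<std::size_t(const std::string&, std::size_t)>;
constexpr std::size_t fail = std::string::npos;

// Matches the exact text at the current position.
Parser lit(std::string text) {
    return [text](const std::string& in, std::size_t pos) -> std::size_t {
        return in.compare(pos, text.size(), text) == 0 ? pos + text.size() : fail;
    };
}

// Sequence, like `a >> b`: b starts where a stopped.
Parser seq(Parser a, Parser b) {
    return [a, b](const std::string& in, std::size_t pos) -> std::size_t {
        const std::size_t mid = a(in, pos);
        return mid == fail ? fail : b(in, mid);
    };
}

// Ordered choice, like `a | b`: b is tried only if a fails.
Parser alt(Parser a, Parser b) {
    return [a, b](const std::string& in, std::size_t pos) -> std::size_t {
        const std::size_t end = a(in, pos);
        return end != fail ? end : b(in, pos);
    };
}

// Optional, like `-a`: succeeds without consuming anything if a fails.
Parser opt(Parser a) {
    return [a](const std::string& in, std::size_t pos) -> std::size_t {
        const std::size_t end = a(in, pos);
        return end != fail ? end : pos;
    };
}

// True if the parser consumes the entire input.
bool full_match(const Parser& p, const std::string& in) {
    return p(in, 0) == in.size();
}
```

Taking `x = lit("foo")`, `y = lit("[")` and `z = lit("0]")`, the parser `alt(x, seq(x, seq(y, z)))` stops after `foo` on the input `foo[0]`, because the first alternative already succeeded, so `full_match` is false. The factored form `seq(x, opt(seq(y, z)))` consumes both `foo` and `foo[0]` completely, as does putting the longer alternative first.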