st-cheewah closed this issue 6 months ago.
The most likely explanation is that something about your grammar creates an exponential explosion of possible parses. When a valid parse exists, the parser finds it quickly, but when the input contains an error, the parser is forced to backtrack through the enormous number of alternative parses to prove that no valid parse exists.
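To make the success-versus-failure asymmetry concrete, here is a minimal Python sketch. It is a toy backtracking matcher, not Instaparse's actual parsing algorithm, and the rule it models is hypothetical: one or more whitespace pieces (each piece being one or more spaces) followed by the token x. The function names match_pieces and count_calls are illustrative. The matcher counts its own recursive calls: on valid input it finds a parse after n calls for n spaces, but on invalid input it must try every way of splitting the spaces before it can report failure.

```python
def match_pieces(text: str, i: int, target: str, calls: list[int]) -> bool:
    """Toy model of a hypothetical ambiguous rule: R = piece+ target,
    where piece = one or more spaces.

    Tries every length for the next whitespace piece (shortest first) and
    backtracks on failure. calls[0] counts how many times this is invoked.
    """
    calls[0] += 1
    j = i
    while j < len(text) and text[j] == " ":
        j += 1  # candidate piece is text[i:j]
        if text.startswith(target, j):
            return True  # target follows this piece: a parse was found
        if match_pieces(text, j, target, calls):
            return True  # a further split of the remaining spaces succeeded
    return False  # every split of the remaining whitespace failed


def count_calls(text: str, target: str = "x") -> tuple[bool, int]:
    """Run the toy matcher and report (succeeded?, number of calls)."""
    calls = [0]
    ok = match_pieces(text, 0, target, calls)
    return ok, calls[0]


if __name__ == "__main__":
    for n in (5, 10, 15):
        good = " " * n + "x"  # valid input: found after n calls
        bad = " " * n + "y"   # invalid input: 2**n calls before giving up
        print(n, count_calls(good), count_calls(bad))
```

With 5 spaces the valid input costs 5 calls and the invalid one 32; with 15 spaces it is 15 versus 32,768. The failing case doubles with every extra space, which is the shape of the slowdown described above.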
If you're not seeing multiple parse results on your test cases, then the explosion of possibilities is likely occurring in a part of your grammar that is hidden from the final parse. If that is the case, my guess would be that the problem is with your whitespace parser. Most likely, your whitespace definition allows any region of whitespace to be chopped up in several different ways, and in an error situation, all of those possibilities must be tried. Once your text is long enough to contain a few separate whitespace regions, each with multiple possibilities, you get that exponential explosion.
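The arithmetic can be checked with a short Python sketch. It assumes a hypothetical ambiguous whitespace rule that accepts a run of whitespace as one or more consecutive non-empty pieces (for example, a repeated \s+ token); the function names splits and total_splits are illustrative. It counts the ways to chop one run by dynamic programming, then multiplies across regions, since in the worst case a failing parse may have to try every combination of choices.

```python
from math import prod


def splits(n: int) -> int:
    """Ways a hypothetical ambiguous rule (one or more non-empty whitespace
    pieces) can chop a run of n whitespace characters."""
    if n <= 0:
        return 0  # the rule needs at least one piece
    ways = [0] * (n + 1)
    ways[0] = 1  # one way to have consumed nothing yet
    for end in range(1, n + 1):
        # the last piece covers positions start..end-1 for some start < end
        ways[end] = sum(ways[start] for start in range(end))
    return ways[n]


def total_splits(region_lengths: list[int]) -> int:
    """Worst-case combinations across independent whitespace regions:
    the per-region counts multiply."""
    return prod(splits(n) for n in region_lengths)


if __name__ == "__main__":
    print(splits(4))              # 2 ** (4 - 1)
    print(total_splits([4, 4, 4]))
```

A run of n characters can be split 2 ** (n - 1) ways, so ten regions of eight spaces each already give total_splits([8] * 10) == 2 ** 70 combinations.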
So my suggestion is to do some more testing on void in isolation, non-hidden, on various combinations of whitespace, and see how many possible parses it generates. If you can find a way to write your whitespace parser as a single regex, that may alleviate the problem.
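The suggested experiment (count how many parses the whitespace rule alone produces) can be mimicked outside Instaparse with a Python sketch. Both functions are stand-ins under stated assumptions: parses_repeated models a hypothetical ambiguous rule that repeats a \s+ token one or more times, and parses_single models a rule written as one regex, \s+, matched against the whole region. Neither calls Instaparse; only the parse counts matter.

```python
import re

WS = re.compile(r"\s+")


def parses_repeated(s: str) -> list[list[str]]:
    """All parses of s under a hypothetical ambiguous rule: void = ws+ ; ws = \\s+.
    Each parse is the list of pieces the region was chopped into."""
    results: list[list[str]] = []
    for k in range(1, len(s) + 1):
        head = s[:k]
        if not WS.fullmatch(head):
            continue  # the first piece must be entirely whitespace
        tail = s[k:]
        if tail == "":
            results.append([head])  # the piece consumed the whole region
        else:
            for rest in parses_repeated(tail):
                results.append([head] + rest)
    return results


def parses_single(s: str) -> list[list[str]]:
    """All parses of s under a rule written as one regex: void = #'\\s+'."""
    return [[s]] if WS.fullmatch(s) else []


if __name__ == "__main__":
    for region in [" ", "  ", " \t\n ", " " * 8]:
        print(repr(region), len(parses_repeated(region)), len(parses_single(region)))
```

The ambiguous version yields 2 ** (n - 1) parses for a region of n whitespace characters, while the single-regex version yields exactly one, so the region no longer multiplies the number of alternatives a failing parse has to rule out.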
Sorry I don't have more time right now to directly help troubleshoot this for you, but hopefully I'm pointing you in the right direction to solve it. Good luck!
Ah, I see, it could be a side effect of my inefficient grammar. I will investigate further and update the ticket once I have more info. Thanks for the detailed explanation and the pointers on how to troubleshoot; much appreciated!
PS: I tried parsing whitespace with void = #'\\s+', which didn't help; I will be trying other ideas.
When parsing the following protocol buffer text, the parser keeps consuming 100% CPU until it throws an OutOfMemoryError.
If
int32 a = .... ;
(4 lines in total) is removed, the parser returns a syntax error for the line containing
msg = { abc: 123.4 },
which is the expected behavior.
If
msg = { abc: 123.4 },
is removed, the parser succeeds, which verifies that
int32 a = .... ;
is valid syntax.
Hence the earlier text
int32 a = .... ;
appears to be a precondition that somehow causes the parser to loop indefinitely on a syntax error that it can otherwise detect.
More info: the parser uses the following EBNF:
with the following used for :auto-whitespace: