cederberg / grammatica

Grammatica is a C# and Java parser generator (compiler compiler)
https://grammatica.percederberg.net
Other
87 stars 35 forks source link

Infinite loop bug #4

Open peske opened 7 years ago

peske commented 7 years ago

Hi again! After some testing I've discovered that for loop at https://github.com/cederberg/grammatica/blob/master/src/csharp/PerCederberg.Grammatica.Runtime/RecursiveDescentParser.cs#L213 executes infinitely in some cases. And here's some criticism for you: the thing you've done by increasing/decreasing loop counter within loop is pretty much anti-pattern. It's very, very hard to understand what was your intention there, and what has gone wrong.

cederberg commented 7 years ago

This shouldn't cause an infinite loop, as NextToken() is called inside the exception handler. This consumes 1 token from the input stream, eventually throwing an exception on EOF.

Could you please provide a test grammar file that triggers this bug?

Regarding code style, well... Parsers are not for the faint of heart. They are bound to contain some nasty stuff here and there, due to the nature of the problem they solve (trying to translate human text input into program language).

The code in question is part of the error-recovery. If an alternative didn't parse correctly, it is logged, one token is discarded and the same alternative is tried again. Using here i-- highlights that something fishy is being done, which you correctly noticed. So, in a way, the code works just as intended... ;-)

skeggib commented 5 years ago

Hi, I have a similar issue with this loop. Although the loop is not infinite, is some cases the number of recursive calls produce a StackOverflowException.

You can recreate the bug with the grammar and text to parse at the end of this message. I tried to reproduce the bug with a simpler grammar but did not succeed.

The bug is probably due by the fact that at Parser.cs:587 a ParseException is thrown, which is catched in the try/catch block of the ParseAlternative method (RecursiveDescentParser.cs:214) lower in the call stack, which will call NextToken at RecursiveDescentParser.cs:218 and so on. I suggest throwing another type of exception in the case of an unexpected end of file.

Grammar:

%header%
GRAMMARTYPE = "LL"

%tokens%

T_WHITESPACE                = <<[ \t\n\r]+>> %ignore%
T_EQUALS                    = "="
T_GT                        = ">"
T_LT                        = "<"
T_ADDITION                  = "+"
T_SUBSTRACTION              = "-"
T_DIVISION                  = "\frac"
T_OPENING_BRACE             = "{"
T_CLOSING_BRACE             = "}"
T_POWER                     = "^"
T_COMMA                     = ","
T_EQUIVALENT                = "\iff"
T_TILDE                     = "\tilde"
T_DELTA                     = "\delta"
T_CONSTANT                  = <<[0-9]*(\.[0-9]+|[0-9]+)>>
T_MULTIPLICATION            = <<(\*|\\times)>>
T_OPENING_PARENTHESIS       = <<(\(|\\left\()>>
T_CLOSING_PARENTHESIS       = <<(\)|\\right\))>>
T_IDENTIFIER                = <<(\\)?[a-zA-Z0-9]+'?>>
T_SUBSCRIPT                 = <<_(\\[a-zA-Z0-9]+|[a-zA-Z0-9]|\{[a-zA-Z0-9 \\]+\})>>

%productions%

Equivalence
    = Equality [ T_EQUIVALENT Equivalence ]
    ;

Equality
    = Expression [ ( T_EQUALS | T_GT | T_LT ) Equality ]
    ;

Expression
    = Term
    | T_SUBSTRACTION Expression
    | T_ADDITION Expression
    ;

Term
    = Factor [ ( T_ADDITION | T_SUBSTRACTION ) Term ]
    ;

Factor
    = Power [ [ T_MULTIPLICATION ] Factor ]
    ;

Power
    = Atom [ T_POWER Power ]
    ;

Atom
    = T_CONSTANT
    | Variable
    | T_OPENING_PARENTHESIS Expression T_CLOSING_PARENTHESIS
    | T_OPENING_BRACE Expression T_CLOSING_BRACE
    | T_DIVISION T_OPENING_BRACE Expression T_CLOSING_BRACE T_OPENING_BRACE Expression T_CLOSING_BRACE
    | Function
    ;

Variable
    = Identifier
    ;

Function
    = Identifier [ T_POWER T_CONSTANT ] FunctionParameters
    ;

FunctionParameters
    = T_OPENING_PARENTHESIS Expression ( T_COMMA Expression )* T_CLOSING_PARENTHESIS
    | T_OPENING_BRACE Expression ( T_COMMA Expression )* T_CLOSING_BRACE
    ;

Identifier
    = [ Modifier ] T_IDENTIFIER [ T_SUBSCRIPT ]
    | Modifier T_OPENING_BRACE T_IDENTIFIER [ T_SUBSCRIPT ] T_CLOSING_BRACE
    ;

Modifier
    = T_TILDE
    | T_DELTA
    ;

Text to parse:

\Text(29,11)[cb]{{\footnotesize $\bm{\mu^+}$}}
\Text(29,-12)[cb]{{\footnotesize $\bm{\mu^-}$}}
\hbox{
\begin{picture}(0,0)(0,0)
\Text(3,11)[cb]{{\footnotesize $\bm{e^+}$}}
\Text(3,-12)[cb]{{\footnotesize $\bm{e^-}$}}
\Text(29,11)[cb]{{\footnotesize $\bm{\mu^+}$}}
\Text(29,-12)[cb]{{\footnotesize $\bm{\mu^-}$}}
\Text(16,2)[cb]{{\footnotesize $\bm{Z}$}}
\end{picture}
\phantom{.................}
}