cwbaker / lalr

LALR(1) parser for C++
MIT License

Trouble parsing/lexing 'errors' #49

Open mingodad opened 1 year ago

mingodad commented 1 year ago

While converting this grammar, https://github.com/youtube/cobalt/blob/main/cobalt/css_parser/grammar.y, I found that lalr has trouble parsing/lexing the identifier 'errors'. The following grammar reproduces it:

error_bug {

%whitespace "[ \t\r\n]*";
%whitespace "//[^\n\r]*";
//%whitespace "/\*[^*]+\*/";
%whitespace "/\*:C_MultilineComment:";

errors :
    error
    | errors error
    ;

}

Output:

lalr (10:0): ERROR: undefined symbol 's'
Error compiling grammar. Error count = 1

mingodad commented 1 year ago

This seems to fix the problem:

bool GrammarParser::match_error()
{
    const char *saved_position = position_;
    bool result = match( "error" );
    // Require a whole-word match: reject "error" when it is only the prefix of a longer identifier such as "errors".
    if(result && position_ != end_ && (isalnum(static_cast<unsigned char>(*position_)) || *position_ == '_'))
    {
        position_ = saved_position;
        return false;
    }

    return result;
}

mingodad commented 1 year ago

I ended up with this fix: https://github.com/mingodad/lalr/commit/208eda60fecf46eaced41ef62731d8b977478043

mingodad commented 1 year ago

It seems that a similar problem exists in the lexer too:

lex_bug {
    %whitespace "[ \t\n\r]*";

    goal: function | id;

    function : 'function';
    id : "[a-zA-Z][a-zA-Z0-9]*";
}

Input:

function_exists

Output of dumping the lexer:

=line:column:type:index:identifier:lexeme:value
1:1:1:5:[function]:[function]:[function]
lalr (1:9): ERROR: Lexical error on character '_' (95)
1:9:-1:-1:[]:[]:[]
1:10:1:6:[id]:[[a-zA-Z][a-zA-Z0-9]*]:[exists]

Output of the parser:

lalr (1:9): ERROR: Lexical error on character '_' (95)
lalr (1:9): ERROR: Syntax error on '' when expecting dot_end
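
For comparison, here is a small hand-written model of a longest-match tokenizer for the two rules in lex_bug. It is only a sketch to reason about the dump above, not lalr's lexer: Token, match_literal, match_id, tokenize and the allow_underscore switch are all hypothetical names, and the tie-break (the literal wins when both rules match the same length) is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token
{
    std::string type;   // "function", "id" or "error"
    std::string lexeme;
};

// Length of `literal` if it occurs at `start`, otherwise 0.
std::size_t match_literal( const std::string& text, std::size_t start, const std::string& literal )
{
    return text.compare( start, literal.size(), literal ) == 0 ? literal.size() : 0;
}

// Length of an identifier [a-zA-Z][a-zA-Z0-9]* at `start`, otherwise 0.
// With allow_underscore set, '_' is also accepted after the first character.
std::size_t match_id( const std::string& text, std::size_t start, bool allow_underscore )
{
    std::size_t index = start;
    if ( index >= text.size() || !std::isalpha(static_cast<unsigned char>(text[index])) )
    {
        return 0;
    }
    ++index;
    while ( index < text.size()
        && (std::isalnum(static_cast<unsigned char>(text[index]))
            || (allow_underscore && text[index] == '_')) )
    {
        ++index;
    }
    return index - start;
}

// Longest-match tokenizer: the rule consuming the most characters wins,
// and the 'function' literal wins ties against id (assumed tie-break).
std::vector<Token> tokenize( const std::string& text, bool allow_underscore )
{
    std::vector<Token> tokens;
    std::size_t position = 0;
    while ( position < text.size() )
    {
        if ( std::isspace(static_cast<unsigned char>(text[position])) )
        {
            ++position;
            continue;
        }
        const std::size_t keyword_length = match_literal( text, position, "function" );
        const std::size_t id_length = match_id( text, position, allow_underscore );
        if ( keyword_length == 0 && id_length == 0 )
        {
            // No rule matches: report one character as an error and move on.
            tokens.push_back( {"error", text.substr(position, 1)} );
            ++position;
        }
        else if ( keyword_length >= id_length )
        {
            tokens.push_back( {"function", text.substr(position, keyword_length)} );
            position += keyword_length;
        }
        else
        {
            tokens.push_back( {"id", text.substr(position, id_length)} );
            position += id_length;
        }
    }
    return tokens;
}
```

In this model, tokenize( "function_exists", false ) yields function, an error on '_', and then id 'exists', which is the same sequence as in the lexer dump. tokenize( "function_exists", true ) yields a single id token because the identifier rule then consumes more characters than the literal. Whether lalr's lexer should behave differently for the grammar as written is a separate question that this sketch does not settle.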