antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

ANTLR not recognizing label assignment #4542

Closed. clueless-skywatcher closed this issue 7 months ago.

clueless-skywatcher commented 7 months ago

Target Language: Java
ANTLR Version: 4.13.2
IDE: Visual Studio Code (Version 1.86.2)
IDE Extension: ANTLR4 grammar syntax support (Version 2.4.6)
Build System: Gradle

I am following a video to create a parser, and I need to parse strings and capture the parsed string in a variable. The rule for matching strings is as follows:

STRING
    :   '"' 
        { StringBuilder b = new StringBuilder(); }
        (c=~('\n' | '\r' | '"') { b.appendCodePoint(c); })*
        '"'
        {setText(b.toString());}
    ;

ANTLR reports the following error at the equals sign after the "c" label (on the line after the StringBuilder initialization):

syntax error: '=' came as a complete surprise to me while looking for lexer rule element

No such error appeared in the video.

Any help resolving this issue, or pointing out where I am going wrong, would be highly appreciated. Is this due to a version mismatch, since the video used ANTLR 3.x?

jimidle commented 7 months ago

You need to prefix references with $ (for example, $xxx) so that the code generator produces the correct reference code.

clueless-skywatcher commented 7 months ago

@jimidle I changed that just now, but it still throws the error:

STRING
    :   '"' 
        { StringBuilder b = new StringBuilder(); }
        (c=~('\n' | '\r' | '"') { $b.appendCodePoint(c); })*
        '"'
        {setText($b.toString());}
    ;

Where do I need to add the $ sign?

jimidle commented 7 months ago

Ah, I see what you are doing. Your StringBuilder is out of scope. You need to declare it in the @decls {} section. Ignore the $ comment; that's for labels in your grammar.

clueless-skywatcher commented 7 months ago

@jimidle The problem is not the StringBuilder. ANTLR simply refuses to accept the "=" after the "c". Declaring the StringBuilder in the @decls section still didn't fix the issue.

clueless-skywatcher commented 7 months ago

Here is the full grammar for reference:

grammar Lark;

options {
    language = Java;
}

@header {
import java.util.*;
}

@decls {
    StringBuilder idB = new StringBuilder();
}

prog
    : stmt+
    ;

stmt
    : expr ';'
    | term ';'
    | assign ';'
    | functionDef ';'
    | functionAnonDef ';'
    | functionCall ';'
    ;

functionCall
    : IDENTIFIER '(' actualParams? ')'
    ;

actualParams
    : expr (',' expr)*
    ;

term
    : IDENTIFIER
    | '(' expr ')'
    | INTEGER
    | DECIMAL
    | STRING
    | CHARACTER
    | IDENTIFIER '(' actualParams? ')'
    ;

assign
    : id=IDENTIFIER ':=' expr { 
        System.out.println($id.text); 
        System.out.println($expr.text);
    }
    ;

returnStmt: 'return' expr ';';

functionDef
    : '<' IDENTIFIER '>' ':=' '(' params? ')' '->' '{'(stmt | returnStmt)*'}'
    ;   // <Func> := (a, b, c) -> {
        //  DoThings();
        // }

functionAnonDef
    : '<' IDENTIFIER '>'
    ;

params
    : param (',' param)*
    ;

param
    : IDENTIFIER
    ;

negate
    : '~'* term
    ;

unary
    : ('+' | '-')* negate
    ;

exponent
    : unary ('^' unary)*
    ;

multiply
    : exponent (('*' | '/' | '%') exponent)*
    ;

add
    : multiply (('+' | '-') multiply)*
    ;

relation
    : add (('=' | '!=' | '<' | '<=' | '>=' | '>') add)*
    ;

expr
    : relation (('and' | 'or') relation)*
    ;

INTEGER: DIGIT+;
DECIMAL: DIGIT+ '.' DIGIT+;

STRING
    :   '"' 
        (c=~('\n' | '\r' | '"') { idB.appendCodePoint(c); })*
        '"'
        {setText(idB.toString());}
    ;
CHARACTER
    : '\'' . '\'' { setText(getText().substring(1, 2));  }
    ;
fragment LETTER: [a-zA-Z];
fragment DIGIT: [0-9];

// NEWLINE: '\n';

IDENTIFIER: LETTER (LETTER | DIGIT)* ;

WS: [ \t\n\r\f]+ -> channel(HIDDEN);

clueless-skywatcher commented 7 months ago

@jimidle I believe this is a bug: replacing the right-hand side of c=~('\n' | '\r' | '"') with any other rule or character still produces the same error, even after removing all the StringBuilder code. I will implement a workaround in the visitor later on, but any help is still appreciated.

ericvergnaud commented 7 months ago

Have you tried using $c instead of c?

kaby76 commented 7 months ago

STRING is a lexer rule, and ANTLR 4 does not support labels (such as c = ...) in lexer rules. (See https://github.com/antlr/grammars-v4/pull/3205.)
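Without labels, a common ANTLR 4 approach is to match the whole token and post-process its text, either in a final lexer action via getText() or later in a listener or visitor. Below is a minimal plain-Java sketch (not ANTLR-generated code) of that post-processing step. It assumes the token text still includes its surrounding double quotes, and the class and method names (StringTokenText, body) are hypothetical. It copies the body one code point at a time, mirroring the appendCodePoint loop from the original rule.

```java
// Plain-Java sketch, not ANTLR-generated code. Assumes the input is the full
// STRING token text, including the surrounding double quotes.
final class StringTokenText {

    // Returns the characters between the surrounding quotes, copied one
    // code point at a time like the original appendCodePoint loop.
    static String body(String tokenText) {
        if (tokenText.length() < 2
                || tokenText.charAt(0) != '"'
                || tokenText.charAt(tokenText.length() - 1) != '"') {
            throw new IllegalArgumentException("not a quoted string: " + tokenText);
        }
        StringBuilder b = new StringBuilder();
        String inner = tokenText.substring(1, tokenText.length() - 1);
        inner.codePoints().forEach(b::appendCodePoint);
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(body("\"hello\"")); // prints: hello
        System.out.println(body("\"\""));      // prints an empty line
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane intact, which is presumably why the original rule used appendCodePoint.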

clueless-skywatcher commented 7 months ago

After some tinkering, I came up with this workaround:

STRING
    :   '"'
        ~('\n' | '\r' | '"')*
        '"'
        { setText(getText().substring(1, getText().length() - 1)); }
    ;

It currently works fine for my use case. I'm posting it here to help others who run into a similar issue.
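A caveat for a string rule whose body is ~('\n' | '\r')*: that set does not exclude '"', and the ANTLR 4 lexer prefers the longest match, so two string literals on one line (for example "a" + "b") could be lexed as a single STRING token. The sketch below illustrates the same effect with a java.util.regex analogy. It is only an analogy, since ANTLR lexers are not regex engines, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex analogy (java.util.regex, not ANTLR): a quoted-string pattern whose
// body does not exclude '"' greedily runs to the LAST quote on the line,
// while excluding '"' stops at the first closing quote.
final class GreedyQuoteDemo {

    // Returns the first match of the pattern in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "x := \"a\" + \"b\";";
        // Body excludes only line breaks: spans both literals.
        System.out.println(firstMatch("\"[^\\n\\r]*\"", line));   // prints: "a" + "b"
        // Body also excludes '"': matches just the first literal.
        System.out.println(firstMatch("\"[^\\n\\r\"]*\"", line)); // prints: "a"
    }
}
```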

jimidle commented 7 months ago

Because ANTLR supports many target languages, you cannot both declare and initialize it there. Just declare it there, and either use @init {} to initialize it or initialize it in code. Look for examples.
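A related detail of the shared-member approach: if a StringBuilder is declared once as a lexer member (as idB is in the full grammar above) and filled by actions, it must also be cleared for each new token. Otherwise each STRING's text is appended to the text of the previous ones. The following is a minimal plain-Java sketch of the "declare there, initialize in code" pattern. The class and method names are hypothetical, and in a real ANTLR 4 combined grammar the field would go in an @lexer::members block rather than in a standalone class.

```java
// Plain-Java sketch, not ANTLR-generated code: a member StringBuilder that is
// declared once, initialized lazily in code, and cleared for every token.
final class MemberBufferSketch {

    // Declared as a member but not initialized here.
    private StringBuilder buf;

    // Hypothetical stand-in for the actions run while lexing one STRING token:
    // prepare the buffer, append each body code point, return the result.
    String collect(String body) {
        if (buf == null) {
            buf = new StringBuilder(); // first use: initialize in code
        } else {
            buf.setLength(0);          // later uses: drop the previous token's text
        }
        body.codePoints().forEach(buf::appendCodePoint);
        return buf.toString();
    }

    public static void main(String[] args) {
        MemberBufferSketch lexer = new MemberBufferSketch();
        System.out.println(lexer.collect("first"));  // prints: first
        System.out.println(lexer.collect("second")); // prints: second
    }
}
```

Without the setLength(0) call, the second call would return "firstsecond".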

clueless-skywatcher commented 7 months ago

Closing this issue since my problem is solved.