Closed imochurad closed 11 years ago
Can you please add the following information so I can test this problem and find a solution?
This is the grammar I use:
grammar TestExpr;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

expr      : andExpr ('OR' andExpr)* EOF;
andExpr   : notExpr('AND'|','|'+' notExpr)*;
notExpr   : ('NOT')? kpp;
kpp       : keyword|phrase|proximity|'(' expr ')';
keyword   : CHAR;
phrase    : '"' keyword (PHRASE_SEPARATOR keyword)* '"';
proximity : phrase '~' INT;

CHAR             : ('A'..'Z') | ('a'..'z')+;
INT              : '0'..'9'+;
NEWLINE          : '\r'? '\n';
PHRASE_SEPARATOR : '\u2022';
WS               : (' '|'\t'|'\n'|'\r')+ {skip();};
This is the input string:
"xyq•we"
What I see is "xyqwe"<EOF> in the Input window after clicking 'Go To End', and a NoViableAltException in the ParseTree window.
The expected result is a match for the phrase rule.
The problem was inconsistent handling of the input encoding in ANTLRWorks.
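For anyone who wants to check this kind of encoding mismatch outside ANTLRWorks, here is a minimal Java sketch. It does not reproduce ANTLRWorks internals; the class name EncodingCheck and its helper methods are made up for illustration. It only shows that the bullet (U+2022) survives when the bytes are decoded with the same charset they were written in, and is lost when a different charset is used, which leaves the lexer with nothing that matches PHRASE_SEPARATOR.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of ANTLR or ANTLRWorks.
public class EncodingCheck {
    // The input from the report, double quotes included; the bullet is written
    // as an escape so this source file stays pure ASCII.
    static final String INPUT = "\"xyq\u2022we\"";

    // Decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Render every code point as U+XXXX so invisible or garbled characters show up.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] utf8 = INPUT.getBytes(StandardCharsets.UTF_8);

        // Same charset on both sides: the bullet is preserved.
        String consistent = decode(utf8, StandardCharsets.UTF_8);
        // Different charset on the reading side: the three UTF-8 bytes of the
        // bullet become three unrelated characters.
        String mismatched = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("consistent: " + codePoints(consistent));
        System.out.println("mismatched: " + codePoints(mismatched));
        System.out.println("bullet present (consistent): " + (consistent.indexOf('\u2022') >= 0));
        System.out.println("bullet present (mismatched): " + (mismatched.indexOf('\u2022') >= 0));
    }
}
```

If the code-point dump of the text your lexer actually receives does not contain U+2022, the problem is in how the input was read, not in the grammar.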
I'm using ANTLRWorks to test a grammar I came up with. One of its rules uses the BULLET symbol • (U+2022), but when the parse tree is built the symbol is dropped every time. I also tried other characters from the extended ASCII table, and they are omitted as well. It looks like an ANTLRWorks issue.
Please see this thread for more details: http://stackoverflow.com/questions/17073851/antlr3-does-not-match-extended-ascii-characters