antlr / antlrworks

AntlrWorks tool for ANTLR
http://www.antlr.org/works

ANTLRWorks does not work well with chars from extended ASCII table #10

Closed: imochurad closed this issue 11 years ago.

imochurad commented 11 years ago

I'm using ANTLRWorks to test a grammar I came up with, and one of the rules expects the BULLET symbol (•). When the parse tree is built, the symbol is dropped every time. I also tried other characters from the extended ASCII table, and they are omitted as well. This looks like an ANTLRWorks issue.

Please see this thread for more details: http://stackoverflow.com/questions/17073851/antlr3-does-not-match-extended-ascii-characters
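A small aside that may help when reading this report: the bullet is Unicode U+2022, which lies above U+00FF and is therefore not an ISO-8859-1 (Latin-1) character. How it is stored on disk depends entirely on the file encoding. The plain-JDK sketch below prints its code point and its bytes in UTF-8 and in ISO-8859-1. The class name BulletInfo is hypothetical and is not part of ANTLR or ANTLRWorks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class BulletInfo {
    // The BULLET character, as referenced by PHRASE_SEPARATOR in the grammar.
    static final String BULLET = "\u2022";

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Code point of the bullet.
        System.out.printf("U+%04X%n", BULLET.codePointAt(0));                 // prints U+2022
        // UTF-8 needs three bytes for it.
        System.out.println(hex(BULLET.getBytes(StandardCharsets.UTF_8)));      // prints E2 80 A2
        // ISO-8859-1 cannot represent it; getBytes substitutes '?' (0x3F).
        System.out.println(hex(BULLET.getBytes(StandardCharsets.ISO_8859_1))); // prints 3F
    }
}
```

Because the same character becomes three bytes in UTF-8, any tool that reads the grammar or the input with a different decoder than the one used to write it will not see a single U+2022 character.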

sharwell commented 11 years ago

Can you please add the following information so I can test this problem and find a solution?

  1. Steps to reproduce the problem you are seeing
  2. The behavior you expected to observe
  3. The behavior you actually observed

imochurad commented 11 years ago

This is the grammar I use:

grammar TestExpr;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

expr
    :   andExpr ('OR' andExpr)* EOF;
andExpr :   notExpr (('AND'|','|'+') notExpr)*;
notExpr :   ('NOT')? kpp;
kpp :   keyword|phrase|proximity|'(' expr ')';
keyword
    :   CHAR;
phrase
    :   '"' keyword (PHRASE_SEPARATOR keyword)* '"';
proximity
    :   phrase '~' INT;
CHAR    :   ('A'..'Z'|'a'..'z')+;
INT :   '0'..'9'+;
NEWLINE :   '\r'? '\n';
PHRASE_SEPARATOR    :   '\u2022';
WS  :   (' '|'\t'|'\n'|'\r')+ {skip();};

This is the input string:

"xyq•we"

What I see in the Input window after clicking 'Go To End' is "xyqwe"<EOF> (the bullet is missing), and the ParseTree window shows a NoViableAltException.

The expected result is a match of the phrase rule.

sharwell commented 11 years ago

The problem was inconsistent handling of the input encoding in ANTLRWorks.
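To illustrate why an encoding mismatch breaks a rule like PHRASE_SEPARATOR : '\u2022', here is a plain-JDK sketch (not ANTLRWorks code; the class EncodingMismatch is hypothetical). It encodes the test input as UTF-8 and then decodes those bytes once with ISO-8859-1 and once with UTF-8. With the wrong decoder, the bullet turns into three unrelated characters, so the lexer never sees U+2022. This does not reproduce exactly how ANTLRWorks ended up dropping the character; it only shows the general failure mode.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class EncodingMismatch {
    // The test input from this issue, with the bullet written as a Unicode escape.
    static final String INPUT = "\"xyq\u2022we\"";

    // True if the text contains the character matched by PHRASE_SEPARATOR.
    static boolean hasSeparator(String text) {
        return text.indexOf('\u2022') >= 0;
    }

    public static void main(String[] args) {
        // Bytes as a UTF-8 editor would save them: 7 ASCII bytes + 3 for the bullet.
        byte[] utf8Bytes = INPUT.getBytes(StandardCharsets.UTF_8);

        // Mismatched decoder: E2 80 A2 becomes U+00E2, U+0080, U+00A2.
        String wrong = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        // Matching decoder: the original text is restored.
        String right = new String(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println(wrong.length() + " " + hasSeparator(wrong)); // prints 10 false
        System.out.println(right.length() + " " + hasSeparator(right)); // prints 8 true
    }
}
```

When running a generated ANTLR 3 parser outside ANTLRWorks, one way to avoid relying on the platform default is to name the encoding explicitly when opening the input, for example with the ANTLR 3 runtime's ANTLRFileStream(fileName, "UTF-8") constructor, and to keep the grammar file itself in that same encoding.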
