antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Is ANTLR 4.7's exception handling method different from ANTLR 4.5? #1967

Open sunpxyz opened 7 years ago

sunpxyz commented 7 years ago

_Recently, I joined a project that uses ANTLR 4 to build a DSL parser. The existing code was built against ANTLR 4.5, and I want to upgrade it to the latest version, 4.7.

This is my grammar file, Search.g4:

    grammar Search;

    LPAREN : '(';
    RPAREN : ')';
    STAR   : '*';
    COMMA  : ',';

    COUNT : 'COUNT';
    MAX   : 'MAX';
    MIN   : 'MIN';
    AVG   : 'AVG';
    SUM   : 'SUM';
    STD   : 'STD';
    VAR   : 'VAR';
    SUM2  : 'SUM2';

    GROUPBY : 'GROUPBY';
    ORDERBY : 'ORDERBY';
    LIMIT   : 'LIMIT';

    NUM : [1-9][0-9]*;
    ID  : [a-zA-Z0-9一-龥_]+ ;   // match identifiers (letters, digits, CJK characters, underscore)
    WS  : [ \t\r\n]+ -> skip ;  // skip spaces, tabs, newlines

    start : '| STAT' simple_func_expr+ (group_by_expr order_by_expr? limit_expr?)?;

    simple_func_expr
        : (COUNT | MAX | MIN | AVG | SUM | STD | VAR | SUM2) LPAREN ID RPAREN
        | COUNT LPAREN STAR RPAREN
        ;

    group_by_expr : GROUPBY LPAREN ID (COMMA ID)* RPAREN;

    order_by_expr : ORDERBY LPAREN (ID|simple_func_expr) (COMMA (ID|simple_func_expr))* RPAREN;

    limit_expr : LIMIT NUM;

After generating the C++ target code, I used NetBeans to build the project and wrote my own ErrorListener class. The project directory looks like this:

    ---ProjectName
    ------syntax/
    ---------SearchBaseListener.cpp
    ---------SearchBaseListener.h
    ---------SearchLexer.cpp
    ---------SearchLexer.h
    ---------SearchListener.cpp
    ---------SearchListener.
    ---------SearchParser.cpp
    ---------SearchParser.h
    ------CMakeLists.txt
    ------main.cpp

Then I set up two development environments and built my code against both ANTLR 4.5 and ANTLR 4.7.

This code runs on the ANTLR 4.7 runtime:

    // using ANTLR 4.7
    #include "antlr4-runtime.h"
    #include "syntax/SearchLexer.h"
    #include "syntax/SearchParser.h"
    #include "syntax/SearchBaseListener.h"
    #include <iostream>
    #include <sstream>
    #include <vector>

    using namespace antlr4;

    void parseAggsStatement(const std::string& instr, std::vector<std::string>& vstr);

    // Error listener that turns the first reported syntax error into an exception.
    class MyErrorListener : public BaseErrorListener {
    public:
        void syntaxError(Recognizer* recognizer, Token* offendingSymbol, size_t line,
                         size_t charPositionInLine, const std::string& msg,
                         std::exception_ptr e) override {
            std::stringstream ss;
            ss << "aggregation syntax error!!\n"
               << " line: " << line
               << " position: " << charPositionInLine
               << " info: " << msg << std::endl;
            throw std::runtime_error(ss.str());
        }
    };

    int main() {
        // input string for test
        std::string instr = "| STAT COUNT(*) MAX(A) MIN(B) rGROUPBY(X,Y) rORDERBY(A,SUM(C),MAX(E)) LIMIT 10";
        std::vector<std::string> vstr;

        try {
            parseAggsStatement(instr, vstr);
        } catch (const std::exception& e) {
            std::cout << e.what() << std::endl;
            return 0;
        }
        // print result
        for (auto& e : vstr) {
            std::cout << e << std::endl;
        }

        return 0;
    }

    void parseAggsStatement(const std::string& instr, std::vector<std::string>& vstr) {
        try {
            ANTLRInputStream input(instr);
            SearchLexer lexer(&input);

            // Replace the default console listener on both lexer and parser.
            MyErrorListener error_listener;
            lexer.removeErrorListeners();
            lexer.addErrorListener(&error_listener);

            CommonTokenStream tokens(&lexer);
            tokens.fill();

            SearchParser parser(&tokens);
            parser.removeParseListeners();
            parser.removeErrorListeners();
            parser.addErrorListener(&error_listener);

            tree::ParseTree* tree = parser.start();
            std::cout << tree->toStringTree() << std::endl;

            SearchBaseListener listener;
            tree::ParseTreeWalker::DEFAULT.walk(&listener, tree);

            vstr = listener.getListenContext();
        } catch (const std::exception& ex) {
            throw std::runtime_error(ex.what());
        }
    }

RESULT: | STAT COUNT(*) MAX(A) MIN(B)

This code runs on the ANTLR 4.5 runtime:

    // using ANTLR 4.5
    #include <iostream>
    #include <sstream>
    #include <vector>
    #include "syntax/SearchLexer.h"
    #include "syntax/SearchParser.h"
    #include "syntax/SearchBaseListener.h"

    using namespace antlr4;

    void separateCommandSegment(const std::string& instr, std::vector<std::string>& outvstr);

    // Error listener that turns the first reported syntax error into an exception.
    class MyErrorListener : public BaseErrorListener {
        void syntaxError(IRecognizer* recognizer, Token* offendingSymbol, size_t line,
                         int charPositionInLine, const std::string& msg,
                         std::exception_ptr e) {
            std::stringstream ss;
            ss << "aggregation syntax error!!\n"
               << " line: " << line
               << " position: " << charPositionInLine
               << " info: " << msg << std::endl;
            throw std::runtime_error(ss.str());
        }
    };

    int main() {
        // input string for test
        std::string instr = "| STAT COUNT(*) MAX(A) MIN(B) rGROUPBY(X,Y) rORDERBY(A,SUM(C),MAX(E)) LIMIT 10";

        std::vector<std::string> result;
        try {
            separateCommandSegment(instr, result);
        } catch (std::exception& e) {
            std::cout << e.what() << std::endl;
            return 0;
        }
        // print result
        for (auto it = result.begin(); it != result.end(); ++it) {
            std::cout << *it << std::endl;
        }

        return 0;
    }

    void separateCommandSegment(const std::string& instr, std::vector<std::string>& outvstr) {
        ANTLRInputStream input(instr);

        // Replace the default console listener on both lexer and parser.
        MyErrorListener mylistener;
        SearchLexer lexer(&input);
        lexer.removeErrorListeners();
        lexer.addErrorListener(&mylistener);

        CommonTokenStream tokens(&lexer);
        tokens.fill();

        SearchParser parser(&tokens);
        parser.removeErrorListeners();
        parser.addErrorListener(&mylistener);

        try {
            Ref<tree::ParseTree> tree = parser.start();
            SearchBaseListener listener;
            tree::ParseTreeWalker walker;
            walker.walk(&listener, tree.get());

            outvstr = listener.getListenerContent();
        } catch (const std::exception& e) {
            throw std::runtime_error(e.what());
        }
    }

RESULT:

    aggregation syntax error!!
     line: 1 position: 30 info: extraneous input 'rGROUPBY' expecting '{<EOF>, 'COUNT', 'MAX', 'MIN', 'AVG', 'SUM', 'STD', 'VAR', 'SUM2', 'GROUPBY'}'

API contrast: the signature of BaseErrorListener's syntaxError() member function differs between the two versions (the type of the recognizer parameter and the type of charPositionInLine):

    // ANTLR 4.7
    void syntaxError(Recognizer* recognizer, Token* offendingSymbol, size_t line,
                     size_t charPositionInLine, const std::string& msg, std::exception_ptr e);

    // ANTLR 4.5
    void syntaxError(IRecognizer* recognizer, Token* offendingSymbol, size_t line,
                     int charPositionInLine, const std::string& msg, std::exception_ptr e);
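A general C++ point worth keeping in mind when a virtual method's parameter types change between library versions: a subclass that still declares the old types does not override the new method, so the base implementation runs instead and the subclass is never called. The sketch below illustrates this with stand-in classes. `Recognizer`, `Token`, `MockBaseErrorListener`, `StaleListener`, `CurrentListener`, and `reportOneError` are all hypothetical names written for this illustration, not the ANTLR runtime types, and this is not claimed to be the cause of the behaviour reported in this issue.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical stand-ins; these are NOT the real ANTLR runtime classes.
struct Recognizer {};
struct Token {};

// Mock base listener using the 4.7-style parameter types quoted in the post.
class MockBaseErrorListener {
public:
    virtual ~MockBaseErrorListener() = default;
    virtual void syntaxError(Recognizer* /*recognizer*/, Token* /*offendingSymbol*/,
                             std::size_t /*line*/, std::size_t /*charPositionInLine*/,
                             const std::string& /*msg*/, std::exception_ptr /*e*/) {
        // Default behaviour: ignore the error.
    }
};

// Declares the column as int (old style). This hides the base method
// instead of overriding it, so calls through the base never reach it.
class StaleListener : public MockBaseErrorListener {
public:
    int calls = 0;
    void syntaxError(Recognizer*, Token*, std::size_t, int,
                     const std::string&, std::exception_ptr) {
        ++calls;
    }
};

// Matches the base signature. `override` makes the compiler reject the
// declaration if the parameter types ever drift apart again.
class CurrentListener : public MockBaseErrorListener {
public:
    int calls = 0;
    void syntaxError(Recognizer*, Token*, std::size_t, std::size_t,
                     const std::string&, std::exception_ptr) override {
        ++calls;
    }
};

// Simulates a runtime reporting one error through the base-class interface.
void reportOneError(MockBaseErrorListener& listener) {
    listener.syntaxError(nullptr, nullptr, 1, 30, "extraneous input", nullptr);
}
```

Passing a `StaleListener` to `reportOneError` leaves its `calls` counter at zero, while a `CurrentListener` is invoked once. Adding `override` to `StaleListener::syntaxError` would turn the silent mismatch into a compile error.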

So I get different results: with ANTLR 4.7, everything after the first syntax error is ignored and a partial result is printed, whereas with ANTLR 4.5 the first syntax error is reported and no result is printed.

Is exception handling in ANTLR 4.7 different from ANTLR 4.5, or am I using ANTLR incorrectly? Any help is appreciated._
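If the goal is the all-or-nothing behaviour seen with 4.5, one version-independent option is to decide explicitly after parsing whether the result may be used, rather than relying only on an exception thrown from the listener. A parser may stop early without reporting anything, for example when the start rule does not end in `EOF`, so the sketch below also checks that all tokens were consumed. This is a minimal sketch under stated assumptions: `ParseOutcome` and `acceptOrReject` are hypothetical names for this illustration, not ANTLR API, and the fields would have to be filled from the real parser and token stream.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical summary of one parse run. In real code these values would be
// gathered from the parser, the token stream, and an error listener.
struct ParseOutcome {
    std::vector<std::string> results;  // what the tree listener extracted
    std::vector<std::string> errors;   // messages recorded by an error listener
    std::size_t tokensConsumed = 0;    // tokens the parser actually matched
    std::size_t tokensTotal = 0;       // tokens the lexer produced, excluding EOF
};

// All-or-nothing policy: hand back results only for a clean, complete parse.
std::vector<std::string> acceptOrReject(const ParseOutcome& outcome) {
    if (!outcome.errors.empty()) {
        throw std::runtime_error("syntax error: " + outcome.errors.front());
    }
    if (outcome.tokensConsumed != outcome.tokensTotal) {
        throw std::runtime_error("input not fully consumed");
    }
    return outcome.results;
}
```

With this policy, a run that matched only the leading `COUNT(*) MAX(A) MIN(B)` part of the test input would be rejected even if no error message had been recorded.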

ericvergnaud commented 7 years ago

Hi, the place for support questions is the Google discussion group.

marcospassos commented 7 years ago

@ericvergnaud I don't believe this is a support issue. I can confirm that ANTLR 4.7 introduced backward-compatibility (BC) breaks. We just updated from 4.6 to 4.7, and 300+ unit tests are now broken.

@parrt do you have any idea what happened? We have hundreds of error cases that were "missing token" in 4.6 and became "mismatched input" in 4.7.

sharwell commented 7 years ago

This sounds like a duplicate of #1922.