antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
17.15k stars 3.28k forks

The syntax parsing time and performance did not meet my expectations #4683

Open cmmjxh opened 2 months ago

cmmjxh commented 2 months ago

Hello, I have defined a custom grammar with ANTLR 4 (4.13.2, Java target), but each parse with the generated parser takes about 50 ms. Our project can only tolerate about 5 ms per parse. I am not sure whether the cause is a poorly written grammar or whether this is simply the best performance ANTLR 4 can deliver. Can you help answer my question?

```
grammar DataFusion;

@header{ package org.example.code; }

options { language = Java; }

// DataFusion grammar definition: jql
jql: elements end ;

elements: element (',' element)* ; // Simplified to allow for easier parsing of multiple elements

element: ID ':' (strings | constant | function | expr) // Simplified and removed unnecessary alternatives
       | all
       | ignore
       ;

expr: term (op=('+'|'-') term)*
    | factor (op=('*'|'/') factor)*
    | number
    | strings
    | function
    | '(' expr ')'
    ;

term: factor (op=('+'|'-') factor) *;

factor: number | strings | function | '(' expr ')' | ID;

function: AGGOPER '(' argument ')'
        | IFNULL '(' number ',' strings ',' argument ')'
        | CONCAT '(' strings (',' strings)* ')'
        ;

argument: strings | number | expr | constant ;

number: INTEGER | FLOAT ;

strings : ID ;

end : ';';

constant: '\'' ID '\'' | '\'''\'' | '\'' (INTEGER | FLOAT) '\'';

all : '*' | ID '*' ;

ignore : '-$'ID | '-$'all;

AGGOPER: 'SUM' | 'sum' | 'AVG' | 'avg' | 'COUNT' | 'count' | 'MAX' | 'max' | 'MIN' | 'min' ;

IFNULL: 'IFNULL' | 'ifnull' ;

CONCAT: 'CONCAT' | 'concat';

FLOAT : '-'? DIGIT+ '.' DIGIT+ ;
fragment DIGIT : [0-9] ;
ID : [a-zA-Z0-9_$\u0080-\uffff.]+ ;
Grammar_EOF: ';' ;
WS: [ \t\r\n]+ -> skip;
```

Parsing time: 57 ms

kaby76 commented 2 months ago

The grammar you provide is not valid. It doesn't pass the Antlr Tool. Please use correct Markdown syntax for code blocks.

jimidle commented 2 months ago

The grammar looks highly ambiguous, and the precedence looks incorrect. Use the recognizer options to report ambiguities as you parse, then fix the grammar. Then you can use SLL mode and should reach your required parsing times easily. You have to work at a grammar and understand it to get good performance.
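A rough Java sketch of those two steps, for reference. The class name `ParseModes` and both method names are made up for illustration; the ANTLR calls themselves (`DiagnosticErrorListener`, `PredictionMode`, `BailErrorStrategy`, `ParseCancellationException`) come from the standard `org.antlr.v4.runtime` package. It assumes the grammar has been fixed so that the ANTLR tool accepts it and generates a parser, and it makes no promise about the timings you will actually see.

```java
import java.util.function.Function;

import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.DiagnosticErrorListener;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.atn.PredictionMode;
import org.antlr.v4.runtime.misc.ParseCancellationException;

// Hypothetical helper class; works with any ANTLR-generated parser.
public final class ParseModes {
    private ParseModes() {}

    // Step 1 (development only): parse with exact ambiguity detection so that
    // every ambiguous decision is reported through the error listeners.
    public static <P extends Parser> ParserRuleContext parseReportingAmbiguities(
            P parser, Function<P, ParserRuleContext> startRule) {
        parser.addErrorListener(new DiagnosticErrorListener());
        parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
        return startRule.apply(parser);
    }

    // Step 2 (production): try the faster SLL prediction mode first and fall
    // back to full LL only if the SLL attempt hits a syntax error.
    public static <P extends Parser> ParserRuleContext parseSllFirst(
            P parser, Function<P, ParserRuleContext> startRule) {
        parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
        parser.setErrorHandler(new BailErrorStrategy()); // throw on the first syntax error
        try {
            return startRule.apply(parser);
        } catch (ParseCancellationException e) {
            parser.reset(); // rewinds the token stream and clears parser state
            parser.setErrorHandler(new DefaultErrorStrategy());
            parser.getInterpreter().setPredictionMode(PredictionMode.LL);
            return startRule.apply(parser);
        }
    }
}
```

With the classes generated from the grammar above (which would be named `DataFusionLexer` and `DataFusionParser`), a call would look like `ParseModes.parseSllFirst(new DataFusionParser(tokens), DataFusionParser::jql)`, where `tokens` is a `CommonTokenStream` over the lexer. Run the first method during development until it stops reporting ambiguities, then switch to the second.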

On Mon, Aug 26, 2024 at 06:41 Ken Domino wrote:

> The grammar you provide is not valid. It doesn't pass the Antlr Tool. Please use correct Markdown syntax for code blocks.
