Open jiefei30 opened 11 months ago
It seems this code is the cause:
// https://docs.oracle.com/cd/E11882_01/server.112/e16604/ch_twelve034.htm#SQPUG054
REMARK_COMMENT: 'REM' {this.IsNewlineAtPos(-4)}? 'ARK'? (' ' ~('\r' | '\n')*)? NEWLINE_EOF -> channel(HIDDEN);
// https://docs.oracle.com/cd/E11882_01/server.112/e16604/ch_twelve032.htm#SQPUG052
PROMPT_MESSAGE: 'PRO' {this.IsNewlineAtPos(-4)}? 'MPT'? (' ' ~('\r' | '\n')*)? NEWLINE_EOF;
line 2450 in PlSqlLexer.g4
CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), REMARK VARCHAR2(512))
is OK, but
CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16),
REMARK VARCHAR2(512))
is not OK.
The only difference is the \n before REMARK in the second SQL statement.
@kaby76 Hello, I saw that you modified this code about 3 months ago. Could you please help me figure this out in your spare time? Thank you so much!
The change I made is unrelated to this problem. All I did was rename `self.` to `this.` for those two rules in order to put the grammar into "target agnostic format".
REMARK_COMMENT was added long before, first here: https://github.com/antlr/grammars-v4/commit/3f0150f57505dde0792739e79c0030a8c912e425#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2324
A predicate was then added the same day here: https://github.com/antlr/grammars-v4/commit/356f3ea19e3c62fa92e1f3c7997daa8ec7711ad9#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2328
Then, it was changed again the day after to what it was until I changed it: https://github.com/antlr/grammars-v4/commit/356f3ea19e3c62fa92e1f3c7997daa8ec7711ad9#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2328
I will look at it over the weekend. My first impression, though, is that REM and PRO require parser-state-aware lexing: it's not enough to check that the preceding character is a newline; the lexer also has to verify that the word is not part of a statement. This is one of the things Antlr does not do well at all.
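To make the point concrete, here is a rough sketch in Python that approximates REMARK_COMMENT with a regular expression. It is not the generated lexer: the `^` anchor stands in for the `IsNewlineAtPos(-4)` predicate (assumed here to mean "the character before REM is a newline or the start of input"), and `swallowed_by_remark` is a hypothetical helper name. It only shows that a start-of-line check by itself cannot tell a REMARK command from a column named REMARK.

```python
import re

# Approximation of the REMARK_COMMENT rule: 'REM' at the start of a line,
# optional 'ARK', optional " <rest of line>", then a newline or end of input.
# The ^ anchor (with MULTILINE) plays the role of the newline predicate.
REMARK_COMMENT = re.compile(
    r"^REM(?:ARK)?(?: [^\r\n]*)?(?:\r?\n|\Z)", re.MULTILINE
)


def swallowed_by_remark(sql: str) -> list[str]:
    """Return every span the approximated rule would treat as a remark."""
    return [m.group(0) for m in REMARK_COMMENT.finditer(sql)]


one_line = "CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), REMARK VARCHAR2(512))"
two_lines = "CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16),\nREMARK VARCHAR2(512))"

print(swallowed_by_remark(one_line))   # nothing matches: REMARK is mid-line
print(swallowed_by_remark(two_lines))  # the whole second line is matched
```

In the two-line statement the column definition sits at the start of a line, so the rule matches the rest of that line and the parser never sees the REMARK column. Telling the two cases apart needs knowledge of whether a statement is still open, which the lexer does not have.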
@KvanTTT OK, I get it. Thanks for your reply. My current workaround is to temporarily remove these two lexer rules.
I think the problem here is that REMARK and PROMPT should be considered commands. They're not really comments. So, I think you're right, the rules should not be there in the lexer.
Maybe this only happens on the Python target, because it works on the Java target:
C:\antlr>grun PlSql sql_script -tree
select 1 from pro;
^Z
(sql_script (unit_statement (data_manipulation_language_statements (select_statement (select_only_statement (subquery (subquery_basic_elements (query_block select (selected_list (select_list_elements (expression (logical_expression (unary_logical_expression (multiset_expression (relational_expression (compound_expression (concatenation (model_expression (unary_expression (atom (constant (numeric 1)))))))))))))) (from_clause from (table_ref_list (table_ref (table_ref_aux (table_ref_aux_internal (dml_table_expression_clause (tableview_name (identifier (id_expression (regular_id pro))))))))))))))))) ; <EOF>)
C:\antlr>pygrun PlSql sql_script --tree
select 1 from pro;
^Z
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Scripts\pygrun.exe\__main__.py", line 7, in <module>
sys.exit(main())
~~~~^^
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\_pygrun.py", line 156, in main
process(input_stream, class_lexer, class_parser)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\_pygrun.py", line 124, in process
token_stream.fill()
~~~~~~~~~~~~~~~~~^^
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\BufferedTokenStream.py", line 301, in fill
while self.fetch(1000)==1000:
~~~~~~~~~~^^^^^^
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\BufferedTokenStream.py", line 124, in fetch
t = self.tokenSource.nextToken()
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\Lexer.py", line 137, in nextToken
ttype = self._interp.match(self._input, self._mode)
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 104, in match
return self.execATN(input, dfa.s0)
~~~~~~~~~~~~^^^^^^^^^^^^^^^
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 173, in execATN
target = self.computeTargetState(input, s, t)
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 231, in computeTargetState
self.getReachableConfigSet(input, s.configs, reach, t)
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 280, in getReachableConfigSet
if self.closure(input, config, reach, currentAltReachedAcceptState, True, treatEofAsEpsilon):
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 359, in closure
currentAltReachedAcceptState = self.closure(input, c, configs, currentAltReachedAcceptState, speculative, treatEofAsEpsilon)
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 357, in closure
c = self.getEpsilonTarget(input, config, t, configs, speculative, treatEofAsEpsilon)
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 396, in getEpsilonTarget
if self.evaluatePredicate(input, t.ruleIndex, t.predIndex, speculative):
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 465, in evaluatePredicate
return self.recog.sempred(None, ruleIndex, predIndex)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\antlr\PlSqlLexer.py", line 16864, in sempred
return pred(localctx, predIndex)
File "C:\antlr\PlSqlLexer.py", line 16875, in PROMPT_MESSAGE_sempred
return this.IsNewlineAtPos(-4)
^^^^
NameError: name 'this' is not defined. Did you forget to import 'this'?
C:\antlr>
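The NameError above comes from the predicate text `this.IsNewlineAtPos(-4)` being copied verbatim into the generated Python lexer, where the receiver is `self`. One possible workaround, sketched below, is to rewrite `this.` to `self.` inside the braces of the grammar before running the ANTLR tool for the Python target. The function name `to_python_target` is hypothetical and this is not the repository's own tooling; it also assumes that no string literal in the grammar contains braces around `this.`.

```python
import re


def to_python_target(grammar_text: str) -> str:
    """Rewrite `this.` to `self.` inside {...} actions and predicates.

    Assumption: action blocks are not nested and contain no braces of
    their own, which holds for the two rules discussed in this thread.
    """
    def fix_block(match: re.Match) -> str:
        # Only touch `this.` as a whole word followed by a dot.
        return re.sub(r"\bthis\.", "self.", match.group(0))

    return re.sub(r"\{[^{}]*\}", fix_block, grammar_text)


rule = (
    "PROMPT_MESSAGE: 'PRO' {this.IsNewlineAtPos(-4)}? 'MPT'? "
    "(' ' ~('\\r' | '\\n')*)? NEWLINE_EOF;"
)
converted = to_python_target(rule)
print(converted)
```

The lexer generated from the converted grammar would then call `self.IsNewlineAtPos(-4)`, which still requires that method to exist on the Python lexer base class.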
REM, REMARK, PRO, and PROMPT cannot be used as identifiers. SQL that uses them as identifiers fails to parse with the ANTLR grammar, but it is valid SQL in Oracle. I checked PlSqlLexer.g4, and these words are not defined as keywords there. So what is going on here?