[PlSql] "REM", "REMARK", "PRO", "PROMPT" can not be a identifier

jiefei30 commented 11 months ago

REM, REMARK, PRO, and PROMPT cannot be used as identifiers. For example:

SELECT REMARK FROM T1

ANTLR fails to parse this statement, although it is valid SQL in Oracle. I checked PlSqlLexer.g4, but these words are not defined as keywords there. So what is going on?

jiefei30 commented 11 months ago

It seems this code is the cause:

// https://docs.oracle.com/cd/E11882_01/server.112/e16604/ch_twelve034.htm#SQPUG054
REMARK_COMMENT:      'REM' {this.IsNewlineAtPos(-4)}? 'ARK'? (' ' ~('\r' | '\n')*)? NEWLINE_EOF -> channel(HIDDEN);

// https://docs.oracle.com/cd/E11882_01/server.112/e16604/ch_twelve032.htm#SQPUG052
PROMPT_MESSAGE:      'PRO' {this.IsNewlineAtPos(-4)}? 'MPT'? (' ' ~('\r' | '\n')*)? NEWLINE_EOF;

This is at line 2450 in PlSqlLexer.g4.

jiefei30 commented 11 months ago

CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), REMARK VARCHAR2(512))

This parses fine, but

CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), 
REMARK VARCHAR2(512))

this does not. The only difference is the \n before REMARK in the second statement.
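To make the difference between the two statements concrete, here is a minimal Python sketch that imitates the REMARK_COMMENT rule outside of ANTLR. It assumes that IsNewlineAtPos(-4) is true when the character just before 'REM' is a newline or the start of input, and that NEWLINE_EOF means an optional '\r' plus '\n', or end of input. Neither definition is shown in this thread, so both are assumptions, and the function name swallowed_as_remark is hypothetical.

```python
import re

# Assumed shape of the lexer rule after the predicate has passed:
# 'REM', optional 'ARK', optional ' ' plus the rest of the line,
# then a newline or end of input (our reading of NEWLINE_EOF).
REMARK_LINE = re.compile(r"REM(?:ARK)?(?: [^\r\n]*)?(?:\r?\n|\Z)")


def swallowed_as_remark(sql: str) -> list[str]:
    """Return the spans that a rule like REMARK_COMMENT would hide."""
    hits = []
    for m in re.finditer(r"REM", sql):
        start = m.start()
        # Assumed meaning of IsNewlineAtPos(-4): the character before 'R'
        # is a newline, or there is no such character (start of input).
        at_line_start = start == 0 or sql[start - 1] == "\n"
        if not at_line_start:
            continue
        full = REMARK_LINE.match(sql, start)
        if full:
            hits.append(full.group(0))
    return hits


one_line = "CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), REMARK VARCHAR2(512))"
two_line = "CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16), \nREMARK VARCHAR2(512))"

print(swallowed_as_remark(one_line))
print(swallowed_as_remark(two_line))
```

Under these assumptions the one-line statement produces no hits, while in the two-line statement everything from REMARK to the end of the input is matched. Since that match is longer than the identifier REMARK alone, a longest-match lexer would prefer it and send the column definition to the hidden channel.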

jiefei30 commented 11 months ago

@kaby76 Hello, I see you modified this code about 3 months ago. Could you please help me figure this out when you have some spare time? Thank you so much!

kaby76 commented 11 months ago

The change I made is unrelated to this problem. All I did was rename "self." to "this." in those two rules in order to put the grammar into "target agnostic format".

REMARK_COMMENT was added long before, first here: https://github.com/antlr/grammars-v4/commit/3f0150f57505dde0792739e79c0030a8c912e425#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2324

A predicate was then added the same day here: https://github.com/antlr/grammars-v4/commit/356f3ea19e3c62fa92e1f3c7997daa8ec7711ad9#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2328

Then, it was changed again the day after to what it was until I changed it: https://github.com/antlr/grammars-v4/commit/356f3ea19e3c62fa92e1f3c7997daa8ec7711ad9#diff-f0a9ac045571f25a6a3533b5c47d6287b7096503195911cb1e1e927a6a5a12c9R2328

I will look at it over the weekend. My first impression, though, is that recognizing REM and PRO requires parser-state-aware lexing: it's not enough to check for a preceding newline character, you also have to verify that the word is not part of a statement. This is one of the things Antlr does not do well at all.
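As a rough illustration of what "verify that it's not part of a statement" could mean, the following Python sketch does a line-based pre-pass that only treats REM/REMARK/PRO/PROMPT as a command when no SQL statement is currently open. It is not how the grammar works and not a proposed fix. The function name classify_lines and the rule that a statement ends at a trailing ';' or a lone '/' are simplifying assumptions made for the example.

```python
import re

# A line looks like a command if it starts with REM[ARK] or PRO[MPT]
# followed by whitespace or end of line.
COMMAND = re.compile(r"\s*(REM(ARK)?|PRO(MPT)?)(\s|$)", re.IGNORECASE)


def classify_lines(script: str) -> list[tuple[str, str]]:
    """Label each line 'command' or 'sql', tracking whether a statement is open."""
    labelled = []
    in_statement = False  # True while a SQL statement is still unterminated
    for line in script.splitlines():
        stripped = line.strip()
        if not in_statement and COMMAND.match(line):
            labelled.append(("command", line))
            continue
        labelled.append(("sql", line))
        if stripped:
            # Simplification (assumption): a statement ends at a trailing ';'
            # or at a line consisting only of '/'.
            in_statement = not (stripped.endswith(";") or stripped == "/")
    return labelled


script = (
    "REM create the table\n"
    "CREATE TABLE AGMT (LIMIT_FLAG VARCHAR2(16),\n"
    "REMARK VARCHAR2(512));\n"
    "PROMPT done\n"
)
for kind, text in classify_lines(script):
    print(kind, "|", text)
```

The line starting with REMARK inside the CREATE TABLE is kept as SQL only because the pre-pass remembers that a statement is still open. A lexer predicate that inspects just the preceding character does not have that information.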

jiefei30 commented 11 months ago

@KvanTTT OK, I get it. Thanks for your reply. My current workaround is to temporarily remove these two lexer rules.

kaby76 commented 11 months ago

I think the problem here is that REMARK and PROMPT should be considered commands. They're not really comments. So, I think you're right, the rules should not be there in the lexer.

Dark-Athena commented 6 days ago

Maybe this only happens on the Python target, because it works fine on the Java target:

C:\antlr>grun PlSql sql_script  -tree
select 1 from pro;
^Z
(sql_script (unit_statement (data_manipulation_language_statements (select_statement (select_only_statement (subquery (subquery_basic_elements (query_block select (selected_list (select_list_elements (expression (logical_expression (unary_logical_expression (multiset_expression (relational_expression (compound_expression (concatenation (model_expression (unary_expression (atom (constant (numeric 1)))))))))))))) (from_clause from (table_ref_list (table_ref (table_ref_aux (table_ref_aux_internal (dml_table_expression_clause (tableview_name (identifier (id_expression (regular_id pro))))))))))))))))) ; <EOF>)

C:\antlr>pygrun PlSql sql_script  --tree
select 1 from pro;
^Z
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Scripts\pygrun.exe\__main__.py", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\_pygrun.py", line 156, in main
    process(input_stream, class_lexer, class_parser)
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\_pygrun.py", line 124, in process
    token_stream.fill()
    ~~~~~~~~~~~~~~~~~^^
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\BufferedTokenStream.py", line 301, in fill
    while self.fetch(1000)==1000:
          ~~~~~~~~~~^^^^^^
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\BufferedTokenStream.py", line 124, in fetch
    t = self.tokenSource.nextToken()
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\Lexer.py", line 137, in nextToken
    ttype = self._interp.match(self._input, self._mode)
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 104, in match
    return self.execATN(input, dfa.s0)
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 173, in execATN
    target = self.computeTargetState(input, s, t)
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 231, in computeTargetState
    self.getReachableConfigSet(input, s.configs, reach, t)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 280, in getReachableConfigSet
    if self.closure(input, config, reach, currentAltReachedAcceptState, True, treatEofAsEpsilon):
       ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 359, in closure
    currentAltReachedAcceptState = self.closure(input, c, configs, currentAltReachedAcceptState, speculative, treatEofAsEpsilon)
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 357, in closure
    c = self.getEpsilonTarget(input, config, t, configs, speculative, treatEofAsEpsilon)
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 396, in getEpsilonTarget
    if self.evaluatePredicate(input, t.ruleIndex, t.predIndex, speculative):
       ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DarkAthena\AppData\Local\Programs\Python\Python313\Lib\site-packages\antlr4\atn\LexerATNSimulator.py", line 465, in evaluatePredicate
    return self.recog.sempred(None, ruleIndex, predIndex)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\antlr\PlSqlLexer.py", line 16864, in sempred
    return pred(localctx, predIndex)
  File "C:\antlr\PlSqlLexer.py", line 16875, in PROMPT_MESSAGE_sempred
    return this.IsNewlineAtPos(-4)
           ^^^^
NameError: name 'this' is not defined. Did you forget to import 'this'?

C:\antlr>
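The NameError above comes from the generated PlSqlLexer.py containing `this.IsNewlineAtPos(-4)` verbatim, which Python cannot resolve. A minimal sketch of one way around it is to rewrite `this.` to `self.` inside the grammar's `{...}` actions and predicates before generating the Python lexer. The helper to_python_target below is hypothetical, and the repository may already provide its own transformation step for this. It also assumes that a Python lexer base class defining IsNewlineAtPos is available to the generated code.

```python
import re

# Matches a single, non-nested {...} action or predicate body.
ACTION = re.compile(r"\{[^{}]*\}")


def to_python_target(grammar_text: str) -> str:
    """Replace 'this.' with 'self.' inside {...} blocks only (hypothetical helper)."""
    def fix(match: re.Match) -> str:
        return re.sub(r"\bthis\.", "self.", match.group(0))
    return ACTION.sub(fix, grammar_text)


rule = (
    "PROMPT_MESSAGE:      'PRO' {this.IsNewlineAtPos(-4)}? 'MPT'? "
    "(' ' ~('\\r' | '\\n')*)? NEWLINE_EOF;"
)
print(to_python_target(rule))
```

Restricting the rewrite to brace-delimited blocks leaves comments and string literals elsewhere in the grammar untouched. Nested braces are not handled in this sketch.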

antlr / grammars-v4

[PlSql] "REM", "REMARK", "PRO", "PROMPT" can not be a identifier #3817