antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License
10.17k stars 3.7k forks source link

[PL/SQL] unable to parse a sqlplus script #3658

Open doberkofler opened 1 year ago

doberkofler commented 1 year ago

I expect the PL/SQL grammar to parse sql scripts containing multiple blocks of sqlplus, pl/sql and sql but when trying to do so the script.sql example reports the line 12:0 mismatched input 'declare' expecting <EOF> error.

error:

line 12:0 mismatched input 'declare' expecting <EOF>
(sql_script (sql_plus_command_no_semicolon (set_command set (regular_id echo) on)) (unit_statement (create_function_body create or replace function (function_name (identifier (id_expression (regular_id multiply)))) ( (parameter (parameter_name (identifier (id_expression (regular_id p1)))) in (type_spec (datatype (native_datatype_element number)))) , (parameter (parameter_name (identifier (id_expression (regular_id p2)))) in (type_spec (datatype (native_datatype_element number)))) ) return (type_spec (datatype (native_datatype_element number))) is (body begin (seq_of_statements (statement (return_statement return (expression (logical_expression (unary_logical_expression (multiset_expression (relational_expression (compound_expression (concatenation (concatenation (model_expression (unary_expression (atom (constant (quoted_string (variable_name (id_expression (regular_id p1))))))))) * (concatenation (model_expression (unary_expression (atom (constant (quoted_string (variable_name (id_expression (regular_id p2)))))))))))))))))) ;) end) ;)) / declare result number ; begin result := multiply ( 3 , 5 ) ; end ; /)

test:

import * as antlr4 from 'antlr4';
import PlSqlLexer from './grammar/PlSqlLexer';
import PlSqlParser from './grammar/PlSqlParser';

const input = `
set echo on

create or replace
function multiply(p1 in number, p2 in number) return number
is
begin
    return p1 * p2;
end;
/

declare
    result number;
begin
    result := multiply(3, 5);
end;
/
`;

const chars = new antlr4.CharStream(input);
const lexer = new PlSqlLexer(chars);
const tokens = new antlr4.CommonTokenStream(lexer);
const parser = new PlSqlParser(tokens);
const tree = parser.sql_script();
console.log(tree.toStringTree(null, parser));

script:

set echo on

create or replace
function multiply(p1 in number, p2 in number) return number
is
begin
    return p1 * p2;
end;
/

declare
    result number;
begin
    result := multiply(3, 5);
end;
/
kaby76 commented 1 year ago

As far as I can tell, there is no syntax for recognizing commands.

What you want is a new grammar to recognize SQL *Plus.

https://docs.oracle.com/en/database/oracle/oracle-database/23/sqpug/index.html vs

https://docs.oracle.com/en/database/oracle/oracle-database/23/lnpls/database-pl-sql-language-reference.pdf https://docs.oracle.com/en/database/oracle/oracle-database/23/lnpls/index.html#Oracle%C2%AE-Database

doberkofler commented 1 year ago

1) When looking at the grammar, it looks a lot as if the parser should support sql*plus script:

sql_script
    : sql_plus_command_no_semicolon? ((sql_plus_command | unit_statement) (SEMICOLON '/'? (sql_plus_command | unit_statement))* SEMICOLON? '/'?) EOF
    ;
sql_plus_command
    : EXIT
    | PROMPT_MESSAGE
    | SHOW (ERR | ERRORS)
    | whenever_command
    | timing_command
    | start_command
    ;

2) Independently from full sql*plus support the parser actually does not complain about the set echo off statement but refuses to parse multiple sql blocks.

kaby76 commented 1 year ago

Well Oracle itself calls it two different products, with two different documentations, one for plsql and another for sql-plus. The current grammar implementation is confusing to say the least.

The grammar does not support the commands completely and methodically.

It would probably be best to reorganize this grammar. Those rules should be yanked out and placed in another grammar along side PlSqlParser/Lexer.g4 that includes PlSqlParser/Lexer.g4, reflecting the relationship between plsql and sql-plus.

doberkofler commented 1 year ago

I'm currently just evaluating the use of the Antlr4 for a large refactoring job in a PL/SQL code base and must say that at least some basic documentation would go a long way for new users.

Well Oracle itself calls it two different products, with two different documentations, one for plsql and another for sql-plus. The current grammar implementation is confusing to say the least.

I agree and was actually surprised to see that the grammar named PlSql seems to support SqlPlus.

The grammar does not support the commands completely and methodically.

I understand.

It would probably be best to reorganize this grammar. Those rules should be yanked out and placed in another grammar along side PlSqlParser/Lexer.g4 that includes PlSqlParser/Lexer.g4, reflecting the relationship between plsql and sql-plus.

I completely agree but would like to add that most Oracle code bases are in fact stored in SqlPlus scripts and it would be very helpful to have a grammar that understand SqlPlus.

kaby76 commented 1 year ago

My plan is to separate the two grammars. My timeframe is unclear as I am overloaded cleaning up other issues that accumulate with the current process.