[E-0002: Syntax error] on a column named with a reserved keyword

sterolandro commented 3 years ago

Hello Philipp, I guess this is borderline, though...

I have found that, if a table has a column named after a reserved keyword, (bulk in my case) both plsql-cop cli and sqldev fail with the E-0002: Syntax error message. Unfortunately, this is legacy code, blah blah, \<insert other lame excuses here>...

Here is an example test case:

-- create table tc (
--    id   number,
--    bulk number
-- );

create or replace function f return pls_integer
as
   l_cnt pls_integer;
begin
   select count(*)
     into l_cnt
     from tc
    where bulk = 42;
   return l_cnt;
end;
/

I have tested with Oracle SQL Developer extension 4.2.1.20210928.121434 (right click -> Check) and trial/preview db* CODECOP Version 4.2.2 (2021-09-28 11:26:06 +0200) which we are evaluating: tvdcc.sh path=src output=rpt and they both give this message E-0002: Syntax error. Please contact the author if the code compiles successfully in your environment. no viable alternative at input 'bulk'

Do you think it might be worth to address this behaviour?

Thanks SA

PhilippSalvisberg commented 3 years ago

Thanks @sterolandro for this detailed issue. Based on that, It was easy to reproduce the issue.

It's a known limitation and described here.

The current plsql parser version 4.1.2 does not handle the following 202 keywords: after, all, allow, alter, analytic, and, anyschema, as, asc, associate, authid, automatic, autonomous_transaction, before, begin, between, bulk, by, byte, canonical, case, case-sensitive, check, cluster, compound, connect, connect_by_root, constant, constraint, constructor, corrupt_xid, corrupt_xid_all, create, crossedition, current, customdatum, cycle, db_role_change, declare, decrement, defaults, define, definer, deterministic, dimension, disallow, disassociate, distinct, drop, each, editioning, else, elsif, end, evalname, exception, exception_init, exceptions, exclusive, external, fetch, following, follows, for, forall, from, goto, grant, group, having, hide, hier_ancestor, hier_lag, hier_lead, hier_parent, if, ignore, immutable, in, increment, index, indicator, indices, initially, inline, insert, instead, intersect, into, invisible, is, isolation, java, json_exists, json_table, lateral, library, like, like2, like4, likec, lock, logon, maxvalue, measures, merge, minus, minvalue, multiset, mutable, nan, nav, nchar_cs, nocopy, nocycle, nonschema, norely, not, novalidate, nowait, of, on, only, option, or, oradata, order, ordinality, over, overriding, parallel_enable, partition, passing, past, pipelined, pivot, pragma, precedes, preceding, present, prior, procedure, references, referencing, reject, rely, repeat, respect, restrict_references, result_cache, returning, revoke, select, sequential, serializable, serially_reusable, servererror, sets, share, siblings, single, some, sql_macro, sqldata, standalone, start, submultiset, subpartition, suppresses_warning_6009, the, then, to, trigger, udf, unbounded, under, union, unique, unlimited, unpivot, until, update, upsert, using, values, varray, varying, view, when, where, while, window, with, xmlattributes, xmlexists, xmlforest, xmlnamespaces, xmlquery, xmlroot, xmlschema, xmlserialize, xmltable.

And there are also these 30 keywords for SQL functions with non-standard parameters which we decided not to support: cast, collect, feature_compare, json_array, json_arrayagg, json_mergepatch, json_object, json_objectagg, json_query, json_scalar, json_serialize, json_transform, json_value, listagg, to_binary_double, to_binary_float, to_date, to_dsinterval, to_number, to_timestamp, to_timestamp_tz, to_yminterval, treat, validate_conversion, xmlagg, xmlcast, xmlcolattval, xmlelement, xmlparse, xmlpi.

We would love to support all these 232 keywords as identifiers. Unfortunately this is not that easy. We have to treat every keyword as an exception and tweak the grammar to process it somehow. In some cases it's easy, in other cases it is a lot of work and in certain cases it looks to be impossible with our current parser technology which is based on Xtext and ANTLR 3.2.

Here's another list of 274 keywords which are allowed as identifiers in certain cases. For some like inner the use is restricted for others like a there is no limitation: a, abs, absent, access, accessible, across, add, agent, aggregate, analyze, ancestor, any, append, apply, array, ascii, at, audit, auto, batch, beginning, bequeath, block, body, boolean, breadth, c, call, cascade, char, character, charset, clob, clone, close, collate, collation, columns, comment, commit, committed, compatibility, conditional, constraints, content, context, continue, conversion, copy, cost, count, coverage, cross, current_user, cursor, data, database, date, day, ddl, default, deferrable, deferred, delete, dense_rank, deprecate, depth, desc, directory, disable, document, editionable, element, empty, enable, encoding, entityescaping, error, errors, escape, except, exclude, execute, existing, exists, exit, explain, extra, fact, false, filter, final, first, force, foreign, format, forward, found, full, function, grouping, groups, hash, hierarchies, hierarchy, hour, id, identifier, immediate, include, indent, infinite, inner, instantiable, interface, interval, interval, iterate, join, json, keep, key, keys, language, last, lax, left, level, limit, local, location, locked, log, logoff, loop, main, map, mapping, match, match_recognize, matched, member, metadata, minute, mismatch, missing, mod, mode, model, month, name, natural, nested, new, next, no, noaudit, noentityescaping, none, noneditionable, noschemacheck, null, nulls, object, offset, oid, old, one, open, others, out, outer, overflow, package, pairs, parameters, parent, path, pattern, per, percent, period, permute, persistable, pipe, plan, pluggable, polymorphic, position, precision, pretty, primary, raise, range, raw, read, record, ref, reference, relies_on, remove, rename, replace, result, return, reverse, right, rollback, row, rows, rowtype, rules, running, sample, save, savepoint, scalar, scalars, schema, schemacheck, scn, search, second, seed, segment, self, sequence, set, sharing, show, shutdown, size, skip, sql, startup, statement, statement_id, static, statistics, strict, subset, subtype, suspend, table, ties, time, timestamp, transaction, true, truncate, trust, type, unconditional, unplug, updated, use, validate, value, variable, version, versions, visible, wait, wellformed, within, without, work, wrapper, write, xml, xmltype, year, yes, zone.

When you change the column name of bulk to key the example will work. Both are keywords and listed in v$reserved_words. We just support key because it is a common column in data models of our customers and also used in the Oracle Data Dictionary (25 hits when querying dba_tab_columns of a default ADB). Supporting it was essential.

Do you think it might be worth to address this behaviour?

Of course.

I checked the grammar and bulk is used in bulk_collect_into_clause which is used in

Based on that I can say, that bulk probably cannot be supported 100% as identifier in our grammar. For example there might be a conflict when the bulk column is the last column in the static_returing_clause. However, it should be possible to improve the support and to make your example work.

Are we supporting bulk as column name in the future? - Maybe. It depends on a lot of things. Right now I will not open a bug in our internal bug database, because this is a documented limitation. This might change, especially if the Oracle Database comes with components that use bulk as an identifier. This might also change if a customer's decision of licensing db* CODECOP is depending on it. ;-) - In the end we are not talking only about bulk, but the support of 231 other keywords. And right now, other things have a higher priority.

You mentioned that this is a legacy system. Hence it's probably not that easy to change the data model and use another column name. I understand that. It's a costly change. However, you are analyzing PL/SQL code with db* CODECOP. This means that you can change the code. Right? In this case I suggest to use a quoted identifier like this:

create or replace function f return pls_integer
as
   l_cnt pls_integer;
begin
   select count(*)
     into l_cnt
     from tc
    where "BULK" = 42;  -- used quoted identifier since bulk is a keyword
   return l_cnt;
end;
/

sterolandro commented 3 years ago

Philipp, thanks for the very quick and articulate answer. I swear I searched if it was documented, but I failed to find the reference in the cli documentation. Not often I have received such complete answers.

The DDL of the table is maybe 23-25 years old... poor choice even at that time, and I would not touch this unless very, very strictly necessary. Yes, I am able to work around the issue using quoted identifiers.

Thanks for your support in this issue.

Trivadis / plsql-cop-sqldev

[E-0002: Syntax error] on a column named with a reserved keyword #14