JSQLParser / JSqlParser

JSqlParser parses an SQL statement and translate it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern
https://github.com/JSQLParser/JSqlParser/wiki
Apache License 2.0
5.39k stars 1.34k forks source link

Previously parsable statement error #1352

Closed morvael closed 3 years ago

morvael commented 3 years ago

This worked in 4.1, but doesn't work in 4.2:

b1.system

Parser works for

b1.`system`

Sadly our database (Informix) understands only

b1.'system'

which in turn isn't something the parser understands.

morvael commented 3 years ago

No doubt system became reserved word in 4.2, but I think the list of reserved words varies between databases and having one big set of reserved words from any and all databases is leading to such problems. Too many words are reserved because of that.

manticore-projects commented 3 years ago

Sadly our database (Informix) understands only

b1.'system'

b1."system" does not work on informix? Anyway, system is likely a reserved keyword because of Oracle's ALTER SYSTEM ... statement.

morvael commented 3 years ago

Ok, b1."system" works, so this is a workaround. But as I wrote, there should be a notion of dialects with different sets of reserved words, instead of one big set covering all databases. Because that is very restrictive. And with every version of JSqlParser that adds new reserved keywords all queries passing through the parser must be examined if they don't contain words that were OK, but suddenly aren't OK anymore...

morvael commented 3 years ago

Looking at your fix, I had column named system, not table.

morvael commented 3 years ago

SELECT b1.system FROM table b1

morvael commented 3 years ago

No a fan of workaround, as columName in parsed statement retains those quote chars, so some other code using that must be aware of that possibility and add conditional stripping...

manticore-projects commented 3 years ago

Looking at your fix, I had column named system, not table.

Well, that is an example why verbose samples and test cases are important. This works now:

SELECT system from b1.system
morvael commented 3 years ago

Haven't seen such query yet, but ok :)

manticore-projects commented 3 years ago

Haven't seen such query yet, but ok :)

Have not seen system used as a column name or alias without proper quoting either -- so should I reconsider :-D

Btw, there is a comprehensive list of SQL 99 Keywords. You can find it on the internet. I can only advise to avoid any of these keywords and to apply proper quoting to all table- and column names regardless. Furthermore it is good practice to write RDBMS agnostic, standard compliant queries -- unless the RDBMS itself forces you to use a certain non-compliant syntax (yes, I mean you, Oracle!)

morvael commented 3 years ago

Yes, but it's legacy system still in use, and that field exists for longer than I work with it... Will not change it just to remove potential collision :)

manticore-projects commented 3 years ago

And one more more thing: I believe most of the contributors know Oracle or MS SQL Server, but few use Informix.

So if you have a catalog of your own preferred Informix queries and would like to contribute it, we could actually integrate it into our unit tests and assure that nothing breaks during the development.

morvael commented 3 years ago

Have you considered handling dialects (different sets of reserved words depending on selected database engine)? I know it may be a bit problematic because of static construction of the parser definition...

morvael commented 3 years ago

As I thought, SYSTEM is not reserved in the core SQL-92 standard.

manticore-projects commented 3 years ago

As I thought, SYSTEM is not reserved in the core SQL-92 standard.

I wrote SQL99 for a reason. https://www.drupal.org/docs/develop/coding-standards/list-of-sql-reserved-words

manticore-projects commented 3 years ago

Yes, there were a few discussions around, but no really good solution emerged so far. More importantly, in your case even a perfect solution would not have helped much because (to my best knowledge) nobody uses or knows Informix -- so who would write the dialect specific grammar?

morvael commented 3 years ago

Well, so far we have only encountered problems with EXECUTE FUNCTION and EXECUTE PROCEDURE syntax, and fully qualified name syntax database@server:owner.table ( https://www.ibm.com/docs/en/informix-servers/14.10?topic=database-specifying-object-in-cross-server-query ), where owner is optional. There are also some simple unrecognized function names, like CURRENT (if I'm not mistaken). We try to work around these problems, but "native" support would be great.

morvael commented 3 years ago

https://www.quora.com/Why-is-Informix-database-not-very-popular :)

morvael commented 3 years ago

A good idea would be to somehow make the parser more dynamic, where I could register/deregister keywords at will (or map them to be interpreted as some other keywords), including support for "raw sql part" which the parser doesn't try to fit into object model, but just stores it raw and appends when needed. That way I could quickly configure our instance of parser to do what I need it to do, without you having any Informix experts on your side.

manticore-projects commented 3 years ago

Unfortunately, this is not how the JavaCC Parser generator works. As keyword is not a keyword because we wanted to make life difficult. As keyword is actually a token which tells the parser generator to start a certain sequence.

Oracle's 'ALTER SYSTEM' will enforce 'SYSTEM' as keyword, unless we created a different Grammar without the 'ALTER SYSTEM' production.

In my opinion, we would need:

1) a Template Engine and a Grammar Template, which will generate the Java CC Grammar for a certain RDBMS (based on paramaters) 2) then build the Parser dynamically and "on the fly"

This comes at a high cost though: a) we would need to deploy the template engine and the JavaCC library b) the performance would suffer, at least regarding the initiation time (building the template and the Parser) c) it would need the Java Compiler and run only with a JDK (but not with a JRE)

Big effort for small benefits.

morvael commented 3 years ago

Yes, wrote my share of Antlr4 grammars, so I know how it rolls, and how static (while yet dynamic) it is. So the only other option is to pester you forever with small(ish) issues on github, to add them manually each time there's a problem :)

morvael commented 3 years ago

Too bad it isn't possible for SYSTEM to be keyword only after ALTER, not in some other context (I'd know how to solve this in Antlr4).

manticore-projects commented 3 years ago

Too bad it isn't possible for SYSTEM to be keyword only after ALTER, not in some other context (I'd know how to solve this in Antlr4).

That's exactly the point: JavaCC works with Tokens and ALTER SYSTEM is actually a sequence of 2 tokens.

When we define now Tablename, we need to define all allowed Tokens and since SYSTEM is a Token on its own already, we need to allow it explicitly.

I would not have thought, that anybody would use SYSTEM as a qualified name anywhere and for that reason I have not enabled it as a qualified name, when implementing ALTER SYSTEM.

Until someone reports, that indeed SYSTEM is used as a column or table name. Then we add it and try, if everything else still works. If so, then good. If something else breaks, we turn it down.

manticore-projects commented 3 years ago

Too bad it isn't possible for SYSTEM to be keyword only after ALTER, not in some other context (I'd know how to solve this in Antlr4).

On ANTLR4, I believe it is more flexible regarding the grammar, but the generated parser is less performant. (Although I have no proof and might be simply wrong on that assumption.)

morvael commented 3 years ago

Yeah, you can try to move some parts from lexer to parser level in antlr4, but following good practice of defining every token for lexer you end up with the same problem, where you have to add them manually to various parser rules. Maybe just include keywords as allowed table/column names by default, if they are not in the basic SQL-92 standard, but come from some extension or one specific database.

morvael commented 3 years ago

Sort of making it backwards compatible by default - adding new keyword? Fine, just add it as allowed table/column name. Because someone might have been using it like that, when it wasn't keyword yet.