JSQLParser / JSqlParser

JSqlParser parses an SQL statement and translate it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Pattern
https://github.com/JSQLParser/JSqlParser/wiki
Apache License 2.0
5.29k stars 1.34k forks source link

[BUG] REGEXP_REPLACE as LikeExpression #2041

Closed ssteinhauser closed 1 month ago

ssteinhauser commented 1 month ago

I am about to implement the Exasol syntax. For this I'd like to implement REGEXP_LIKE as LikeExpression (see https://docs.exasol.com/db/latest/sql_references/predicates/not_regexp_like.htm). Since it is also allowed to be a function in Oracle (as already implemented), it needs to be added to the whitelisted keywords. Unfortunately, the following condition prevents it from being added to RelObjectNameWithoutValue since REGEXP_LIKE contains an underscore which is not allwed here: tokenValue.matches("[A-Za-z]+")

https://github.com/JSQLParser/JSqlParser/blob/11cebcfd122094402548cc8d5b53a063134284f5/src/main/java/net/sf/jsqlparser/parser/ParserKeywordsUtils.java#L235

What was the reason for adding this condition and how can or should I proceed to support REGEXP_LIKE as LikeExpression?

manticore-projects commented 1 month ago

Greetings!

Thank you for your interest and contribution. In genera this syntax is supported already and the following works:

SELECT 'My mail address is my_mail@exasol.com'
       RLIKE '(?i).*[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}.*'
       AS contains_email;

Now we only need to register REGEXP_LIKE as an Like Operator (like we did for RLIKE yet). So my advise is: follow the RLIKE implementation and also take reference to the EXTRACT production (which also can be a normal function as well as a special function). Something like this will work for the REGEXP_LIKE operator too.

I will be able to help you on this by next week only since I am somewhere between Jakarta and Lagos right now. Cheers.

ssteinhauser commented 1 month ago

This was also my first intension to implement it similar to RLIKE, so I've added a new token <K_REGEXP_LIKE: "REGEXP_LIKE"> and added it as an option to LikeExpression as well as to the Java implementation.

But the problem is, that the updateKeywords task won't whitelist the REGEXP_LIKE token (unlike EXTRACT), because it contains an Underscore. The condition tokenValue.matches("[A-Za-z]+") in the following line prevents it from being whitelisted for function names: https://github.com/JSQLParser/JSqlParser/blob/11cebcfd122094402548cc8d5b53a063134284f5/src/main/java/net/sf/jsqlparser/parser/ParserKeywordsUtils.java#L235

If there is no specific reason for the restriction to alphabetical characters only, I would suggest to match this regex: \w+

ssteinhauser commented 1 month ago

I've included an implementation of my suggestion in the PR #2044