apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.36k stars 1.2k forks source link

Support LIKE pattern without any escape character #13291

Open findepi opened 2 weeks ago

findepi commented 2 weeks ago

Is your feature request related to a problem or challenge?

in standard SQL, as well as in a few systems LIKE pattern without explicit ESCAPE has no implicit ESCAPE character. Or at (least it's not a backslash)

Some systems WITHOUT implicit \ escape in LIKE patterns

Oracle 23c

https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/Pattern-matching-Conditions.html#GUID-3FA7F5AB-AC64-4200-8F90-294101428C26

If esc_char is not specified, then there is no default escape character.

I couldn't confirm this with db-fiddle. I think it's processing \% incorrectly on query input or output

Trino 461

SELECT
    -- verify backslash treatment in a string literal
    '\a' AS backslash_a,
    '\%' AS backslash_percent,
    -- verify backslash treatment in a LIKE pattern without explicit «ESCAPE '\'» clause
    'a' LIKE '\%' AS a,
    '\a' LIKE '\%' AS b,
    '%' LIKE '\%' AS c,
    '\%' LIKE '\%' AS d
 backslash_a | backslash_percent |   a   |  b   |   c   |  d
-------------+-------------------+-------+------+-------+------
 \a          | \%                | false | true | false | true

SQL Server

SELECT a,
    CASE WHEN
        (a LIKE '\%') OR NOT (a LIKE '\%')
        THEN CASE WHEN (a LIKE '\%') THEN 'true' ELSE 'false' END
        ELSE 'NULL'
    END AS is_like
FROM (VALUES ('a'), ('\a'), ('%'), ('\%')) t(a)
a | is_like
-- | --
a | false
\a | true
% | false
\% | true

Snowflake

-- using $$-quoted strings to avoid need to scape the backslash
SELECT
    -- verify backslash treatment in a string literal
    $$\a$$ AS backslash_a,
    $$\%$$ AS backslash_percent,
    -- verify backslash treatment in a LIKE pattern without explicit «ESCAPE $$\$$» clause
    $$a$$ LIKE $$\%$$ AS a,
    $$\a$$ LIKE $$\%$$ AS b,
    $$%$$ LIKE $$\%$$ AS c,
    $$\%$$ LIKE $$\%$$ AS d
BACKSLASH_A | BACKSLASH_PERCENT | A | B | C | D
-- | -- | -- | -- | -- | --
\a | \% | FALSE | TRUE | FALSE | TRUE

(note: not using GitHub markdown table rendering, as it renders \% in cells as %)

Some systems WITH implicit \ escape in LIKE patterns

PostgreSQL 17

postgres=# select version();
                                                          version
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 17.0 (Debian 17.0-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

postgres=# SELECT
    -- verify backslash treatment in a string literal
    '\a' AS backslash_a,
    '\%' AS backslash_percent,
    -- verify backslash treatment in a LIKE pattern without explicit «ESCAPE '\'» clause
    'a' LIKE '\%' AS a,
    '\a' LIKE '\%' AS b,
    '%' LIKE '\%' AS c,
    '\%' LIKE '\%' AS d;
 backslash_a | backslash_percent | a | b | c | d
-------------+-------------------+---+---+---+---
 \a          | \%                | f | f | t | f

DataFusion

not showing CLI output due to https://github.com/apache/datafusion/issues/13286

SLT test is like this (copied from https://github.com/apache/datafusion/pull/13288)

# \ is an implicit escape character
query BBBB
SELECT
    'a' LIKE '\%',
    '\a' LIKE '\%',
    '%' LIKE '\%',
    '\%' LIKE '\%'
----
false false true false

# \ is an implicit escape character
query BBBBBB
SELECT
    'a' LIKE '\_',
    '\a' LIKE '\_',
    '_' LIKE '\_',
    '\_' LIKE '\_',
    'abc' LIKE 'a_c',
    'abc' LIKE 'a\_c'
----
false false true false true false

Describe the solution you'd like

I'd like DataFusion to follow SQL standard. LIKE without an ESCAPE should behave as not having an escape character. LIKE with ESCAPE '\' should behave as having \ escape character.

Describe alternatives you've considered

Picking arbitrary escape character to represent LIKE without an ESCAPE. This works for static patterns, as the pattern can be verified not to contain the escape character chosen. This does not work however for patterns being dynamic.

Additional context

jayzhan211 commented 3 days ago

Is there any reason to not follow Postgres?

findepi commented 21 hours ago

PostgreSQL chose to depart from SQL spec here, where most other systems did not. But yes, we can stick to PostgreSQL from DataFusion SQL dialect perspective. It would still be valuable to support escape-less LIKE expressions in the DF LP.