duckdb / duckdb_spatial

MIT License
489 stars 40 forks source link

Failure Casting String Literal to DATE in WHERE clause #457

Open b-schmeling opened 2 days ago

b-schmeling commented 2 days ago

If I run the following with the file samsung_sales.xlsx

import duckdb

sql = """
    INSTALL spatial; LOAD spatial;
    SELECT *
    FROM st_read('./samsung_sales.xlsx', open_options=['HEADERS=FORCE'], layer='Orders')
    WHERE "Date" >= '2022-01-01'
conn = duckdb.connect(database=":memory:")
conn.execute(sql)

I receive the error:

Traceback (most recent call last): File "/Users/bethany/repos/x/date_duckdb.py", line 15, in conn.execute(sql) duckdb.duckdb.IOException: IO Error: GDAL Error (1): SQL Expression Parsing Error: syntax error, unexpected $undefined, expecting end of string. Occurred around : Date>='2022-01-01'::DATE AND Date IS NOT NULL

The same error occurs even if I explicitly cast the string literal, e.g.:

import duckdb

sql = """
    INSTALL spatial; LOAD spatial;
    SELECT *
    FROM st_read('./samsung_sales.xlsx', open_options=['HEADERS=FORCE'], layer='Orders')
    WHERE "Date" >= CAST('2022-01-01' AS DATE)
"""
conn = duckdb.connect(database=":memory:")
conn.execute(sql)

However, if I run this same SQL with a csv file, e.g. BikeShare.csv it succeeds:

import duckdb

sql = """
    SELECT *
    FROM read_csv('./BikeShare.csv')
    WHERE "date" >= '2022-01-01'
"""
conn = duckdb.connect(database=":memory:")
conn.execute(sql)
Maxxen commented 1 day ago

Hello! this is related to filter pushdown into GDAL (st_read), which is why you don't get the same error with the csv. When we pass filters down we "stringily" the expressions, but It seems like GDAL's SQL-dialect doesn't understand the :: as a cast operator since it is syntax that DuckDB has inherited from PostgreSQL.

As a workaround for now you can disable the filter pushdown optimizer by running:

pragma disabled_optimizers='filter_pushdown';