[BUG] Case sensitive=True config doesn't work as intended

dask-contrib / dask-sql

Distributed SQL Engine in Python using Dask

MIT License

397 stars 72 forks source link

What happened: When setting sql.identifier.case_sensitive=True dask-sql still ends up converting identifiers to lowercase during the planning stage.

What you expected to happen: Case sensitivity being honored when set to True which is the default. This needs https://github.com/apache/arrow-datafusion/issues/5626 to go in after which the planner can be instantiated with the proper enable_ident_normalization value. https://github.com/dask-contrib/dask-sql/blob/11fda749d4dc4c729350a1a1e87c0ea631f11bce/dask_planner/src/sql.rs#L548 Minimal Complete Verifiable Example:

df = pd.DataFrame({"id": [0, 1], "VAL": [1, 2]})
c.create_table("test", df)
c.sql("select ID from test") # works even though ID should be treated differently to `id`
c.sql("select val from test") # fails since the schema has `VAL` but the planner is looking for `val`.

import pandas as pd import dask.dataframe as dd from dask_sql import Context df = pd.read_parquet("https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet") ctx = Context() res = ctx.sql( "SELECT PULocationID, COUNT(*) AS ct FROM df GROUP BY PULocationID ORDER BY ct DESC LIMIT 5", dataframes={"df":dd.from_pandas(df,npartitions=2)}, config_options={"sql.identifier.case_sensitive":True} )

dask-contrib / dask-sql

[BUG] Case sensitive=True config doesn't work as intended #1092