Open ayushdg opened 1 year ago
Another instance of this reported by @goodwanghan in https://github.com/dask-contrib/dask-sql/issues/1108#issue-1653305345
import pandas as pd
import dask.dataframe as dd
from dask_sql import Context
df = pd.read_parquet("https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet")
ctx = Context()
res = ctx.sql(
"SELECT PULocationID, COUNT(*) AS ct FROM df GROUP BY PULocationID ORDER BY ct DESC LIMIT 5",
dataframes={"df":dd.from_pandas(df,npartitions=2)},
config_options={"sql.identifier.case_sensitive":True}
)
What happened: When setting
sql.identifier.case_sensitive=True
dask-sql still ends up converting identifiers to lowercase during the planning stage.What you expected to happen: Case sensitivity being honored when set to
True
which is the default. This needs https://github.com/apache/arrow-datafusion/issues/5626 to go in after which the planner can be instantiated with the properenable_ident_normalization
value. https://github.com/dask-contrib/dask-sql/blob/11fda749d4dc4c729350a1a1e87c0ea631f11bce/dask_planner/src/sql.rs#L548 Minimal Complete Verifiable Example: