databricks / databricks-sql-python

Databricks SQL Connector for Python
Apache License 2.0
168 stars 94 forks source link

Sqlalchemy in statement is incorrectly compiled for the in clause #445

Open bkyryliuk opened 1 month ago

bkyryliuk commented 1 month ago

Version 2.9.6

Repro:

from sqlalchemy import create_engine, MetaData, Table, select, column

# Assuming you have a working database connection string
engine = create_engine('...'). # make sure to specify the catalog

metadata = MetaData()
engine.execute("create table if not exists bogdankyryliuk.bool_test as select 1 as one_val, True as bool_val")
test_table = Table('bool_test', metadata, autoload_with=engine, schema='bogdankyryliuk')
stmt = select(test_table).where(test_table.c.bool_val.in_([True]))

compiled_stmt = stmt.compile(compile_kwargs={"literal_binds": True})
print(compiled_stmt)

with engine.connect() as conn:
    result = conn.execute(stmt)
    for row in result:
        print(row)

Compiled statement is printed as expected:

SELECT bogdankyryliuk.bool_test.one_val, bogdankyryliuk.bool_test.bool_val
FROM bogdankyryliuk.bool_test
WHERE bogdankyryliuk.bool_test.bool_val IN (true)

However conn.execute(stmt) executes different statement:

DatabaseError: (databricks.sql.exc.ServerOperationError) [DATATYPE_MISMATCH.DATA_DIFF_TYPES] Cannot resolve "(bool_val IN (1))" due to data type mismatch: Input to `in` should all be the same type, but it's ["BOOLEAN", "INT"]. SQLSTATE: 42K09; line 3 pos 40
[SQL: SELECT bogdankyryliuk.bool_test.one_val, bogdankyryliuk.bool_test.bool_val
FROM bogdankyryliuk.bool_test
WHERE bogdankyryliuk.bool_test.bool_val IN (%(bool_val_1_1)s)]
[parameters: {'bool_val_1_1': 1}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)

That leads to the exception

bkyryliuk commented 1 month ago

It also looks like it may have been fixed in the newer versions: https://github.com/databricks/databricks-sql-python/pull/357/files it would be nice to backport it to 2.9.6

bkyryliuk commented 1 month ago

Tested that fix works via setting DatabricksDialect.supports_native_boolean = True

susodapop commented 2 weeks ago

Pro-tip: one of the reasons why the compile statement looks correct is because it's using the default statement compiler, rather than the one employed by this connector. If you want to truly see what the rendered query will look like you, need to pass the Databricks dialect to your compile() call.

stmt.compile(compile_kwargs={"literal_binds": True}, dialect=DatabricksDialect())

This doesn't fix your underlying issue, but hopefully helps you chase down issues in the future.