databricks / databricks-sql-python

Databricks SQL Connector for Python
Apache License 2.0

Pandas makes bad DESCRIBE query when using SQLAlchemy #184

Open freud14-tm opened 11 months ago

freud14-tm commented 11 months ago

When using the SQLAlchemy engine with Pandas, it seems that Pandas makes a bad DESCRIBE query. Here is the code:

import os

import pandas as pd

from sqlalchemy import create_engine

server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
engine = create_engine(
    f"databricks://token:{access_token}@{server_hostname}?http_path={http_path}&catalog=hive_metastore&schema=default",
)
with engine.connect() as connection:
    print(pd.read_sql("SELECT * FROM test", connection))

Here are the two queries resulting from that code: [image]

It does not do that when using the SQL connector directly instead:

import os

import pandas as pd

from databricks import sql

with sql.connect(
    server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
) as connection:
    print(pd.read_sql("SELECT * FROM test", connection))
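One possible explanation, not confirmed in this thread: when `pd.read_sql` is given a SQLAlchemy connection, pandas first checks whether the string is a table name (via `has_table`) before running it as a query, and a dialect may implement that check with a DESCRIBE-style statement. With a plain DBAPI connection pandas skips that check. The sketch below uses an in-memory SQLite engine as a stand-in for Databricks (an assumption made so it runs anywhere) and records every statement sent to the database, comparing `pd.read_sql` with `pd.read_sql_query`. The helper names `record_statement` and `statements_issued_by` are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine, event, text

# In-memory SQLite stands in for Databricks here (an assumption for illustration).
engine = create_engine("sqlite://")
statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    # Collect every statement that actually reaches the database driver.
    statements.append(statement)


# Create a small table so the SELECT below has something to return.
with engine.begin() as connection:
    connection.execute(text("CREATE TABLE test (id INTEGER)"))
    connection.execute(text("INSERT INTO test VALUES (1)"))


def statements_issued_by(reader):
    """Run the same query through `reader` and return the statements it sent."""
    statements.clear()
    with engine.connect() as connection:
        reader("SELECT * FROM test", connection)
    return list(statements)


via_read_sql = statements_issued_by(pd.read_sql)
via_read_sql_query = statements_issued_by(pd.read_sql_query)

print("pd.read_sql:      ", via_read_sql)
print("pd.read_sql_query:", via_read_sql_query)
```

If the extra statement only shows up for `pd.read_sql`, then calling `pd.read_sql_query("SELECT * FROM test", connection)` may be a workaround. I have not verified this against the Databricks dialect.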

Here are version numbers:

In [1]: import sqlalchemy

In [2]: sqlalchemy.__version__
Out[2]: '1.4.49'

In [3]: from databricks import sql

In [4]: sql.__version__
Out[4]: '2.8.0'

freud14-tm commented 11 months ago

Also, it would be nice if catalog and schema were optional.

susodapop commented 11 months ago

What version of pandas do you have installed?

freud14-tm commented 11 months ago

I was on 1.5.3, but I just tried 2.0.3 and got the same thing.

saadali-e commented 5 months ago

@susodapop I have the same issue - has this been resolved in the newer versions?