crflynn / databricks-dbapi

DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
MIT License

Doesn't seem to support SQL Analytics cluster (tried on Azure) #10

Closed slayerjain closed 3 years ago

slayerjain commented 3 years ago

For us, SQL Analytics has become the most important cluster type to connect to from tools like Apache Superset. We are currently able to use this driver to connect our Superset instance to Databricks on both AWS and Azure. However, it doesn't seem to support SQL Analytics clusters yet.

Are there any plans to add support for them? Also, would it make sense to switch to the ODBC drivers?

crflynn commented 3 years ago

I just tested this (on AWS, FWIW) and got the following:

thrift.Thrift.TApplicationException: MALFORMED_REQUEST: 
    Client Python/THttpClient (main.py) is not supported for SQL Endpoints.
    SimbaJDBCDriver 2.6.14, SimbaODBCDriver 2.6.15 or above is required.

The parameterization of the connection string doesn't appear to require any changes in order to make the connection. As far as I can tell, we would just need the host, http_path, and a Databricks SQL Analytics token to connect, so I don't believe this library itself would need any changes. I think supporting SQL Analytics might require changes upstream, in pyhive or further; my best guess is the TCLIService module in pyhive.

Edit: The only other possibility I can think of is that, based on the error, connecting requires the Simba driver. In that case I'm wondering if we can spoof it somehow, but I've never used it, so I'm not familiar with the connection differences.

crflynn commented 3 years ago

As an alternative, you might be able to use https://pypi.org/project/pyodbc/ with the Simba driver. To support SQL Analytics we could wrap that library here, although it seems the driver itself would still be required in order to use it.

slayerjain commented 3 years ago

I can confirm that pyodbc works with the Simba driver on Linux for SQL Analytics clusters. Wrapping that instead of pyhive could make more sense, and maybe we can add the Simba ODBC driver to the repo for automatic installation, something along the lines of https://github.com/exasol/sqlalchemy-exasol
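
For anyone who wants to try the pyodbc route in the meantime, here is a rough sketch of what the connection could look like. It is not what this library does; it only assembles a connection string from the host, http_path, and token mentioned above. The key names (HTTPPath, ThriftTransport, AuthMech, and so on) are my assumption of the Simba Spark ODBC settings for token authentication, and the function names, driver path, and endpoint values are hypothetical placeholders.

```python
from typing import Dict


def build_connection_string(
    host: str,
    http_path: str,
    token: str,
    driver_path: str,
    port: int = 443,
) -> str:
    """Assemble an ODBC connection string for the Simba Spark driver.

    The key names below are assumptions based on the Simba Spark ODBC
    settings for token authentication; check them against your driver docs.
    """
    params: Dict[str, str] = {
        "Driver": driver_path,   # path to the installed Simba driver library
        "Host": host,
        "Port": str(port),
        "HTTPPath": http_path,
        "SSL": "1",
        "ThriftTransport": "2",  # assumed: 2 selects HTTP transport
        "AuthMech": "3",         # assumed: 3 selects user name and password
        "UID": "token",          # assumed: literal user name for token auth
        "PWD": token,            # the personal access token
    }
    # Dicts preserve insertion order, so the keys appear in the order above.
    return ";".join(f"{key}={value}" for key, value in params.items())


def connect(host: str, http_path: str, token: str, driver_path: str):
    """Open a pyodbc connection (needs pyodbc and the Simba driver installed)."""
    import pyodbc  # imported lazily so the string builder works without it

    conn_str = build_connection_string(host, http_path, token, driver_path)
    # autocommit=True because the endpoint is not expected to support transactions.
    return pyodbc.connect(conn_str, autocommit=True)
```

With hypothetical values, `connect("example.cloud.databricks.com", "/sql/1.0/endpoints/abc", "<token>", "/path/to/simba/driver.so")` would return a regular DBAPI connection, so `cursor().execute("SELECT 1")` should work from there.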

crflynn commented 3 years ago

That would be possible, but I imagine it would be against the Simba/Databricks license agreement. We could still wrap pyodbc without the driver, however.

slayerjain commented 3 years ago

Yep, I agree!

crflynn commented 3 years ago

I made some changes in master that should enable support for SQL Analytics clusters. Would you mind testing them out on your end before I cut a new release?

slayerjain commented 3 years ago

@crflynn yes, this is awesome! we'll test it and get back to you :)

saurabnigam commented 3 years ago

@crflynn This is working! :)

crflynn commented 3 years ago

Nice. I'll make a release later in the day. Thanks for testing.

crflynn commented 3 years ago

0.5.0 should be available. Closing this but let me know if you find any other issues.