JohnOmernik / sqlalchemy-drill

Apache Drill Dialect for SQL Alchemy
Other
53 stars 35 forks source link

Add JDBC/DBAPI Driver as option #21

Closed cgivre closed 4 years ago

cgivre commented 5 years ago

Currently the dialect uses Drill's REST interface, however there are limitations to the REST interface that could potentially limit its usefulness for large datasets.
An easy answer for this would be to offer an option to use a DBAPI compatible JDBC driver instead of the REST interface. There is a DBAPI complaint module JayDeBeAPI which is DB API compliant and I don't think it would be that difficult to get it to work with the dialect.

JohnOmernik commented 5 years ago

I am torn on this, and I've had some great discussions with the Drill team. Using JDBC, means, for example, if we were to create a Docker container for Superset, we'd now be requiring Java, correct? Rest API was much more Pythonic. That said, the limitations of the Rest API are not just with large data sets, there are other issues because the Web service in Drill shares Heap with everything else, meaning Memory pressures can be tricky. An answer of use ODBC or JDBC is always there, but I have been dragging my feet to keep that from just being the easy answer. What do you think the overall bloat add to adding a good JDBC driver would be?

cgivre commented 5 years ago

Hey John, My thought is just to make JDBC an option rather than a requirement. I am a big fan of the REST interface, but the lack of statefulness was causing a lot of challenges with Superset.

I've never really liked ODBC at all as I found it to be a major headache to get to work properly. I've had better luck with JDBC. But getting back to the issue at hand, I think you would have to have Java installed on the box where the dialect was running, and you'd have to have the Drill JDBC driver as well. Obviously, you'd need the JayDeBeAPI module as well.

I guess my point is that if it's an option, you could still build a Superset Docker container w/o Java or additional bloat.

cgivre commented 5 years ago

It also might be worth exploring how to get Drill's REST interface to preserve state... ;-)

JohnOmernik commented 5 years ago

So, let's talk about statefulness. I had assumed I was using a stateful rest implementation, but when I wrote this, it was pretty early in my understanding of things. In my jupyter_drill module (Link here: https://github.com/JohnOmernik/jupyter_drill ) I maintain a session in the Jupyter Notebook. There are some requirements for statefulness in Rest, you have to auth enabled, but that seems like a reasonable requirement. The big key is when using Requests, I authenticate to the Rest API form, and then I get a session ID cookie. (the python requests module handles this for you) Then my session state is maintained, I can do ALTER SESSION queries, I can also do use dfs.prod and that is maintained... Would this be sufficient?

YuanZhencai commented 5 years ago

I also want odbc, because there are limitations to the REST interface

idzikovsky commented 4 years ago

I also want odbc, because there are limitations to the REST interface

I've added an option to use ODBC backend in the same way as JDBC is implemented now in PR https://github.com/JohnOmernik/sqlalchemy-drill/pull/40.