apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0
378 stars 94 forks source link

docs: demonstrate more complex SQLAlchemy integration #2079

Open MMCMA opened 2 months ago

MMCMA commented 2 months ago

I am curious if there is a way to use SQLAlchemy along the ADBC driver for our postgres database. Often in SQLAlchemy I can switch between drivers by simply altering the create_engine() call. We have a lot of SQL logic written with SQLAlchemy queries (which allow us to switch between different SQL flavors). I tried to make it work with the connection object but it does not really work. I get errors like the following.

sqlalchemy.exc.NoInspectionAvailable: No inspection system is available for object of type <class 'adbc_driver_manager.dbapi.Connection'>

an example code could look like this

from sqlalchemy import MetaData, Table, select, Column, Integer
from adbc_driver_postgresql import dbapi
import pandas as pd

with dbapi.connect('some_valid_DB_URL') as con

    metadata = MetaData()
    tmp_id_table = Table('tmp_table', metadata,
                         Column('id_', Integer, primary_key=True),
                         prefixes=['TEMPORARY'])
    metadata.create_all(con)

    tmp_ids = pd.DataFrame(dict(id_=[1,2,3,4]))
    tmp_ids.to_sql('tmp_table', con=con, if_exists='append',
                   index=False)
lidavidm commented 2 months ago

Hmm, I'd have to look at what SQLAlchemy is expecting the DBAPI layer to provide here. I wouldn't be opposed to adding support so long as we can avoid a hard dependency on SQLAlchemy.

lidavidm commented 2 months ago

Oh, and adding more examples for SQLAlchemy + ADBC integration would be good, too. (Plus a toplevel section to make it clearer.)

MMCMA commented 2 months ago

I think the most efficient support you can get from SQLAlchemy team, I will raise it there as well and link this question to it. In the meantime here is some high level info docu, the docs have multiple examples on all different types of settings. Here is the link to the SQLAlchemy discussion

lidavidm commented 2 months ago

So on a brief look, the connection needs to be wrapped in SQLAlchemy's Connection or Engine to work. The existing SQLAlchemy example should wrap it in a SQLAlchemy Connection by virtue of using its connection pool. Does that work?

lidavidm commented 2 months ago

No, it seems like a bit more is needed.

Traceback (most recent call last):
  File "/home/lidavidm/Code/arrow-adbc/docs/source/python/recipe/postgresql_sqlalchemy.py", line 43, in <module>
    metadata.create_all(conn)
  File "/home/lidavidm/miniforge3/envs/adbc-dev/lib/python3.11/site-packages/sqlalchemy/sql/schema.py", line 5866, in create_all
    bind._run_ddl_visitor(
    ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lidavidm/miniforge3/envs/adbc-dev/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 1488, in __getattr__
    return getattr(self.dbapi_connection, key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Connection' object has no attribute '_run_ddl_visitor'
lidavidm commented 2 months ago

Looking more at it, I think the only way to do this is to actually write a SQLAlchemy Dialect as suggested by the developers on that discussion, which isn't trivial.

lidavidm commented 2 months ago

If there's will to contribute a dialect I think we could accept that here, but I don't think there's an easy way to do it otherwise (short of abusing something like on_connect but I'd hesitate to actually recommend that as a solution)