crate / sqlalchemy-cratedb

SQLAlchemy dialect for CrateDB.
https://cratedb.com/docs/sqlalchemy-cratedb/
Apache License 2.0
3 stars 2 forks source link

Support SQLAlchemy dialect with psycopg2 #100

Closed amotl closed 2 years ago

amotl commented 3 years ago

Hi there,

coming from https://github.com/smartsdk/ngsi-timeseries-api/issues/407#issuecomment-738644201, I am asking myself whether it would be possible to use the CrateDB SQLAlchemy dialect together with the Python DB API 2.0 compatible psycopg2 driver.

Maybe @mfussenegger, @chaudum or @autophagy can add a comment about this?

With kind regards, Andreas.

mfussenegger commented 3 years ago

I think the sqlalchemy dialect doesn't have too many concrete dependencies on crate-python. Could try monkey-patching the driver name.

One exception from the client is used: https://github.com/crate/crate-python/blob/f2617dd789f0eb70b042ac77be4b7c8bd936a7c6/src/crate/client/sqlalchemy/dialect.py#L34

but I don't think that's a real issue.

We could also consider to de-couple it completely, but not sure what's the benefit of using psycopg2 over the crate-python http based client?

We also have to keep in mind that psycopg2 is LGPL licensed, which is afaik not compatible with Apache licensed projects. So we wouldn't be able to ship anything with psycopg2 included.

amotl commented 3 years ago

Hi Mathias,

thanks for looking into this. I was just trying to figure out if there are any other intrinsic couplings which would prevent this in general.

I would not de-couple it but instead provide an option to invoke it like outlined at [1].

# default
engine = create_engine('crate://localhost:4200')

# psycopg2
engine = create_engine('crate+psycopg2://localhost:5432')

Not sure what's the benefit of using psycopg2 over the crate-python HTTP-based client?

Wouldn't the communication be a bit more efficient regarding binary serialization on the wire and without the overhead of HTTP? Also, things like how to properly configure a shared connection pool (https://github.com/crate/sqlalchemy-cratedb/issues/90) would not need any special attention.

We also have to keep in mind that psycopg2 is LGPL licensed.

Thanks for bringing that up. SQLAlchemy is MIT licensed and references psycopg2 within an optional "extra" dependency [2,3]. So I believe it would be fine doing it in the same way?

With kind regards, Andreas.

[1] https://docs.sqlalchemy.org/en/13/core/engines.html#postgresql [2] https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_4_0b1/setup.cfg#L60 [3] https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_4_0b1/setup.cfg#L65

mfussenegger commented 3 years ago

Wouldn't the communication be a bit more efficient regarding binary serialization on the wire and without the overhead of HTTP?

The PostgreSQL wire communication supports text and binary serialization. I think psycopg2 is using the text serialization. I didn't verify that, but I think it is one of the reasons that asyncpg is a lot faster - because they use binary serialization. And before they merged https://github.com/MagicStack/asyncpg/pull/295 our crate-python executemany was significantly faster for inserts (orders of magnitude) than asyncpg. I haven't tested again since they merged the batch improvement.

SQLAlchemy is MIT licensed and references psycopg2 within an optional "extra" dependency [2,3]. So I believe it would be fine doing it in the same way?

Yes it works as optional dependency. I think the license only prevents us from providing a bundle that would include psycopg2 code out of the box.

amotl commented 3 years ago

I just wanted to add that both @proddata and @SStorm signaled interest in having this feature available.

mfussenegger commented 3 years ago

I just wanted to add that both @proddata and @SStorm signaled interest in having this feature available.

Could you elaborate on the background? What's the advantage of using psycopg2?

mfussenegger commented 3 years ago

SQLAlchemy 1.4 also supports asyncpg. I think that may be more worthwhile to support than psycopg2 as it has performance advantages.

amotl commented 2 years ago

Hi again,

I fully agree that the focus should be on asyncpg instead. I will add a corresponding test case to crate/crate-python#391 and close this issue. Feel free to reopen or create another one as soon as psycopg3 is ready and qualifies further evaluation.

With kind regards, Andreas.