Closed amotl closed 2 years ago
I think the sqlalchemy dialect doesn't have too many concrete dependencies on crate-python. Could try monkey-patching the driver name.
One exception from the client is used: https://github.com/crate/crate-python/blob/f2617dd789f0eb70b042ac77be4b7c8bd936a7c6/src/crate/client/sqlalchemy/dialect.py#L34
but I don't think that's a real issue.
We could also consider to de-couple it completely, but not sure what's the benefit of using psycopg2 over the crate-python http based client?
We also have to keep in mind that psycopg2 is LGPL licensed, which is afaik not compatible with Apache licensed projects. So we wouldn't be able to ship anything with psycopg2 included.
Hi Mathias,
thanks for looking into this. I was just trying to figure out if there are any other intrinsic couplings which would prevent this in general.
I would not de-couple it but instead provide an option to invoke it like outlined at [1].
# default
engine = create_engine('crate://localhost:4200')
# psycopg2
engine = create_engine('crate+psycopg2://localhost:5432')
Not sure what's the benefit of using
psycopg2
over thecrate-python
HTTP-based client?
Wouldn't the communication be a bit more efficient regarding binary serialization on the wire and without the overhead of HTTP? Also, things like how to properly configure a shared connection pool (https://github.com/crate/sqlalchemy-cratedb/issues/90) would not need any special attention.
We also have to keep in mind that
psycopg2
is LGPL licensed.
Thanks for bringing that up. SQLAlchemy is MIT licensed and references psycopg2
within an optional "extra" dependency [2,3]. So I believe it would be fine doing it in the same way?
With kind regards, Andreas.
[1] https://docs.sqlalchemy.org/en/13/core/engines.html#postgresql [2] https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_4_0b1/setup.cfg#L60 [3] https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_4_0b1/setup.cfg#L65
Wouldn't the communication be a bit more efficient regarding binary serialization on the wire and without the overhead of HTTP?
The PostgreSQL wire communication supports text and binary serialization. I think psycopg2
is using the text serialization. I didn't verify that, but I think it is one of the reasons that asyncpg is a lot faster - because they use binary serialization. And before they merged https://github.com/MagicStack/asyncpg/pull/295 our crate-python executemany was significantly faster for inserts (orders of magnitude) than asyncpg. I haven't tested again since they merged the batch improvement.
SQLAlchemy is MIT licensed and references psycopg2 within an optional "extra" dependency [2,3]. So I believe it would be fine doing it in the same way?
Yes it works as optional dependency. I think the license only prevents us from providing a bundle that would include psycopg2 code out of the box.
I just wanted to add that both @proddata and @SStorm signaled interest in having this feature available.
I just wanted to add that both @proddata and @SStorm signaled interest in having this feature available.
Could you elaborate on the background? What's the advantage of using psycopg2?
SQLAlchemy 1.4 also supports asyncpg. I think that may be more worthwhile to support than psycopg2 as it has performance advantages.
Hi again,
I fully agree that the focus should be on asyncpg instead. I will add a corresponding test case to crate/crate-python#391 and close this issue. Feel free to reopen or create another one as soon as psycopg3 is ready and qualifies further evaluation.
With kind regards, Andreas.
Hi there,
coming from https://github.com/smartsdk/ngsi-timeseries-api/issues/407#issuecomment-738644201, I am asking myself whether it would be possible to use the CrateDB SQLAlchemy dialect together with the Python DB API 2.0 compatible
psycopg2
driver.Maybe @mfussenegger, @chaudum or @autophagy can add a comment about this?
With kind regards, Andreas.