dvarrazzo / www.varrazzo.com

My personal website
https://www.varrazzo.com

blog/2020/03/06/thinking-psycopg3/ #3

Open utterances-bot opened 4 years ago

utterances-bot commented 4 years ago

Thinking psycopg3 — Daniele Varrazzo

https://www.varrazzo.com/blog/2020/03/06/thinking-psycopg3/

tiian commented 4 years ago

Do you think it will be possible to preserve the get_native_connection() low-level hook, to guarantee compatibility with XTA as in this example?

Asynchronous operations are part of the XA specification, but I don't know of a single resource manager that has implemented them, and LIXA has not been designed to support them: it would be such a hard path that I can't even picture it.

Last but not least: good hacking!

FlipperPA commented 4 years ago

I'm guessing this would also put an end to mogrify() - which is fine, but it's something I've found mighty handy for debugging when deep inside Django's ORM.

dvarrazzo commented 4 years ago

@tiian I don't see a problem providing a capsule. However, I guess the name shouldn't be the same, so something would need to change, e.g. around src/xta/python/xta.i:

%typemap(in) PGconn * {
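  /* Unwrap the capsule exported by psycopg2 to get the raw PGconn*;
     a psycopg3 capsule would presumably need a different name here. */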
  $1 = (PGconn *) PyCapsule_GetPointer($input, "psycopg2.connection.native_connection");
}

Wouldn't it be better to just pass pgconn_ptr? It would work for both psycopg2 >= 2.8 and psycopg3.
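
Something along these lines (untested sketch: conn.pgconn_ptr exists since psycopg2 2.8, while the XTA hand-off below is hypothetical):

import psycopg2

conn = psycopg2.connect("dbname=test")  # assumed DSN

# pgconn_ptr exposes the libpq PGconn* as a plain integer (None if the
# connection is closed), so a C extension can reuse the same connection.
raw_pgconn = conn.pgconn_ptr

# Hypothetical hand-off to the XTA wrapper, instead of a named capsule:
# xta_resource.attach_native_connection(raw_pgconn)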

However, I'm pretty sure we can find a way to pass the raw connection from psycopg3 to LIXA: let's keep in touch about it.

dvarrazzo commented 4 years ago

@FlipperPA a mogrify()-type method could still be easy to provide, for instance using the same plumbing as e.g. sql.Literal.
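
Untested sketch of what I mean, built on the current sql module (mogrify_like and the {} placeholder style are just for illustration, not a proposed API):

from psycopg2 import sql

def mogrify_like(conn, query, params):
    # Render the query with the values inlined: sql.Literal quotes each
    # value using the connection's adaptation rules, as_string() renders it.
    # The query uses {} placeholders (sql.SQL.format style), not %s.
    composed = sql.SQL(query).format(*(sql.Literal(p) for p in params))
    return composed.as_string(conn)

# e.g. mogrify_like(conn, "select {}, {}", ["O'Hara", 42])
# -> "select 'O''Hara', 42"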

larskanis commented 4 years ago

@dvarrazzo Ruby's PostgreSQL client library "pg" has good COPY support. We also have very extensible and fast type cast support in both directions (send and receive). COPY can also be combined with type casting and with async behavior. So maybe this could be a source of additional ideas: https://deveiate.org/code/pg/README_rdoc.html#label-Type+Casts

Mekk commented 4 years ago

There are various ways to signal types at the API level instead of the SQL one. You can use fully binary bindings (convert those dates and numbers to their actual binary representation and bind them as such) or leave them as text but signal the format. There are those funny arrays of enums to describe what is provided (https://www.postgresql.org/docs/11/libpq-exec.html - paramTypes, paramFormats).

It is a bit troublesome that the appropriate constants are not present in client-side libraries and must be copied (and in some more obscure cases, like extension types, even dynamically discovered from pg_type). I ended up copying many constants (and some structs) from /usr/include/postgresql/N.M/catalog/pg_type.h. So DATEOID=1082, and with this marker one can bind (and also fetch from results, with the proper flags) dates as int32 after converting them appropriately…
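
For example, a rough Python sketch of the date case (the OID and the 2000-01-01 epoch come from the Postgres headers and docs; the helper name is made up):

import struct
from datetime import date

DATEOID = 1082               # copied from catalog/pg_type.h, as described above
PG_EPOCH = date(2000, 1, 1)  # binary dates count days from the Postgres epoch

def date_to_pg_binary(d):
    # The big-endian int32 libpq expects when the matching paramTypes entry
    # is DATEOID and the paramFormats entry is 1 (binary).
    return struct.pack("!i", (d - PG_EPOCH).days)

assert date_to_pg_binary(date(2000, 1, 2)) == b"\x00\x00\x00\x01"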

dvarrazzo commented 4 years ago

@larskanis thank you for the pointer, I'll take a look and see if there's something to learn there :)

I have a rough idea about how to organise the conversion layer: I'll try to write a design guideline in a follow-up article in the next few days.

alanfranz commented 4 years ago

One idea, and one plea, at the same time: please don't create a new library. Don't make us choose between psycopg2 and psycopg3, unless there's a very clear forward migration path (i.e. if I just sed psycopg2 to psycopg3 in my project, everything keeps working, but psycopg3 has more features).

I mean: that's the same mistake that was made when going from Python 2 to Python 3. It will result in a "long-running fork".

Is there any special reason why psycopg2 can't just be evolved? Add new classes and packages, then slowly (if ever) deprecate the old ones. Pythoneers will love you <3

mangecoeur commented 4 years ago

I would love to see better support for fast creation of Arrow/DataFrame/numpy arrays from query results (possibly through the COPY support, maybe through some kind of plugin or compile-time optional support). See for example Turbodbc's Arrow support.

At the moment we are forced to have query data converted from the binary stream into (boxed) Python objects, then unboxed back into arrays - this can add a lot of overhead. I did some very rough experiments in Cython and got a 3x speedup for bulk loads of query results into arrays.
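
A rough sketch of the status quo (DSN and query made up):

import numpy as np
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
cur.execute("select random() from generate_series(1, 100000)")

# The double conversion described above: the wire data is first boxed into
# Python float objects by fetchall(), then unboxed again into the array.
rows = cur.fetchall()
arr = np.fromiter((r[0] for r in rows), dtype=np.float64, count=len(rows))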

dvarrazzo commented 4 years ago

@alanfranz the idea is about changing the defaults, so that the more modern behaviours are used by default. Doing it only internally - leaving the new objects behind some weird-to-use interface and exposing the inferior old version by default - would be a disservice. Eventually the newer objects should become the default, and in that case we'd have to bump the major version as per semver. If we kept the psycopg2 name, apart from the weirdness of having psycopg2 version 3.0, it would end up biting the people who haven't pinned their version in their requirements file and find behaviours changed (well, they'd deserve it, but they wouldn't miss the chance to bother us asking why we changed psycopg2 instead of making psycopg3).

I see having an entirely different package as a feature: you can use both of them in a program and migrate gradually, without getting stuck in the past in case parts of the program are not practical to migrate.
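
Something like this (sketch, assuming the psycopg3 package name):

# With distinct package names both drivers can live in the same process,
# so modules can be ported one at a time.
import psycopg2
import psycopg3  # hypothetical package name from this discussion

legacy_conn = psycopg2.connect("dbname=test")  # old code keeps working
new_conn = psycopg3.connect("dbname=test")     # migrated code uses the new API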

In the past I imagined writing all the new objects in a psycopg2 release and having psycopg3 be a thin package importing psycopg2 internally and exposing a different facade, with all the modern objects as defaults. But I think it would bloat the project enormously: everything would need two hard-to-reconcile code paths, and it would be a problem for global objects such as the default adapters map. I think this is a good occasion to shed a lot of weight, not to gain twice as much.

alanfranz commented 4 years ago

I see having an entirely different package as a feature: you can use both of them in a program and migrate gradually, without getting stuck in the past in case parts of the program are not practical to migrate.

That's only true if psycopg2 stays supported for quite a long time; otherwise, clients are forced to migrate. The "change name" approach is actually similar to what is suggested for Go library versioning.

exposing the inferior old version by default

You can have a "psycopg2.wonderful" package that gets recommended in the docs and everywhere else, and under that package you create (progressively) the new cool things you want to add. All the other parts remain untouched (or maybe just bugfixed) and, possibly, slowly deprecated. This way you can start adding things right away, piece by piece, instead of starting a rewrite, which can be tedious. The rewrite looks like a high-risk approach: second-system syndrome lurks. Until psycopg3 is production-ready, very few people will use it, and they will stick with the older version (which you may not like maintaining). It reminds me of Python 2 -> Python 3, and other major migrations.

Of course, everything is IMHO; you're the maintainer and the boss here :-) your project, your choices.

cthart commented 4 years ago

Why not write a pure Python driver (like the JDBC thin driver)? Or would that no longer be psycopg?

dvarrazzo commented 4 years ago

@cthart because I think the libpq is the best way to parse and keep the state of the Postgres protocol: it is written and maintained by the database authors themselves, so it's hard to beat on performance, and I doubt you could do a better job on reliability.

cthart commented 4 years ago

And you can keep up with new features with minimal effort.

nicku33 commented 4 years ago

Recently on our Node stack we switched from the V8-compiled Postgres implementation to the libpq-based "pg.client.native" and saw a drop in the number of connections in the ClientRead state. I'm a little wary of not using a libpq-based library now.