MagicStack / asyncpg

A fast PostgreSQL Database Client Library for Python/asyncio.
Apache License 2.0

Late/optional decoding? #1112

Open ale-dd opened 8 months ago

ale-dd commented 8 months ago

I have a few interrelated questions. Some might even be naive, given my lack of in-depth knowledge of the wire protocol.

  1. Is there a way to delay decoding results until actually looking at data and interpreting it would become necessary?
  2. Similarly, would it be possible to take undecoded "records" (not in the strict field-to-value map sense of asyncpg records) and pass them as-is to an upsert without having to re-encode them, so long as the bound arguments correspond to those returned by the previous query?
  3. Would that potentially avoid the need to issue type introspection queries?
  4. If not as of today, would this become easier once @vmarkovtsev's asyncpg-rkt fork gets merged back upstream?
  5. Otherwise, at the very least, would it be possible to decode timestamps to something closer to their internal Postgres representation - for example, mapping them to Python's Decimal type? I personally find Python's datetime terrible, and Postgres timestamps can fall outside the range that datetime can represent. Should such a Decimal codec ship with asyncpg, or should everyone write their own if and only if they think they need it?

Thank you & happy new year!

elprans commented 8 months ago

Is there a way to delay decoding results until actually looking at data and interpreting it would become necessary?

Lazy decoding is something that can theoretically be implemented, and we even considered it at some point, but ultimately this doesn't seem to be very useful given that query output is usually consumed in its entirety anyway. Do you have a concrete use case in mind?

pass them as-is to an upsert without having to re-encode them, so long as the bound arguments correspond to those returned by the previous query?

It seems like you would be better served by using a temporary table or a CTE instead.
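The staging-table pattern suggested here could look roughly like the following sketch: stage the incoming rows in a temporary table via the COPY protocol, then merge them with INSERT ... ON CONFLICT. The table name `target`, its `id` conflict key, and the `value` column are hypothetical placeholders, not anything from this thread.

```python
# Hedged sketch of a temp-table upsert with asyncpg. Assumes a
# hypothetical "target" table with an "id" key and a "value" column.
async def upsert_rows(con, rows):
    async with con.transaction():
        # Temp table with the target's shape, dropped at commit.
        await con.execute(
            'CREATE TEMPORARY TABLE _staging '
            '(LIKE target INCLUDING DEFAULTS) ON COMMIT DROP'
        )
        # copy_records_to_table streams the rows using COPY,
        # which is much faster than per-row INSERTs.
        await con.copy_records_to_table('_staging', records=rows)
        # Merge staged rows into the real table.
        await con.execute(
            'INSERT INTO target SELECT * FROM _staging '
            'ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value'
        )
```

This keeps the upsert entirely inside one transaction on a single server, which is exactly why it does not help the cross-database case raised below.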

Would that potentially avoid the need to issue type introspection queries?

You need introspection queries to be able to decode data (lazily or not, does not matter).

If not as of today, would this become easier once @vmarkovtsev's asyncpg-rkt fork gets merged back upstream?

I'm not very familiar with what asyncpg-rkt is doing. Possibly?

Otherwise, at the very least, would it be possible to decode timestamps to something closer to their internal postgres representation - for example mapping them to the Python's decimal type?

Absolutely, use set_type_codec and the tuple format, e.g.:

    def decoder(ts_tuple):
        microseconds_since_pg_epoch = ts_tuple[0]
        ...

    def encoder(microseconds_since_pg_epoch):
        return (microseconds_since_pg_epoch,)

    await con.set_type_codec(
        'timestamptz',
        schema='pg_catalog',
        decoder=decoder,
        encoder=encoder,
        format='tuple',
    )
ale-dd commented 8 months ago

It seems like you would be better served by using a temporary table or a CTE instead.

That would only work on the very same database server, and unfortunately that is not my use case: I'm shoveling data from a source DB to a destination DB, often without foreign data wrapper support.

I'm not very familiar with what asyncpg-rkt is doing. Possibly?

It's described here:

https://betterprogramming.pub/i-forked-asyncpg-and-it-parses-database-records-to-numpy-20x-faster-e71024a84bff

The author expressed his interest in merging it back into upstream asyncpg here:

https://github.com/MagicStack/asyncpg/issues/17#issuecomment-1145106442

ale-dd commented 7 months ago

@elprans would there be any appetite for that fork to be merged back?