Closed zzzeek closed 1 year ago
The issue is the Postgres JIT, which gets erroneously invoked on the query. Turning JIT off makes it run in under a millisecond. I have a patch that explicitly disables JIT for introspection queries. Will push soon.
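Until such a patch is available, a possible workaround is to turn JIT off for the whole session when connecting. The sketch below is only an illustration, not the patch described above: it assumes asyncpg's `server_settings` connect argument and the Postgres `jit` setting, and the helper names `connect_kwargs` and `connect_without_jit` are made up for this example. Note that it disables JIT for every query on the connection, not just the introspection ones.

```python
def connect_kwargs(dsn, extra_settings=None):
    """Build keyword arguments for asyncpg.connect() with JIT disabled.

    `extra_settings` is an optional dict of other session settings;
    "jit" is applied last so it always ends up "off".
    """
    settings = dict(extra_settings or {})
    settings["jit"] = "off"
    return {"dsn": dsn, "server_settings": settings}


async def connect_without_jit(dsn):
    # Imported lazily so that connect_kwargs() can be used (and tested)
    # without asyncpg installed.
    import asyncpg

    return await asyncpg.connect(**connect_kwargs(dsn))
```

Usage would be something like `conn = await connect_without_jit("postgresql://localhost/test")`, where the DSN is a placeholder.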
oh i see, it would still run the query but the PG server side of it will just push it through without trying to optimize it, IIUC
Yes. The fact that Postgres is even trying to JIT-compile this query is arguably a Postgres cost estimation bug, because the cost of compilation here far outweighs any runtime benefits. The query is plenty fast as-is and there isn't any heavy expression computation that might benefit from compiling it to machine code.
Also, to better answer your original question, the introspection query is necessary because Postgres does not send any type information in the course of normal query flow other than type OIDs. In order to actually be able to decode data bytes, asyncpg must know what it is decoding, so the introspection query is run whenever there is a previously unseen type OID (neither a standard type nor a previously introspected one).
You can remove the need for introspection by pre-populating the type OID mapping via set_type_codec() or set_builtin_type_codec(); however, that requires you to know ahead of time which types are of what kind.
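To make the caching behaviour concrete, here is a toy model of the idea, not asyncpg's actual implementation: known OIDs are resolved from a mapping, and only an unseen OID triggers the (expensive) introspection callback. The class name `TypeCache`, its methods, and the example OIDs 16500 and 16501 are hypothetical; the built-in OIDs are the standard `pg_type` values for `int4`, `text` and `cidr`.

```python
# A few built-in type OIDs as found in pg_catalog.pg_type.
BUILTIN_OIDS = {23: "int4", 25: "text", 650: "cidr"}


class TypeCache:
    """Toy per-connection cache mapping type OIDs to type names."""

    def __init__(self, introspect):
        self._known = dict(BUILTIN_OIDS)
        self._introspect = introspect  # stands in for the introspection query
        self.introspection_calls = 0

    def register(self, oid, name):
        # Pre-populating the mapping means resolve() never has to introspect
        # this OID (roughly what registering a codec up front achieves).
        self._known[oid] = name

    def resolve(self, oid):
        if oid not in self._known:
            # Previously unseen OID: pay the introspection cost once.
            self.introspection_calls += 1
            self._known[oid] = self._introspect(oid)
        return self._known[oid]


cache = TypeCache(introspect=lambda oid: f"custom_{oid}")
cache.resolve(23)            # built-in, no introspection
cache.resolve(16500)         # unseen, introspected once
cache.resolve(16500)         # now cached
cache.register(16501, "mood")
cache.resolve(16501)         # pre-registered, no introspection
```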
by "decode" I assume you mean on the result fetching side, does that mean this same introspection occurs when using conn.fetch() without a prepared statement? Other drivers seem to be willing to let unknown OIDs be passed through as bytes and/or bytes that they somehow guess can be treated as strings, but I haven't confirmed they aren't sneaking in OID introspection in there somehow
that is, we have some such codecs set up in our impl for this, like this:
    await asyncpg_connection.set_type_codec(
        "cidr",
        encoder=lambda s: s,
        decoder=lambda s: s,
        schema="pg_catalog",
        format="text",
    )
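If several types need the same pass-through treatment, the call above could be wrapped in a small helper. This is just a sketch: `register_text_passthrough` is a made-up name, and it only relies on the connection object having the `set_type_codec()` coroutine used above.

```python
def _identity(value):
    # Hand the text representation through unchanged in both directions.
    return value


async def register_text_passthrough(conn, type_names, schema="pg_catalog"):
    """Register a text-format pass-through codec for each named type."""
    for name in type_names:
        await conn.set_type_codec(
            name,
            encoder=_identity,
            decoder=_identity,
            schema=schema,
            format="text",
        )
```

It would be called once per new connection, e.g. `await register_text_passthrough(asyncpg_connection, ["cidr"])`.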
I guess other drivers do essentially that for "unknown" OIDs. If we had that ability here, I don't think it would cause problems for us, since we don't expect custom types besides ENUM to have any particular behaviors.
doesn't matter much, if the query's time is reduced then we won't get more complaints :)
hey there -
as you know, SQLAlchemy relies upon knowing what the names of columns will be in result sets, and for asyncpg that forces us to use prepared statements for statements that we want to fetch rows from.
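For context, the part we rely on looks roughly like the following. It is a simplified sketch rather than SQLAlchemy's actual code: `column_names` is a hypothetical helper, and it assumes only that a prepared statement's `get_attributes()` returns objects with a `name` attribute.

```python
def column_names(stmt):
    """Return result-set column names from a prepared statement.

    Only the `name` of each attribute is used; the `type` part is ignored.
    """
    return [attr.name for attr in stmt.get_attributes()]
```

With a real connection this would be used as `column_names(await conn.prepare(sql))`.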
When the prepared statement has a custom type like an ENUM inside of it, asyncpg does a giant query up front to cache information about the datatype, here it is from my test case below inside my PG SQL log:
This is, I assume, once per connection/type, but we are still getting performance concerns about it, as we see in #10356. SQLAlchemy uses a connection pool by default; however, this is still an upfront cost, and some folks don't use the pool.
I would assume the purpose of this query has to do with get_attributes() having all the information about the OID for the type. We don't actually need the "type" part of get_attributes(), just the names (we already know the types on our end). Is there a possibility asyncpg could have some kind of connection parameter, or prepared statement parameter, that is something to the effect of use_varchar_for_custom_type or omit_custom_type or something, such that this giant query on prepare can be skipped?
Demo that produces the query in question: