This PR reduces the number of round-trips and reconnects that happen in the Postgres attach and subsequent queries through the following mechanisms:
Turn the connection pool into an actual connection pool that caches Postgres connections, avoiding the need to reconnect for every transaction
Avoids running BEGIN TRANSACTION/COMMIT as separate statements unnecessarily when reading small tables
Fetch the schema/tables/types/indexes in one single set of queries send through PQsendQuery instead of running multiple separate queries that incur a round-trip every time
Fetch enums/composite types and their info in a single query instead of fetching them in two batches (previously we had two queries: (1) figured out which enums existed, (2) figured out the enum values).
.timer on
ATTACH 'dbname = pfmegrnargs host = hh-pgsql-public.ebi.ac.uk port = 5432 user = reader password = NWDMCE5xdipIjRrp' AS s (TYPE POSTGRES);
-- first query also loads full schema
-- note that the new version is much faster AND loads more, since ALL schemas/tables are loaded at once
SELECT * FROM s.rnacen.ensembl_assembly;
-- old: 0.872s
-- new: 0.406s
-- psql: 0.10s
-- second time is faster since schema is cached, in new scenario use cached connection
SELECT * FROM s.rnacen.ensembl_assembly;
-- old: 0.27s
-- new: 0.08s
-- psql: 0.10s
Note that psql beats us at the first query since they merely fire a query rather than loading the catalog/table/etc information.
This PR reduces the number of round-trips and reconnects that happen in the Postgres attach and subsequent queries through the following mechanisms:
PQsendQuery
instead of running multiple separate queries that incur a round-trip every timePerformance
Using the RNA central database we get the following performance numbers.
Note that
psql
beats us at the first query since they merely fire a query rather than loading the catalog/table/etc information.