duckdb / postgres_scanner


Keep connections cached, and reduce round-trips #142

Closed Mytherin closed 7 months ago

Mytherin commented 7 months ago

This PR reduces the number of round-trips and reconnects that happen during the Postgres attach and in subsequent queries by keeping connections cached.
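To illustrate the general idea of keeping connections cached, here is a minimal Python sketch. It is not the extension's actual implementation (which is written in C++); `ConnectionPool`, `FakeConnection`, and `max_cached` are hypothetical names made up for this example, and no real Postgres connection is opened.

```python
from typing import Callable, List


class ConnectionPool:
    """Hypothetical cache of open connections, reused across queries."""

    def __init__(self, connect: Callable[[], object], max_cached: int = 4):
        self._connect = connect          # opens a new connection (the expensive step)
        self._idle: List[object] = []    # connections waiting to be reused
        self._max_cached = max_cached    # assumed cap on cached connections
        self.opened = 0                  # how many real connects happened

    def acquire(self) -> object:
        # Reuse an idle connection if one exists; otherwise pay for a connect.
        if self._idle:
            return self._idle.pop()
        self.opened += 1
        return self._connect()

    def release(self, conn: object) -> None:
        # Keep the connection for later instead of closing it.
        # Beyond the cap it is simply dropped in this sketch.
        if len(self._idle) < self._max_cached:
            self._idle.append(conn)


class FakeConnection:
    """Stand-in for a real Postgres connection; no network involved."""


pool = ConnectionPool(FakeConnection)
for _ in range(3):          # three consecutive "queries"
    conn = pool.acquire()
    pool.release(conn)
print(pool.opened)          # prints 1: only the first query had to connect
```

Without the cache, each of the three queries would have opened its own connection, which is the kind of reconnect overhead this PR targets.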

Performance

Using the public RNAcentral Postgres database, we get the following performance numbers.

.timer on
ATTACH 'dbname = pfmegrnargs host = hh-pgsql-public.ebi.ac.uk port = 5432  user = reader password = NWDMCE5xdipIjRrp' AS s (TYPE POSTGRES);
-- first query also loads full schema
-- note that the new version is much faster AND loads more, since ALL schemas/tables are loaded at once
SELECT * FROM s.rnacen.ensembl_assembly;
-- old: 0.872s
-- new: 0.406s
-- psql: 0.10s

-- second run is faster since the schema is cached; the new version also reuses the cached connection
SELECT * FROM s.rnacen.ensembl_assembly;
-- old: 0.27s
-- new: 0.08s
-- psql: 0.10s

Note that psql beats us on the first query because it merely fires the query, rather than first loading the catalog, table, and other schema information.
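For a quick read of the numbers above, the old/new ratios can be worked out as follows. This is only arithmetic on the timings quoted in this post; the dictionary layout and names are made up for the example.

```python
# Timings in seconds, copied from the measurements above.
timings = {
    "first query": {"old": 0.872, "new": 0.406},
    "second query": {"old": 0.27, "new": 0.08},
}

# Speedup = old time divided by new time.
speedups = {name: t["old"] / t["new"] for name, t in timings.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x faster")
# first query: 2.1x faster
# second query: 3.4x faster
```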