duckdb / pg_duckdb

DuckDB-powered Postgres for high performance apps & analytics.
MIT License
1.54k stars 54 forks source link

Simple scan of native table is 2x slower with pg_duckdb enabled #171

Closed lhofhansl closed 1 month ago

lhofhansl commented 1 month ago

Not sure what I was expecting. It seems when scanning a Postgres native table, all time is spent in reading from it, and - at least in this simple case - dwarfs any vectorization in DuckDB.

Feel free to close if this is expected.

I created a simple table:

=> \d x
Table "x"
Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
y      | numeric |           |          |
z      | numeric |           |          |

=> select count(*) from x;
count
----------
16777216
(1 row)

=> analyze x;
ANALYZE

=> set max_parallel_workers=0;
SET

Then:

=> select sum(z) from x;
sum
--------------------------------
8385930.2158780710128370101138
(1 row)

Time: 3167.987 ms (00:03.168)

=> set duckdb.execution to true;
SET

=> select sum(z) from x;
sum
-------------------
8385930.215876569
(1 row)

Time: 6474.084 ms (00:06.474)

Some details:

=> explain select sum(z) from x;

QUERY PLAN
------------------------------------------------------------------------------------------------------------
Custom Scan (DuckDBScan)  (cost=0.00..0.00 rows=0 width=0) (actual time=6483.263..6483.289 rows=1 loops=1)
DuckDB Execution Plan:

┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE  SELECT sum(z) AS sum    FROM x
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         Total Time: 6.44s         ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│         QUERY_ROOT        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        Cardinality:       │
│             0             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          Timing:          │
│           0.00s           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        Cardinality:       │
│             0             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          Timing:          │
│           0.00s           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        Aggregates:        │
│          sum(#0)          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        Cardinality:       │
│             1             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          Timing:          │
│           0.04s           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        Projections:       │
│             z             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   Estimated Cardinality:  │
│          16777087         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        Cardinality:       │
│          16777216         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          Timing:          │
│           0.01s           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│     POSTGRES_SEQ_SCAN     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        Projections:       │
│             z             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   Estimated Cardinality:  │
│          16777087         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        Cardinality:       │
│          16777216         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          Timing:          │
│           6.38s           │
└───────────────────────────┘

Planning Time: 10.024 ms
Execution Time: 6484.588 ms
(78 rows)

So all time is spent in the Postgres scan.

Some perf counters:



12.96%  pg_duckdb.so                             [.] pgduckdb::InsertTupleIntoChunk(duckdb::DataChunk&, duckdb::shared_ptr<pgduckdb::PostgresScanGlobalState, true>, duckdb::shared_ptr<pgduckdb::PostgresScanLo...
9.69%  pg_duckdb.so                             [.] pgduckdb::HeapReader::ReadPageTuples(duckdb::DataChunk&)
6.42%  pg_duckdb.so                             [.] duckdb::shared_ptr<duckdb::TemplatedValidityData<unsigned long>, true>::AssertNotNull(bool)
6.03%  pg_duckdb.so                             [.] pgduckdb::ConvertPostgresToDuckValue(unsigned long, duckdb::Vector&, unsigned long)
5.71%  pg_duckdb.so                             [.] pgduckdb::DetoastPostgresDatum(varlena*, bool*)