adjust / parquet_fdw

Parquet foreign data wrapper for PostgreSQL
PostgreSQL License

Statistics not collected on parquet_fdw's foreign table (causing suboptimal planning, leaving a lot of perf on the table) #80

Closed jk-intel closed 1 month ago

jk-intel commented 1 month ago

Hi. Given a foreign table my_parquet_table created with parquet_fdw, running psql> analyze my_parquet_table; does not update pg_stats with column statistics. It only seems to update the total number of rows for the table in the pg_class catalog.

There seem to be two bugs related to this:

  1. The last line in MultifileExecutionState::next() returns res, whereas it should return (res == RS_SUCCESS). The wrong return value causes parquetAcquireSampleRowsFunc() to store no sample rows in its output argument "rows".
  2. The last line in parquetAcquireSampleRowsFunc() returns (cnt - 1), whereas it should return just (cnt). According to the documentation (https://www.postgresql.org/docs/16/fdw-callbacks.html): "The actual number of rows collected must be returned."

With these two locations fixed, pg_stats seems to be updated properly.
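
The two problems can be reproduced in a small standalone sketch. This is not the parquet_fdw source: ToySource, acquire_samples, and the enum definition are hypothetical stand-ins, and the sketch assumes RS_SUCCESS is the zero-valued enumerator, which is what would make returning res from a bool function read as "no row".

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical status codes. Assumption: success is the zero value,
// so converting it to bool yields false.
enum ReadStatus { RS_SUCCESS = 0, RS_INACTIVE = 1, RS_EOF = 2 };

// Toy row source standing in for an execution state that yields rows.
class ToySource {
public:
    explicit ToySource(std::vector<int> rows) : rows_(std::move(rows)) {}

    // Buggy variant: the status is implicitly converted to bool,
    // so RS_SUCCESS (0) is reported as "no row available".
    bool next_buggy(int &out) {
        ReadStatus res = read(out);
        return res;
    }

    // Fixed variant: true exactly when a row was read.
    bool next_fixed(int &out) {
        ReadStatus res = read(out);
        return res == RS_SUCCESS;
    }

private:
    ReadStatus read(int &out) {
        if (pos_ >= rows_.size())
            return RS_EOF;
        out = rows_[pos_++];
        return RS_SUCCESS;
    }

    std::vector<int> rows_;
    std::size_t pos_ = 0;
};

// Collects up to max_rows values into `rows` and returns the reported count.
// `off_by_one` reproduces the second bug (returning cnt - 1 instead of cnt).
int acquire_samples(ToySource &src, bool use_fixed_next, bool off_by_one,
                    std::vector<int> &rows, std::size_t max_rows) {
    assert(rows.empty());  // the caller passes an empty output buffer
    int cnt = 0;
    int value = 0;
    while (rows.size() < max_rows &&
           (use_fixed_next ? src.next_fixed(value) : src.next_buggy(value))) {
        rows.push_back(value);
        ++cnt;
    }
    return off_by_one ? cnt - 1 : cnt;
}
```

With a three-row source, the buggy next collects nothing, because the first successful read is reported as false. The fixed next combined with the (cnt - 1) return stores three rows but reports two. Only with both fixes does the reported count match the rows actually stored.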

za-arthur commented 1 month ago

Thanks for the ticket. It makes sense. I've created a PR and I'm going to merge it: https://github.com/adjust/parquet_fdw/pull/81 Could you check if it helps?

jk-intel commented 1 month ago

> Thanks for the ticket. It makes sense. I've created a PR and I'm going to merge it: #81 Could you check if it helps?

I checked these two specific fixes in my local examples prior to reporting this. I haven't tested the rest of the changes in that PR. Thanks for your prompt response.

--jk

za-arthur commented 1 month ago

Thanks for checking. I've merged the PR. I'm closing this ticket. Feel free to reopen it.