adjust / parquet_fdw

Parquet foreign data wrapper for PostgreSQL
PostgreSQL License

[BUG] Wrong query results: call to parquetReScanForeignScan() causes loss of the first row-group #82

Closed jk-intel closed 1 month ago

jk-intel commented 1 month ago

Hi, when PostgreSQL calls parquetReScanForeignScan(), this eventually calls either DefaultParquetReader::rescan() or CachingParquetReader::rescan().

Both of these functions reset the row-group index with: this->row_group = 0;

The assigned value should be -1, matching how this->row_group is initialized in the constructors of these classes, because the first call to next() increments this->row_group immediately.

Because this->row_group is currently reset to 0, the first call to next() after a rescan increments it to 1, so the entire first row group (index 0) is never read, which leads to wrong query results.
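The off-by-one described above can be reproduced with a small stand-alone model. This is only a sketch: ToyReader, drain() and the rescan_value parameter are hypothetical names invented for illustration and are not parquet_fdw's actual classes. The only assumption taken from the report is that next() increments row_group before loading a row group.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row-group reader (not parquet_fdw code).
struct ToyReader {
    std::vector<std::vector<int>> row_groups; // each inner vector is one row group
    std::vector<int> buffer;                  // rows of the currently loaded row group
    std::size_t pos = 0;                      // read position inside buffer
    int row_group = -1;                       // -1 means "nothing loaded yet"
    int rescan_value;                         // value rescan() assigns to row_group

    ToyReader(std::vector<std::vector<int>> groups, int rescan_value)
        : row_groups(std::move(groups)), rescan_value(rescan_value) {}

    // Returns true and stores the next row in `out`, or false at end of data.
    bool next(int &out) {
        while (pos >= buffer.size()) {
            if (row_group + 1 >= static_cast<int>(row_groups.size()))
                return false;
            ++row_group;                      // incremented BEFORE loading
            buffer = row_groups[row_group];
            pos = 0;
        }
        out = buffer[pos++];
        return true;
    }

    // Restart the scan; the buffer is emptied, so next() must load again.
    void rescan() {
        row_group = rescan_value;
        buffer.clear();
        pos = 0;
    }
};

// Reads all remaining rows from the reader.
std::vector<int> drain(ToyReader &r) {
    std::vector<int> rows;
    int v;
    while (r.next(v))
        rows.push_back(v);
    return rows;
}
```

In this model, a reader built with rescan_value 0 returns every row on the first pass but skips row group 0 after rescan(), while a reader built with rescan_value -1 returns the full data on both passes.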

PostgreSQL calls parquetReScanForeignScan() in nested-loop joins when the parquet_fdw foreign table is on the inner side of the join.
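To see why this surfaces as wrong join results, here is a rough simulation of a nested-loop join over a toy inner scan. It is a sketch under simplifying assumptions: InnerScan, reset_to and nested_loop_matches() are hypothetical names, and the model simply restarts the inner scan for each outer row, whereas the real executor's scheduling of rescans is more involved.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical inner-side scan over row groups (not parquet_fdw code).
struct InnerScan {
    std::vector<std::vector<int>> groups;
    int reset_to;              // value rescan() assigns to row_group (0 or -1)
    int row_group = -1;
    std::size_t pos = 0;
    bool exhausted = true;     // true when a new row group must be loaded

    InnerScan(std::vector<std::vector<int>> g, int reset_to)
        : groups(std::move(g)), reset_to(reset_to) {}

    void rescan() {
        row_group = reset_to;
        pos = 0;
        exhausted = true;
    }

    bool next(int &out) {
        while (exhausted || pos >= groups[row_group].size()) {
            if (row_group + 1 >= static_cast<int>(groups.size()))
                return false;
            ++row_group;       // advance first, then read from the new group
            pos = 0;
            exhausted = false;
        }
        out = groups[row_group][pos++];
        return true;
    }
};

// Counts equality matches, restarting the inner scan for every outer row
// (simplifying assumption about how the join drives the inner side).
std::size_t nested_loop_matches(const std::vector<int> &outer, InnerScan &inner) {
    std::size_t matches = 0;
    for (int o : outer) {
        inner.rescan();
        int v;
        while (inner.next(v))
            if (v == o)
                ++matches;
    }
    return matches;
}
```

With outer rows {1, 3, 5} and inner row groups {1, 2}, {3, 4}, {5}, resetting to -1 finds all three matches, whereas resetting to 0 never reads the first row group and the match for 1 is lost.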

I checked that changing both functions to assign this->row_group = -1; fixes the issue in my examples.

Thanks, jk

jk-intel commented 1 month ago

I suspect this bug may also be the cause of this issue: https://github.com/adjust/parquet_fdw/issues/62

za-arthur commented 1 month ago

Thank you @jk-intel for another report. I've created a PR https://github.com/adjust/parquet_fdw/pull/83. I'll try to come up with additional tests.

za-arthur commented 1 month ago

I merged the PR. I'm closing the issue. Feel free to reopen it if you still see the issue.