adjust / parquet_fdw

Parquet foreign data wrapper for PostgreSQL
PostgreSQL License

Multifile Merge optimization idea #36

Open zilder opened 2 years ago

zilder commented 2 years ago

Multifile Merge is implemented using a heap data structure to produce sorted output. On each iteration we take the top element from the heap, replace it with the next element from the same source, and heapify. Sometimes (possibly often), when we read a new row group, all of its elements would reach the top of the heap before any other element; in other words, every element of that row group is less than every other element in the heap. Since each source is sorted, this holds exactly when the row group's last element is no greater than the smallest of the other entries in the heap. In that case it would be cheaper to skip the heapify step and read elements from that row group one after another until it is exhausted.
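A minimal Python sketch of the idea, assuming each source is an in-memory list of already-sorted row groups (the extension itself reads row groups from Parquet files; `merge_row_groups` and its input shape are hypothetical). Heap entries are `(value, source index)` pairs. The normal path replaces the top with `heapq.heapreplace`; when a source's row group runs out, the next one is loaded, and if its last value does not exceed the current heap top, the whole row group is emitted directly without touching the heap.

```python
import heapq
from typing import Iterator, List, Sequence


def merge_row_groups(sources: Sequence[Sequence[List[int]]]) -> Iterator[int]:
    """Merge sorted sources (each a sequence of sorted row groups) into one sorted stream."""
    groups = [iter(src) for src in sources]  # per-source iterator over row groups
    current: List[List[int]] = [[] for _ in sources]  # row group being read, per source
    pos = [0] * len(sources)  # read position inside current[i]
    heap: List[tuple] = []  # (value, source index)

    # Seed the heap with the first element of each source's first non-empty row group.
    for i, it in enumerate(groups):
        for rg in it:
            if rg:
                current[i], pos[i] = rg, 1
                heapq.heappush(heap, (rg[0], i))
                break

    while heap:
        value, i = heap[0]
        yield value
        if pos[i] < len(current[i]):
            # Normal path: replace the top with the next element of the same source.
            heapq.heapreplace(heap, (current[i][pos[i]], i))
            pos[i] += 1
            continue
        # Row group of source i is exhausted: drop its entry, then load the next one.
        heapq.heappop(heap)
        for rg in groups[i]:
            if not rg:
                continue
            if not heap or rg[-1] <= heap[0][0]:
                # Fast path: the whole row group precedes every other heap entry,
                # so emit it sequentially with no heap operations.
                yield from rg
                continue
            # Otherwise fall back to the heap for this row group.
            current[i], pos[i] = rg, 1
            heapq.heappush(heap, (rg[0], i))
            break
```

For example, `list(merge_row_groups([[[1, 2, 3], [10, 11, 12]], [[4, 5, 6], [7, 8, 9]]]))` yields `1` through `12`; the second row group of the second source, `[7, 8, 9]`, goes through the fast path because `9 <= 10`, the only remaining heap entry at that point.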