citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Suggestion for improving SelectedChunkMask (Columnar) #4934

Open renevdzee opened 3 years ago

renevdzee commented 3 years ago

In columnar_reader.c, SelectedChunkMask() calls predicate_refuted_by once for each individual Var. Because of this, a WHERE clause with an OR over multiple columns, such as

WHERE a=1 OR b=1

will never filter out any chunks. The solution is to put the min/max predicates for all Vars in a single list and call predicate_refuted_by only once. I implemented this for cstore_fdw (I have not been able to get the new Citus extension running on my PostgreSQL installation); see renevdzee/cstore_fdw@2c896d7. It seems to work and improves performance a lot.
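To illustrate why the per-Var call cannot skip chunks for an OR across columns, here is a minimal toy model in plain C. It does not use PostgreSQL's predicate_refuted_by or any Citus code; the types ColumnRange and EqualityTerm and the functions term_refuted, skip_chunk_per_column and skip_chunk_combined are hypothetical names made up for this sketch, and the WHERE clause is assumed to be a pure OR of `column = constant` terms.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a chunk's min/max statistics on one column. */
typedef struct { int min; int max; } ColumnRange;

/* One disjunct of the WHERE clause: column number `col` equals `value`. */
typedef struct { size_t col; int value; } EqualityTerm;

/* True when "column = value" cannot match any row within the column's range. */
bool term_refuted(const ColumnRange *ranges, EqualityTerm t)
{
    return t.value < ranges[t.col].min || t.value > ranges[t.col].max;
}

/*
 * Per-column strategy: each column's range is tested on its own.
 * The chunk is skipped only if one single column's range contradicts
 * every disjunct of the OR.
 */
bool skip_chunk_per_column(const ColumnRange *ranges, size_t ncols,
                           const EqualityTerm *terms, size_t nterms)
{
    for (size_t col = 0; col < ncols; col++) {
        bool all_refuted = nterms > 0;
        for (size_t i = 0; i < nterms; i++) {
            /* A disjunct on another column says nothing about this
             * column's range, so it can never be refuted here. */
            if (terms[i].col != col || !term_refuted(ranges, terms[i])) {
                all_refuted = false;
                break;
            }
        }
        if (all_refuted)
            return true;
    }
    return false;
}

/*
 * Combined strategy: all column ranges are considered together.
 * The chunk is skipped if every disjunct is contradicted by the range
 * of the column it refers to.
 */
bool skip_chunk_combined(const ColumnRange *ranges,
                         const EqualityTerm *terms, size_t nterms)
{
    if (nterms == 0)
        return false;
    for (size_t i = 0; i < nterms; i++) {
        if (!term_refuted(ranges, terms[i]))
            return false;
    }
    return true;
}
```

With ranges a in [5, 10] and b in [20, 30] and the clause `a=1 OR b=1`, skip_chunk_per_column returns false (the chunk is read) while skip_chunk_combined returns true (the chunk is skipped), which mirrors the behaviour described above.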

renevdzee commented 3 years ago

FYI, I did some more testing and discovered that my changes slowed down SelectedChunkMask overall while only improving this specific case. The call to predicate_refuted_by is not that expensive, but GetFunctionInfoOrNull and BuildBaseConstraint are, and with my changes they were invoked multiple times for each column (for every chunk).

When I cache the results of GetFunctionInfoOrNull and BuildBaseConstraint in a structure indexed by column, I get results as fast as, and sometimes faster than, the original code (which calls BuildBaseConstraint outside the inner for-loop).
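The caching idea can be sketched roughly as follows. This is a toy model in plain C, not the actual patch: ColumnCache, ColumnCacheEntry, expensive_lookup, column_cache_get, scan_chunks and MAX_COLUMNS are all hypothetical names, and the integer fields are placeholders for whatever GetFunctionInfoOrNull and BuildBaseConstraint really return.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_COLUMNS 64   /* arbitrary limit for this sketch */

/* Cached per-column results; the int fields are placeholders only. */
typedef struct {
    bool valid;
    int  comparator_id;    /* stand-in for a GetFunctionInfoOrNull result */
    int  base_constraint;  /* stand-in for a BuildBaseConstraint result */
} ColumnCacheEntry;

typedef struct {
    ColumnCacheEntry entries[MAX_COLUMNS];
    unsigned long expensive_calls;  /* counts simulated expensive lookups */
} ColumnCache;

/* Simulates the expensive per-column work and records that it ran. */
int expensive_lookup(ColumnCache *cache, size_t col)
{
    cache->expensive_calls++;
    return (int) col * 2 + 1;
}

/* Returns the cached entry for `col`, filling it on first use.
 * Returns NULL when `col` is outside the cache. */
const ColumnCacheEntry *column_cache_get(ColumnCache *cache, size_t col)
{
    if (col >= MAX_COLUMNS)
        return NULL;

    ColumnCacheEntry *entry = &cache->entries[col];
    if (!entry->valid) {
        int result = expensive_lookup(cache, col);
        entry->comparator_id = result;
        entry->base_constraint = result;
        entry->valid = true;
    }
    return entry;
}

/* Visits every column of every chunk, as a chunk-mask loop might.
 * Returns how many expensive lookups were performed in total. */
unsigned long scan_chunks(ColumnCache *cache, size_t nchunks, size_t ncols)
{
    for (size_t chunk = 0; chunk < nchunks; chunk++) {
        for (size_t col = 0; col < ncols; col++) {
            const ColumnCacheEntry *entry = column_cache_get(cache, col);
            (void) entry;  /* a real loop would build the chunk's constraints here */
        }
    }
    return cache->expensive_calls;
}
```

Starting from a zero-initialized ColumnCache, scanning 100 chunks with 3 columns performs the expensive lookup only 3 times, once per column, instead of once per column per chunk.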