citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.63k stars 670 forks source link

[Bug Fix] Query on distributed tables with window partition may cause… #7740

Closed emelsimsek closed 1 week ago

emelsimsek commented 1 week ago

… segfault #7705 (#7718)

This PR is a proposed fix for issue 7705. The following is the background and rationale for the fix (please refer to 7705 for context);

The varnullingrelsfield was introduced to the Var node struct definition in Postgres 16. Its purpose is to associate a variable with the set of outer join relations that can cause the variable to be NULL. The varnullingrels for the variable "gianluca_camp_test"."start_timestamp" in the problem query is 3, because the variable "gianluca_camp_test"."start_timestamp" is coming from the inner (nullable) side of an outer join and 3 is the RT index (aka relid) of that outer join. The problem occurs when the Postgres planner attempts to plan the combine query. The format of a combine query is:

SELECT <targets>
FROM   pg_catalog.citus_extradata_container();

There is only one relation in a combine query, so no outer joins are present, but the non-empty varnullingrelsfield causes the Postgres planner to access structures for a non-existent relation. The source of the problem is that, when creating the target list for the combine query, function MasterAggregateMutator() uses copyObject() to construct a Var node before setting the master table ID, and this copies over the non-empty varnullingrels field in the case of the "gianluca_camp_test"."start_timestamp" var. The proposed solution is to have MasterAggregateMutator() use makeVar() instead of copyObject(), and only set the fields that make sense for the combine query; var type, collation and type modifier. The varnullingrelsfield can be left empty because there is only one relation in the combine query.

A new regress test issue_7705.sql is added to exercise the fix. The issue is not specific to window functions, any target expression that cannot be pushed down and contains at least one column from the inner side of a left outer join (so has a non-empty varnullingrels field) can cause the same issue.

More about Citus combine queries here. More about Postgres varnullingrels here.

(cherry picked from commit 248ff5d52a7fd5447e782c5a218cced4a33386d6)

DESCRIPTION: PR description that will go into the change log, up to 78 characters

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.64%. Comparing base (15ecc37) to head (7c33adb). Report is 2 commits behind head on release-12.1.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## release-12.1 #7740 +/- ## ============================================= Coverage 89.64% 89.64% ============================================= Files 274 276 +2 Lines 59575 59654 +79 Branches 7435 7446 +11 ============================================= + Hits 53407 53478 +71 - Misses 4036 4041 +5 - Partials 2132 2135 +3 ```