Closed colm-mchugh closed 1 week ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 89.69%. Comparing base (
f695971
) to head (c52f360
). Report is 2 commits behind head on main.
https://github.com/citusdata/citus/actions/runs/11615097946/job/32345026099?pr=7718
The test seems to be flaky. Please check the link. The flaky check runs the edited/added test file many times to make sure its output is the same in every run. It is usually because either some clause like 'ORDER BY` is missing; or the behavior itself is flaky.
The test seems to be flaky... It is usually because either some clause like 'ORDER BY` is missing; or the behavior itself is flaky.
Thanks @agedemenli , a couple of queries in the regress test need to be modified to ensure stable output.
In addition to the flaky tests, there are 4 failures in check-arbitrary-configs tests, all with an error like the following; this appears to be environmental? Should re-submitting the PR be ok to address these?
The self-hosted runner: c6608c3fc000002 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
Arbitrary configs check seems to be ok. For the failing style check, please refer to our coding conventions If you have any issues, feel free to reach out to me
please consider if this needs to be backported, and if we can cut a patch release sometime soon (tm) given that it is a cx reported segfault.
This PR is a proposed fix for issue 7705. The following is the background and rationale for the fix (please refer to 7705 for context);
The
varnullingrels
field was introduced to the Var node struct definition in Postgres 16. Its purpose is to associate a variable with the set of outer join relations that can cause the variable to be NULL. Thevarnullingrels
for the variable"gianluca_camp_test"."start_timestamp"
in the problem query is 3, because the variable "gianluca_camp_test"."start_timestamp" is coming from the inner (nullable) side of an outer join and 3 is the RT index (aka relid) of that outer join. The problem occurs when the Postgres planner attempts to plan the combine query. The format of a combine query is:There is only one relation in a combine query, so no outer joins are present, but the non-empty
varnullingrels
field causes the Postgres planner to access structures for a non-existent relation. The source of the problem is that, when creating the target list for the combine query, function MasterAggregateMutator() uses copyObject() to construct a Var node before setting the master table ID, and this copies over the non-empty varnullingrels field in the case of the"gianluca_camp_test"."start_timestamp"
var. The proposed solution is to have MasterAggregateMutator() use makeVar() instead of copyObject(), and only set the fields that make sense for the combine query; var type, collation and type modifier. Thevarnullingrels
field can be left empty because there is only one relation in the combine query.A new regress test issue_7705.sql is added to exercise the fix. The issue is not specific to window functions, any target expression that cannot be pushed down and contains at least one column from the inner side of a left outer join (so has a non-empty varnullingrels field) can cause the same issue.
More about Citus combine queries here. More about Postgres varnullingrels here.