citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.51k stars 666 forks source link

error with function calls in the outer part of a lateral join #4975

Open 0x777 opened 3 years ago

0x777 commented 3 years ago

I'm using the schema as described in this getting started page on a single node citus instance as described here.

CREATE TABLE companies (
  id bigserial PRIMARY KEY,
  name text NOT NULL,
  image_url text,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL
);

CREATE TABLE campaigns (
  id bigserial,       -- was: PRIMARY KEY
  company_id bigint REFERENCES companies (id),
  name text NOT NULL,
  cost_model text NOT NULL,
  state text NOT NULL,
  monthly_budget bigint,
  blacklisted_site_urls text[],
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  PRIMARY KEY (company_id, id) -- added
);
SELECT create_distributed_table('companies',   'id');
SELECT create_distributed_table('campaigns',   'company_id');

create function search_companies(search text)
returns setof companies as $$
begin
    return query 
    select * from companies where name ilike ('%' || search || '%');
end;
$$ language plpgsql stable;

The following query works as expected:

SELECT
  *
FROM
  (SELECT * FROM companies WHERE name ILIKE '%hello%') AS c
  LEFT OUTER JOIN LATERAL (
    SELECT * FROM campaigns cs WHERE cs.company_id = c.id
  ) AS cs ON TRUE

This one fails with an error:

SELECT
  *
FROM
  search_companies('hello') AS c
  LEFT OUTER JOIN LATERAL (
    SELECT * FROM campaigns cs WHERE cs.company_id = c.id
  ) AS cs ON TRUE
ERROR:  cannot pushdown the subquery
DETAIL:  Complex subqueries and CTEs cannot be in the outer part of the outer join

Making search_companies a distributed function didn't help:

SELECT create_distributed_function(
  'search_companies(text)'
);
onderkalaci commented 3 years ago

Related to #2614.

When a function appears in a join tree, Citus executes the function and converts the results into an intermediate result via http://docs.citusdata.com/en/v10.0/develop/reference_processing.html#subquery-cte-push-pull-execution.

And, Citus currently doesn't support: intermediate_result LEFT JOIN distributed_table

In fact, the first JOIN query between two subqueries that you shared can be fully pushed down, so the performance would be better anyway.