Closed: thanodnl closed this issue 3 years ago
The user was actually joining against a reference table instead of a distributed table. Disabling the GUC didn't solve it for them.
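For context, the GUC being referred to is presumably the CTE-inlining setting Citus added for PostgreSQL 12 support; disabling it normally forces the pre-PG12 materialization behavior (a sketch, assuming `citus.enable_cte_inlining` is the GUC in question):

```sql
-- Assumption: the GUC referred to above is citus.enable_cte_inlining
-- (present in Citus 9.2+ on PostgreSQL 12). Disabling it makes Citus
-- plan CTEs as materialized subplans, as it did before PG12 inlining.
SET citus.enable_cte_inlining TO off;
```

In this case the user reported that disabling it did not resolve the error, which is what makes the report interesting.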
To reproduce I used the following file:
$ cat psa.sql
CREATE TABLE foo (a int, application_name text);
SELECT create_reference_table('foo');
EXPLAIN WITH psa AS (
SELECT pg_stat_activity.datid,
pg_stat_activity.application_name,
pg_stat_activity.query
FROM pg_catalog.pg_stat_activity
) SELECT * FROM foo JOIN psa USING (application_name);
Running against a cleanly provisioned database:
psql -p9700 -a -f psa.sql
There are two different EXPLAIN outputs you can get, depending on the version. On v9.3.5:
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) │
│ -> Distributed Subplan 1_1 │
│ -> Function Scan on pg_stat_get_activity s (cost=0.00..1.00 rows=100 width=68) │
│ Task Count: 1 │
│ Tasks Shown: All │
│ -> Task │
│ Node: host=localhost port=9801 dbname=postgres │
│ -> Merge Join (cost=148.00..248.25 rows=6350 width=72) │
│ Merge Cond: (intermediate_result.application_name = foo.application_name) │
│ -> Sort (cost=59.83..62.33 rows=1000 width=68) │
│ Sort Key: intermediate_result.application_name │
│ -> Function Scan on read_intermediate_result intermediate_result (cost=0.00..10.00 rows=1000 width=68) │
│ -> Sort (cost=88.17..91.35 rows=1270 width=36) │
│ Sort Key: foo.application_name │
│ -> Seq Scan on foo_102008 foo (cost=0.00..22.70 rows=1270 width=36) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
v9.4.3:
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) │
│ Task Count: 1 │
│ Tasks Shown: All │
│ -> Task │
│ Node: host=localhost port=9801 dbname=postgres │
│ -> Hash Join (cost=2.25..36.06 rows=635 width=72) │
│ Hash Cond: (foo.application_name = s.application_name) │
│ -> Seq Scan on foo_102008 foo (cost=0.00..22.70 rows=1270 width=36) │
│ -> Hash (cost=1.00..1.00 rows=100 width=72) │
│ -> Function Scan on pg_stat_get_activity s (cost=0.00..1.00 rows=100 width=72) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
Git bisect points to https://github.com/citusdata/citus/commit/24feadc23013016450a731614c15a970b777b39d, which was introduced with https://github.com/citusdata/citus/pull/3887
The root of the issue is that we don't have general support for view permissions. There are apparently some ways to read from a view without having permission to read from the underlying table, namely when the view is encapsulated in a (materialized, recursively planned) CTE, involves only local tables, and is part of a query with distributed tables.
PG12 inlines the CTE, which also transforms it into a subquery. After inlining, the view is unwrapped and permissions on the underlying tables are required, even if we recursively plan the query. While the latter leads to the same plan as a materialized CTE, it requires table read permission.
In <=9.3, CTE inlining tripped over the following error:
DEBUG: Planning after CTEs inlined failed with
message: relation pg_database is not distributed
detail:
hint:
Due to that error, the inlining was effectively undone pre-9.4, returning to the previous materialized-CTE state, in which the query was supported.
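On PostgreSQL 12+ the same pre-inlining behavior can also be requested explicitly per query with the `MATERIALIZED` keyword, which prevents the planner from inlining the CTE in the first place. A sketch against the reproduction schema above (whether this sidesteps the Citus bug on every affected version is not verified in this thread):

```sql
-- PostgreSQL 12+: MATERIALIZED forces the CTE to be planned as a
-- separate materialized subquery instead of being inlined, which is
-- the planning shape that worked in Citus <= 9.3.
WITH psa AS MATERIALIZED (
    SELECT pg_stat_activity.datid,
           pg_stat_activity.application_name,
           pg_stat_activity.query
    FROM pg_catalog.pg_stat_activity
)
SELECT * FROM foo JOIN psa USING (application_name);
```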
For that CTE query, postgres doesn't give an error; instead, it behaves like:
WITH pg_stat_activity AS (
SELECT pg_stat_activity.datid,
pg_stat_activity.application_name,
pg_stat_activity.query
FROM pg_catalog.pg_stat_activity
) SELECT * FROM foo LEFT JOIN pg_stat_activity USING (application_name);
┌──────────────────┬──────┬───────┬──────────────────────────┐
│ application_name │ a │ datid │ query │
├──────────────────┼──────┼───────┼──────────────────────────┤
│ psql │ 1337 │ 13722 │ <insufficient privilege> │
└──────────────────┴──────┴───────┴──────────────────────────┘
(1 row)
~~So giving a permission denied error is the wrong behavior.~~
edit: Giving a permission denied error is the expected behavior for Citus.
I can reproduce this error, which is the expected behavior, on master, v9.5.0 & v9.4.0 but not on v9.3.0
I will use git-bisect to see which commit fixed this behavior
After inlining, the view is unwrapped and permissions are required, even if we recursively plan the query.
This is the part that is unclear to me. CTE inlining just copies the CTE subquery into a regular subquery in the FROM clause, same for both PG and Citus. Why is the view unwrapped after inlining? I guess I'm missing something here; I need to debug a little further.
@onurctirtir
I will use git-bisect to see which commit fixed this behavior
I ran git bisect, linked the commit above: https://github.com/citusdata/citus/commit/24feadc23013016450a731614c15a970b777b39d
Also, in production we solved it by adding random() to the target list of the CTE. The user confirmed this worked for them.
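The random() workaround works because a volatile function in the CTE's target list makes PostgreSQL treat the CTE as non-inlinable, so it stays materialized. A sketch of what the rewritten query presumably looked like (the extra column name is hypothetical; only the volatility matters):

```sql
-- Adding a volatile function such as random() to the CTE's target list
-- prevents PG12 from inlining it, keeping the CTE materialized and
-- recursively planned, as in Citus <= 9.3.
WITH psa AS (
    SELECT pg_stat_activity.datid,
           pg_stat_activity.application_name,
           pg_stat_activity.query,
           random() AS r  -- hypothetical column; forces non-inlining
    FROM pg_catalog.pg_stat_activity
)
SELECT * FROM foo JOIN psa USING (application_name);
```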
@onurctirtir
I will use git-bisect to see which commit fixed this behavior
I ran git bisect, linked the commit above: 24feadc
Yeah, I've seen that, but both in that commit and in its parent commit we still throw the permission error, so I couldn't be sure it's the fixing commit.
we still throw the permission error
I think the title of the issue might need revision after these newer insights, so that it no longer suggests that permissions on views but not on tables is the issue.
The customer's root cause is not directly related to permissions, but rather to the fact that the query against the pg_stat_activity view got pushed down to the worker. In the process of pushing down the query on the view, the view was unrolled (which I think is expected), causing the permission check on pg_authid to fail.
From the distributed planner's perspective it is a coincidence that a pg_authid table is there, and that the error comes from the fact that the user can't query it. This made us catch the fault, to be fair, but it is a red herring for the underlying problem of pushing down the query on non-distributed tables.
More with the fact that the query against the pg_stat_activity view got pushed down to the worker.
Note that this only happens on 9.4, not on 9.5 / master.
master correctly does recursive planning:
WITH pg_stat_activity AS (
SELECT pg_stat_activity.datid,
pg_stat_activity.application_name,
pg_stat_activity.query
FROM pg_catalog.pg_stat_activity
) SELECT * FROM ref r LEFT JOIN pg_stat_activity ON (r.a = pg_stat_activity.datid);
NOTICE: executing the command locally: SELECT s.datid, d.datname, s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_star
t, s.xact_start, s.query_start, s.state_change, s.wait_event_type, s.wait_event, s.state, s.backend_xid, s.backend_xmin, s.query, s.backend_type FROM (((SELECT intermediate_result.datid, inte
rmediate_result.pid, intermediate_result.usesysid, intermediate_result.application_name, intermediate_result.state, intermediate_result.query, intermediate_result.wait_event_type, intermediat
e_result.wait_event, intermediate_result.xact_start, intermediate_result.query_start, intermediate_result.backend_start, intermediate_result.state_change, intermediate_result.client_addr, int
ermediate_result.client_hostname, intermediate_result.client_port, intermediate_result.backend_xid, intermediate_result.backend_xmin, intermediate_result.backend_type, intermediate_result.ssl
, intermediate_result.sslversion, intermediate_result.sslcipher, intermediate_result.sslbits, intermediate_result.sslcompression, intermediate_result.ssl_client_dn, intermediate_result.ssl_cl
ient_serial, intermediate_result.ssl_issuer_dn, intermediate_result.gss_auth, intermediate_result.gss_princ, intermediate_result.gss_enc FROM read_intermediate_result('2_1'::text, 'binary'::c
itus_copy_format) intermediate_result(datid oid, pid integer, usesysid oid, application_name text, state text, query text, wait_event_type text, wait_event text, xact_start timestamp with tim
e zone, query_start timestamp with time zone, backend_start timestamp with time zone, state_change timestamp with time zone, client_addr inet, client_hostname text, client_port integer, backe
nd_xid xid, backend_xmin xid, backend_type text, ssl boolean, sslversion text, sslcipher text, sslbits integer, sslcompression boolean, ssl_client_dn text, ssl_client_serial numeric, ssl_issu
er_dn text, gss_auth boolean, gss_princ text, gss_enc boolean)) s LEFT JOIN pg_database d ON ((s.datid OPERATOR(pg_catalog.=) d.oid))) LEFT JOIN pg_authid u ON ((s.usesysid OPERATOR(pg_catalo
g.=) u.oid)))
NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 13, '2020-12-06 14:59:28.032876+01');
DETAIL: on server marco@localhost:1201 connectionId: 1
NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 13, '2020-12-06 14:59:28.032876+01');
DETAIL: on server marco@localhost:1202 connectionId: 2
NOTICE: issuing COPY "2_2" FROM STDIN WITH (format result)
DETAIL: on server marco@localhost:1201 connectionId: 1
NOTICE: issuing COPY "2_2" FROM STDIN WITH (format result)
DETAIL: on server marco@localhost:1202 connectionId: 2
NOTICE: issuing SELECT r.a, r.b, pg_stat_activity.datid, pg_stat_activity.application_name, pg_stat_activity.query FROM (public.ref_102040 r LEFT JOIN (SELECT pg_stat_activity_1.datid, pg_st
at_activity_1.application_name, pg_stat_activity_1.query FROM (SELECT intermediate_result.datid, intermediate_result.datname, intermediate_result.pid, intermediate_result.usesysid, intermedia
te_result.usename, intermediate_result.application_name, intermediate_result.client_addr, intermediate_result.client_hostname, intermediate_result.client_port, intermediate_result.backend_sta
rt, intermediate_result.xact_start, intermediate_result.query_start, intermediate_result.state_change, intermediate_result.wait_event_type, intermediate_result.wait_event, intermediate_result
.state, intermediate_result.backend_xid, intermediate_result.backend_xmin, intermediate_result.query, intermediate_result.backend_type FROM read_intermediate_result('2_2'::text, 'binary'::cit
us_copy_format) intermediate_result(datid oid, datname name, pid integer, usesysid oid, usename name, application_name text, client_addr inet, client_hostname text, client_port integer, backe
nd_start timestamp with time zone, xact_start timestamp with time zone, query_start timestamp with time zone, state_change timestamp with time zone, wait_event_type text, wait_event text, sta
te text, backend_xid xid, backend_xmin xid, query text, backend_type text)) pg_stat_activity_1) pg_stat_activity ON (((r.a)::oid OPERATOR(pg_catalog.=) pg_stat_activity.datid)))
DETAIL: on server marco@localhost:1201 connectionId: 1
NOTICE: issuing COMMIT
DETAIL: on server marco@localhost:1201 connectionId: 1
NOTICE: issuing COMMIT
DETAIL: on server marco@localhost:1202 connectionId: 2
9.4 incorrectly pushes down:
WITH pg_stat_activity AS (
SELECT pg_stat_activity.datid,
pg_stat_activity.application_name,
pg_stat_activity.query
FROM pg_catalog.pg_stat_activity
) SELECT * FROM ref r LEFT JOIN pg_stat_activity ON (r.a = pg_stat_activity.datid);
NOTICE: issuing SELECT r.a, r.b, pg_stat_activity.datid, pg_stat_activity.application_name, pg_stat_activity.query FROM (public.ref_102040 r LEFT JOIN (SELECT pg_stat_activity_1.datid, pg_stat_activity_1.application_name, pg_stat_activity_1.query FROM (SELECT s.datid, d.datname, s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.state_change, s.wait_event_type, s.wait_event, s.state, s.backend_xid, s.backend_xmin, s.query, s.backend_type FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc) LEFT JOIN pg_database d ON ((s.datid OPERATOR(pg_catalog.=) d.oid))) LEFT JOIN pg_authid u ON ((s.usesysid OPERATOR(pg_catalog.=) u.oid)))) pg_stat_activity_1) pg_stat_activity ON (((r.a)::oid OPERATOR(pg_catalog.=) pg_stat_activity.datid)))
DETAIL: on server marco@localhost:1201 connectionId: 1
I verified that it is not fixed by citus local tables planner changes.
Then it is certain that the fixing commit is between 5e5ba467932754b75a0faf241b72977bab5adee0 and d3019f1b6d153177cc9d64fa4b8190a655e7bc57, where d3019f1b6d153177cc9d64fa4b8190a655e7bc57 is the output of git merge-base v9.5.0 master.
Doing git-bisect between the two commits mentioned in the above comment, I finally found the fixing commit: e0d2ac76208aef6ae0be3bb38adc23016f889b3d.
Before e0d2ac76208aef6ae0be3bb38adc23016f889b3d, say at 05569526077a45d170a96500446c743c87d4cea2, we were relying on multi_relation_restriction_hook to find local relations.
However, multi_relation_restriction_hook was not even executed for pg_stat_activity (not sure why), so RelationRestrictionContext->hasLocalRelation became false and we were doing push-down execution.
However, one commit later (e0d2ac76208aef6ae0be3bb38adc23016f889b3d, the fixing commit), as we no longer rely on multi_relation_restriction_hook, we were able to catch the pg_stat_activity view as a local relation, so we were not doing push-down execution.
As a result, in 05569526077a45d170a96500446c743c87d4cea2, we see:
EXPLAIN WITH psa AS (
SELECT pg_stat_activity.datid,
pg_stat_activity.application_name,
pg_stat_activity.query
FROM pg_catalog.pg_stat_activity
) SELECT * FROM foo JOIN psa USING (application_name);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) │
│ Task Count: 1 │
│ Tasks Shown: All │
│ -> Task │
│ Node: host=localhost port=9701 dbname=postgres │
│ -> Hash Join (cost=2.25..36.06 rows=635 width=72) │
│ Hash Cond: (foo.application_name = s.application_name) │
│ -> Seq Scan on foo_102008 foo (cost=0.00..22.70 rows=1270 width=36) │
│ -> Hash (cost=1.00..1.00 rows=100 width=72) │
│ -> Function Scan on pg_stat_get_activity s (cost=0.00..1.00 rows=100 width=72) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)
Time: 2307.166 ms (00:02.307)
and in e0d2ac76208aef6ae0be3bb38adc23016f889b3d, we see:
EXPLAIN WITH psa AS (
SELECT pg_stat_activity.datid,
pg_stat_activity.application_name,
pg_stat_activity.query
FROM pg_catalog.pg_stat_activity
) SELECT * FROM foo JOIN psa USING (application_name);
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) │
│ -> Distributed Subplan 2_1 │
│ -> Function Scan on pg_stat_get_activity s (cost=0.00..1.00 rows=100 width=512) │
│ -> Distributed Subplan 2_2 │
│ -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) │
│ Task Count: 1 │
│ Tasks Shown: All │
│ -> Task │
│ Node: host=localhost port=9700 dbname=postgres │
│ -> Hash Left Join (cost=2.25..17.61 rows=1000 width=440) │
│ Hash Cond: (intermediate_result.usesysid = u.oid) │
│ -> Hash Left Join (cost=1.05..13.73 rows=1000 width=376) │
│ Hash Cond: (intermediate_result.datid = d.oid) │
│ -> Function Scan on read_intermediate_result intermediate_result (cost=0.00..10.00 rows=1000 width=312) │
│ -> Hash (cost=1.02..1.02 rows=2 width=68) │
│ -> Seq Scan on pg_database d (cost=0.00..1.02 rows=2 width=68) │
│ -> Hash (cost=1.09..1.09 rows=9 width=68) │
│ -> Seq Scan on pg_authid u (cost=0.00..1.09 rows=9 width=68) │
│ Task Count: 1 │
│ Tasks Shown: All │
│ -> Task │
│ Node: host=localhost port=9701 dbname=postgres │
│ -> Merge Join (cost=148.00..248.25 rows=6350 width=72) │
│ Merge Cond: (intermediate_result.application_name = foo.application_name) │
│ -> Sort (cost=59.83..62.33 rows=1000 width=68) │
│ Sort Key: intermediate_result.application_name │
│ -> Function Scan on read_intermediate_result intermediate_result (cost=0.00..10.00 rows=1000 width=68) │
│ -> Sort (cost=88.17..91.35 rows=1270 width=36) │
│ Sort Key: foo.application_name │
│ -> Seq Scan on foo_102008 foo (cost=0.00..22.70 rows=1270 width=36) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(30 rows)
Time: 22.704 ms
Now I'm trying to find some more test cases for e0d2ac76208aef6ae0be3bb38adc23016f889b3d (https://github.com/citusdata/citus/pull/4292) other than the one above.
For example, I tried:
CREATE TABLE local_table_1 (a int, b int);
CREATE TABLE local_table_2 (a int, b int);
CREATE VIEW local_tables_view AS
SELECT local_table_1.a AS a, local_table_1.b AS b
FROM local_table_1
LEFT JOIN local_table_2 AS l2_1 USING (a)
LEFT JOIN local_table_2 AS l2_2 USING (a);
EXPLAIN WITH cte AS (
SELECT local_tables_view.a, local_tables_view.b
FROM local_tables_view
) SELECT * FROM reference_table JOIN cte USING (a);
However, it was surprisingly not pushed down to the worker nodes even before e0d2ac76208aef6ae0be3bb38adc23016f889b3d, as multi_relation_restriction_hook is called by postgres for our view in that case and we capture it as a local relation.
Now I'm trying to understand the difference between pg_stat_activity and local_tables_view from the postgres perspective when deciding whether to call multi_relation_restriction_hook for a view.
(cc: @thanodnl, @onderkalaci, @marcocitus)
It is obvious that #4292 fixes the case below:
EXPLAIN WITH psa AS (
SELECT pg_stat_activity.datid,
pg_stat_activity.application_name,
pg_stat_activity.query
FROM pg_catalog.pg_stat_activity
) SELECT * FROM foo JOIN psa USING (application_name);
But we might still try to push down some queries with views.
When extracting RTEs from views to find local relations in GetRTEListPropertiesForQuery, we assume that we will find some local tables in the view's RTEs. However, some views may not include any tables at all.
See https://github.com/citusdata/citus/issues/4405
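A minimal example of such a table-less view (a hypothetical name, for illustration only): its definition contains no table RTEs at all, so code that expects to find local tables inside view RTEs finds none.

```sql
-- Hypothetical view whose RTE list contains no table at all, only an
-- expression; analogous to pg_locks, which wraps a function call.
CREATE VIEW no_table_view AS SELECT random() AS r;
```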
In the below two queries, we still try to push the execution down to the workers on master:
Example - 1
EXPLAIN WITH cte AS (
SELECT locktype AS text_col
FROM pg_locks
) SELECT * FROM reference_table JOIN cte USING (text_col);
.....
NOTICE: issuing EXPLAIN (ANALYZE FALSE, VERBOSE FALSE, COSTS TRUE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE, FORMAT TEXT) WITH cte AS (SELECT pg_locks.locktype AS text_col FROM (SELECT l.locktype, l.database, l.relation, l.page, l.tuple, l.virtualxid, l.transactionid, l.classid, l.objid, l.objsubid, l.virtualtransaction, l.pid, l.mode, l.granted, l.fastpath FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath)) pg_locks) SELECT reference_table.text_col, reference_table.int_col FROM (public.reference_table_102008 reference_table(text_col, int_col) JOIN cte USING (text_col))
.....
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) │
│ Task Count: 1 │
│ Tasks Shown: All │
│ -> Task │
│ Node: host=localhost port=9701 dbname=postgres │
│ -> Merge Join (cost=168.00..268.25 rows=6350 width=36) │
│ Merge Cond: (cte.text_col = reference_table.text_col) │
│ CTE cte │
│ -> Function Scan on pg_lock_status l (cost=0.00..10.00 rows=1000 width=32) │
│ -> Sort (cost=69.83..72.33 rows=1000 width=32) │
│ Sort Key: cte.text_col │
│ -> CTE Scan on cte (cost=0.00..20.00 rows=1000 width=32) │
│ -> Sort (cost=88.17..91.35 rows=1270 width=36) │
│ Sort Key: reference_table.text_col │
│ -> Seq Scan on reference_table_102008 reference_table (cost=0.00..22.70 rows=1270 width=36) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(15 rows)
Time: 1740.177 ms (00:01.740)
This is because pg_locks doesn't refer to any tables but to the pg_lock_status function:
View definition:
SELECT l.locktype,
l.database,
.....
l.granted,
l.fastpath
FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
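The view definition above can be retrieved directly from the catalog with `pg_get_viewdef`, which is how one can quickly check whether a view references any tables at all:

```sql
-- pg_get_viewdef is a standard PostgreSQL catalog function; the second
-- argument pretty-prints the definition.
SELECT pg_get_viewdef('pg_catalog.pg_locks'::regclass, true);
```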
Example - 2
CREATE FUNCTION add(integer, integer) RETURNS integer
AS 'SELECT $1 + $2;'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
CREATE OR REPLACE VIEW func_call_view AS SELECT add(1,2)::text AS text_col;
EXPLAIN WITH cte AS (
SELECT text_col
FROM func_call_view
) SELECT * FROM reference_table JOIN cte USING (text_col);
┌────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────┤
│ Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) │
│ Task Count: 1 │
│ Tasks Shown: All │
│ -> Task │
│ Error: Could not get remote plan. │
└────────────────────────────────────────────────────────────────┘
(5 rows)
Time: 9.549 ms
We should backport https://github.com/citusdata/citus/pull/4292 to 9.4, but it wouldn't be trivial, as that PR depends on GetRTEListProperties, and GetRTEListProperties depends on https://github.com/citusdata/citus/pull/4132, which is a fairly major change.
A combination of roles, CTEs, and potentially local execution causes a permission error when a view on a privileged table is queried in a CTE.
When looking at the beginning of an execution with the commands we run logged, we find the following:
Here we see that the view has been unrolled. This works when running as the superuser, but not as any other user. With CTE inlining disabled, the execution doesn't show the local execution of the CTE part. I guess it is not run as a command via the executor, but is instead evaluated in place.