We cache our connections based on the username they use to connect to the workers. Due to a combination of postgres not kicking users whose account has been dropped and citus caching names instead of oid's one can have unexplainable permission errors.
Say you have a user names foo.
Session 1:
SET ROLE foo;
SET
Time: 6.905 ms
CREATE TABLE t1 (a int PRIMARY KEY, b int);
CREATE TABLE
Time: 146.722 ms
SELECT create_distributed_table('t1','a');
┌──────────────────────────┐
│ create_distributed_table │
├──────────────────────────┤
│ │
└──────────────────────────┘
(1 row)
Time: 66.722 ms
RESET ROLE;
RESET
Time: 0.770 ms
Session 2:
DROP TABLE t1;
DROP TABLE
Time: 22.519 ms
DROP ROLE foo;
DROP ROLE
Time: 10.159 ms
CREATE USER foo;
CREATE ROLE
Time: 15.426 ms
-- make sure foo is recreated on the workers as well
SET ROLE foo;
SET
Time: 6.706 ms
CREATE TABLE t1 (a int PRIMARY KEY, b int);
CREATE TABLE
Time: 9.542 ms
SELECT create_distributed_table('t1','a');
┌──────────────────────────┐
│ create_distributed_table │
├──────────────────────────┤
│ │
└──────────────────────────┘
(1 row)
Time: 49.190 ms
Session 1
SET ROLE foo;
SET
Time: 0.465 ms
SELECT current_user;
┌──────────────┐
│ current_user │
├──────────────┤
│ foo │
└──────────────┘
(1 row)
Time: 0.727 ms
SELECT count(*) FROM t1;
ERROR: permission denied for table t1_102056
CONTEXT: while executing command on foo@localhost:9701
Time: 8.273 ms
What happened here is that Session 1 still had a connection for user name foo cached. On the worker side this session was bound to the oid of the role we had deleted before. Because we reset the role at the end of the first interaction in Session 1 and switched back to the role in the second act we could issue a query on the newly created t1 of which the new foo was the owner.
But when the queries to the shards were issued over the cached connection it caused a permission check to fail since the user with the old Oid didn't have access to the table.
Given it is strange that postgres doesn't cancel backend that are connected as a deleted user, we might want to make sure to invalidate our cached connections when we delete a user.
We cache our connections based on the username they use to connect to the workers. Due to a combination of postgres not kicking users whose account has been dropped and citus caching names instead of oid's one can have unexplainable permission errors.
Say you have a user names
foo
.Session 1:
Session 2:
Session 1
What happened here is that
Session 1
still had a connection for user namefoo
cached. On the worker side this session was bound to theoid
of the role we had deleted before. Because we reset the role at the end of the first interaction inSession 1
and switched back to the role in the second act we could issue a query on the newly createdt1
of which the newfoo
was the owner.But when the queries to the shards were issued over the cached connection it caused a permission check to fail since the user with the old
Oid
didn't have access to the table.Given it is strange that postgres doesn't cancel backend that are connected as a deleted user, we might want to make sure to invalidate our cached connections when we delete a user.