Cached connections on username can cause permission errors when usernames a are recycled

We cache our connections based on the username they use to connect to the workers. Due to a combination of postgres not kicking users whose account has been dropped and citus caching names instead of oid's one can have unexplainable permission errors.

Say you have a user names foo.

Session 1:

SET ROLE foo;
SET
Time: 6.905 ms

CREATE TABLE t1 (a int PRIMARY KEY, b int);
CREATE TABLE
Time: 146.722 ms

SELECT create_distributed_table('t1','a');
┌──────────────────────────┐
│ create_distributed_table │
├──────────────────────────┤
│                          │
└──────────────────────────┘
(1 row)

Time: 66.722 ms

RESET ROLE;
RESET
Time: 0.770 ms

Session 2:

DROP TABLE t1;
DROP TABLE
Time: 22.519 ms

DROP ROLE foo;
DROP ROLE
Time: 10.159 ms

CREATE USER foo;
CREATE ROLE
Time: 15.426 ms

-- make sure foo is recreated on the workers as well

SET ROLE foo;
SET
Time: 6.706 ms

CREATE TABLE t1 (a int PRIMARY KEY, b int);
CREATE TABLE
Time: 9.542 ms

SELECT create_distributed_table('t1','a');
┌──────────────────────────┐
│ create_distributed_table │
├──────────────────────────┤
│                          │
└──────────────────────────┘
(1 row)

Time: 49.190 ms

Session 1

SET ROLE foo;
SET

Time: 0.465 ms
SELECT current_user;
┌──────────────┐
│ current_user │
├──────────────┤
│ foo          │
└──────────────┘
(1 row)

Time: 0.727 ms
SELECT count(*) FROM t1;
ERROR:  permission denied for table t1_102056
CONTEXT:  while executing command on foo@localhost:9701
Time: 8.273 ms

What happened here is that Session 1 still had a connection for user name foo cached. On the worker side this session was bound to the oid of the role we had deleted before. Because we reset the role at the end of the first interaction in Session 1 and switched back to the role in the second act we could issue a query on the newly created t1 of which the new foo was the owner.

But when the queries to the shards were issued over the cached connection it caused a permission check to fail since the user with the old Oid didn't have access to the table.

Given it is strange that postgres doesn't cancel backend that are connected as a deleted user, we might want to make sure to invalidate our cached connections when we delete a user.

citusdata / citus

Cached connections on username can cause permission errors when usernames a are recycled #3785