This is wasteful. Each of those queries is already scanning all the data from system.role_members, so we should change this so that the cache only gets populated once per node. This will change the number of internal queries needed to repopulate the cache to be O(node count) rather than O((node count) * (active user count)).
To Reproduce
The cache is invalidated by CREATE ROLE, DROP ROLE, and GRANT role statements, so this problem is easy to reproduce.
Describe the problem
When the role membership cache is invalidated, each node will need to rebuild the cache. Currently, each user whose membership is being looked up will cause a query on the
system.role_members
table. This can be seen by looking at the requestKey of thesingleflight
operation here: https://github.com/cockroachdb/cockroach/blob/89dae000e93ca462d99b251df47b1e162058b56c/pkg/sql/authorization.go#L641This is wasteful. Each of those queries is already scanning all the data from
system.role_members
, so we should change this so that the cache only gets populated once per node. This will change the number of internal queries needed to repopulate the cache to beO(node count)
rather thanO((node count) * (active user count))
.To Reproduce The cache is invalidated by
CREATE ROLE
,DROP ROLE
, andGRANT role
statements, so this problem is easy to reproduce.See the escalation https://github.com/cockroachlabs/support/issues/3128 for an example.
Expected behavior Don't have such a big latency hit.
Additional data / screenshots See the linked ticket.
Environment: This was reported on CockroachDB version v24.1.6, but all supported versions are affected.
Jira issue: CRDB-44792
Epic CRDB-43310