Closed: ajwerner closed 6 months ago
@arulajmani could you drop in a comment about your idea for how we might do this manually with a zone config as a stop-gap until we have true GLOBAL system tables?
Yeah -- GLOBAL tables are ~just syntactic sugar around the `global_reads` zone config attribute. My suggestion was to try turning this on for the authentication tables you were interested in and see if that works as a stop-gap until we figure out the details around making the system database multi-region.
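For illustration, that stop-gap would look something like the following (a sketch only: `global_reads` is the zone config variable that GLOBAL tables set under the hood, and this assumes the tables can accept their own zone configs):

```sql
-- Stop-gap sketch: turn on non-blocking global reads for the
-- authentication tables via their zone configurations.
ALTER TABLE system.users CONFIGURE ZONE USING global_reads = true;
ALTER TABLE system.role_options CONFIGURE ZONE USING global_reads = true;
```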
I think that it would be `system.users` and `system.role_options`. Unfortunately (and this is a big unfortunately), `system.users` is in the system config span. We'd need to split that out to make this work. Extra unfortunately, its hard-coded table ID sits between `descriptors` and `zones`.
However, this raises something extra interesting: we are already propagating all of the hashed passwords via gossip and we didn't even realize it! That being said, they'd be stale passwords, and we don't know when they'd be flushed.
Making the whole system config span global is likely to have very large impacts on the performance of a lot of schema changes. I don't know if we can stomach that.
All this being said, if we finish the new zone config work such that we don't need to gossip the system config span, then we could configure these tables independently. I think having users and user_role_options as global tables would be great.
> Unfortunately (and this is a big unfortunately), `system.users` is in the system config span.
>
> Extra unfortunately, its hard-coded table ID is between `descriptors` and `zones`.
Ugh, I didn't know this; this is very unfortunate indeed. Never mind my suggestion. @rafiss, I'm not sure if finishing the zone config work fits with your timeline for making `system.users` and `system.role_options` accesses faster.
Here are other categories of things that would benefit from global locality:

- `meta1` ranges (and `meta2` when we add them)
- `system.users`, `system.role_members`, `system.web_sessions`, `system.role_options`
- `system.localities`
@bobvawter asks:
We had a customer outage last night where a storage array oops in one region brought down the entire cluster. Would globalized system ranges have allowed the cluster to at least continue to serve reads?
Steven Hand asks:
I have had a couple customers for which access to system ranges was a major problem. One workaround was to (temporarily) pin system ranges to one region (at the cost of access from other regions). The other workaround was to enhance the Ruby Active Record ORM adapter to support stale reads for such system metadata.
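For reference, the stale-read workaround described above can be expressed with a follower-read timestamp (a sketch; the column list and predicate are illustrative, not taken from the actual ORM adapter):

```sql
-- Serve system-metadata lookups from a nearby replica by accepting
-- slightly stale results via a follower-read timestamp.
SELECT username, "hashedPassword"
FROM system.users
AS OF SYSTEM TIME follower_read_timestamp()
WHERE username = 'app_user';
```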
Related question from the community: https://github.com/cockroachdb/cockroach/discussions/67109
Here's an example that came up in the context of Prisma (and also would affect most other users of user-defined types): https://github.com/prisma/prisma/issues/11317#issuecomment-1065414426
> I think that it would be `system.users` and `system.role_options`. Unfortunately (and this is a big unfortunately), `system.users` is in the system config span. We'd need to split that out to make this work. Extra unfortunately, its hard-coded table ID is between `descriptors` and `zones`.
BTW, this is trivial now in 22.2, if we want to pull on this thread further. We're no longer gossiping the system config span, and there's no reason not to split within it. To do so, I think all you need to do is delete these lines + update a few test files:
@irfansharif, I'm really happy to hear that we're unblocked due to the zone config work. The architectural improvements that came along with that work continue to pay dividends.
Is this still on CRDB's radar at all? The related Prisma issue is a concerning blocker for us if we were to move to multi-region CRDB config.
@AndrewSouthpaw definitely on our radar. With 23.2, Serverless will make the system database multi-region. We want to see how that goes, and we plan on bringing this to Dedicated and self-hosted in an upcoming release.
**Is your feature request related to a problem? Please describe.**
This idea has come up a few times recently, and it seems worthwhile to centralize the discussion somewhere. Most recently: https://github.com/cockroachdb/cockroach/issues/36160#issuecomment-815391771. It occurs to me that several other big problems, toward which we've considered investing considerable engineering effort, could also be mitigated or solved.
Today's virtual tables are powered by an in-memory cache of all descriptors. The latency requirements to evict from such a cache make it infeasible. If the data were local and low-latency, then it would be plausible to implement these tables in a streaming fashion. The memory overhead today has not been much of a concern, given other bottlenecks which generally make creating a schema of a problematic size unlikely (https://github.com/cockroachdb/cockroach/issues/63206).
Another consideration which only just occurred to me is the commit-to-emit latency of CHANGEFEEDs. The dominant source of latency in a CHANGEFEED is waiting to "prove" the schema for a row (https://github.com/cockroachdb/cockroach/issues/36289). In the past we have explored leasing protocols by which changefeeds might coordinate with / hold off schema changes and thus be free to emit rows so long as they hold a lease. This approach was demonstrated to work and is, on some level, viable. However, it's far from trivial and would further complicate transactional schema changes. If the `system.descriptor` table were a global table, a resolved timestamp corresponding to the present could be emitted to each node hosting a CHANGEFEED around the time that rows are written. This does reveal another interesting problem that CHANGEFEEDs are going to need to deal with: rows committed in the future, due to being part of a transaction touching a global table, are likely to block rows from non-global tables. That can be mitigated using some buffering.

**Describe the solution you'd like**

The table I'm most interested in making global is `system.descriptor`. This, today, would mean making the whole system config span global. It seems plausible to one day break up that whole concept once we have a new zone configuration architecture. One thing I haven't thought through is what happens to the `system.lease` table and its relevant protocols if the leasing transaction needs to interact with writes which carried synthetic timestamps.

Jira issue: CRDB-6547
Epic CRDB-33032