cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.05k stars 3.8k forks source link

sql,*: make some or all system tables LOCALITY GLOBAL #63365

Closed ajwerner closed 6 months ago

ajwerner commented 3 years ago

Is your feature request related to a problem? Please describe.

This idea has come up a few times recently and it seems worthwhile to centralize the discussion somewhere. Most recently https://github.com/cockroachdb/cockroach/issues/36160#issuecomment-815391771. It occurs to me that several other big problems towards which we've considered investing considerable engineering efforts could also be mitigated or solved.

Today's virtual tables are powered by an in-memory cache of all descriptors. The latency requirements to evict from such a cache makes it infeasible. If the data were local and low-latency, then it's plausible to implement these tables in a streaming fashion. This memory overhead today has not been much of a concern given other bottlenecks which generally make creating a schema of a problematic size unlikely (https://github.com/cockroachdb/cockroach/issues/63206).

Another consideration which only just occurred to me is the commit-to-emit latency of CHANGEFEEDs. The dominant source of latency in CHANGEFEED is waiting to "prove" the schema for a row (https://github.com/cockroachdb/cockroach/issues/36289). In the past we have explored leasing protocols by which changefeeds might coordinate with / hold off schema changes and thus be free to emit rows so long as they have a lease. This approach was demonstrated to work and is, on some level, viable. However, it's far from trivial and would even further complicate transactional schema changes. If the system.descriptor table were a global table, a resolved timestamp corresponding to the present could be emitted to each node hosting a CHANGEFEED around the time that rows are written. This does reveal another interesting problem that CHANGEFEEDs are going to need to deal with is that rows committed in the future due to being part of a transaction touching a global table are likely to block rows due to non-global tables. That can be mitigated using some buffering.

Describe the solution you'd like

The table I'm most interested in making global is system.descriptor. This, today, would mean making the whole system config span global. It seems plausible one day to break up that whole concept once we have a new zone configuration architecture.

One thing I haven't thought through is what happens to the system.lease table and its relevant protocols if the leasing transaction needs to interact with writes which carried synthetic timestamps.

Jira issue: CRDB-6547

Epic CRDB-33032

rafiss commented 3 years ago

@arulajmani could you drop in a comment about your idea for how we might do this manually with a zone config as a stop-gap until we have true GLOBAL system tables?

arulajmani commented 3 years ago

Yeah -- GLOBAL tables are ~just syntactic sugar around the global_reads zone config attribute. My suggestion was to try and turn this on for the authentication tables you were interested in and see if that works as a stop gap until we figure out the details around making the system database mulit-region.

ajwerner commented 3 years ago

I think that it would be system.users and system.role_options. Unfortunately (and this is a big unfortunately), system.users is in the system config span. We'd need to split that out to make this work. Extra unfortunately, its hard-code table ID is between descriptors and zones.

However, this raises something extra interesting, we are already propagating all of the hash passwords via gossip and we didn't even realize! That being said, they'd be the stale passwords and we don't know when they'd be flushed.

Making the whole system config span global is likely to have very large impacts on the performance of a lot of schema changes. I don't know if we can stomach that.

ajwerner commented 3 years ago

All this being said, if we finish the new zone config work such that we don't need to gossip the system config span, then we could configure these tables independently. I think having users and user_role_options as global tables would be great.

arulajmani commented 3 years ago

Unfortunately (and this is a big unfortunately), system.users is in the system config span.

Extra unfortunately, its hard-code table ID is between descriptors and zones.

Ughgh I didn't know this, this is very unfortunate indeed. Never mind my suggestion @rafiss, I'm not sure if the zone config work getting finished fits with your timeline for making system.users and system.role_options accesses faster.

knz commented 3 years ago

Here are other categories of things that would benefit from global locality:

knz commented 3 years ago

@bobvawter asks:

We had a customer outage last night where a storage array oops in one region brought down the entire cluster. Would globalized system ranges have allowed the cluster to at least continue to serve reads?

Steven Hand asks:

I have had a couple customers for which access to system ranges was a major problem. One workaround was to (temporarily) pin system ranges to one region (at the cost of access from other regions). The other workaround was to enhance the Ruby Active Record ORM adapter to support stale reads for such system metadata.

rafiss commented 3 years ago

Related question from the community: https://github.com/cockroachdb/cockroach/discussions/67109

rafiss commented 2 years ago

Here's an example that came up in the context of Prisma (and also would affect most other users of user-defined types): https://github.com/prisma/prisma/issues/11317#issuecomment-1065414426

ajwerner commented 2 years ago

loosely relates to 79043

irfansharif commented 2 years ago

I think that it would be system.users and system.role_options. Unfortunately (and this is a big unfortunately), system.users is in the system config span. We'd need to split that out to make this work. Extra unfortunately, its hard-code table ID is between descriptors and zones.

BTW this is trivial now in 22.2, if we want to pull on this thread further. We're no longer gossiping the system config span and there's no reason to not split within it. To do so, I think all you need is deleting these lines + updating a few test files:

https://github.com/cockroachdb/cockroach/blob/41241284d524736223ebe12eb03b3cca75efb2d1/pkg/spanconfig/spanconfigstore/span_store.go#L113-L123

andy-kimball commented 2 years ago

@irfansharif, I'm really happy to hear that we're unblocked due to the zone config work. The architectural improvements that came along with that work continue to pay dividends.

AndrewSouthpaw commented 1 year ago

Is this still on CRDB's radar at all? The related Prisma issue is a concerning blocker for us if we were to move to multi-region CRDB config.

dikshant commented 1 year ago

@AndrewSouthpaw definitely on our radar. With 23.2 Serverless will make system database multi region. We want to see how that goes and plan on bringing this to dedicated and self hosted in an upcoming release.