sqlproxyccl: proxy / serverless clusters block use of db names containing a period (`.`)

knz commented 1 year ago

Describe the problem

If I do CREATE DATABASE "my.db", I am unable to connect to the db my.db using a SQL clients in CC serverless, for example with a URL of the type postgresq://...@.../mycluster.my.db

The problem is that the sql proxy decomposes the provided db name using strings.Split, and then discards any string after the second period. So if the provided db name is mycluster.my.db, the result of the db name parsing is cluster == mycluster, dbname == my, and the .db suffix is stripped.

To Reproduce

see above

Expected behavior

Complex db names with periods in their name should be useable by SQL clients.

cc @jaylim-crl @JeffSwenson for triage

Jira issue: CRDB-21837 Epic: CRDB-22385

knz commented 1 year ago

Proposal by @jaylim-crl:

in SQL, forbid the creation of database names starting with the structure XXX-XXX-NN. (two words and a number followed by a period), i.e. CREATE DATABASE "my-db-1.test" would be forbidden, while CREATE DATABASE "mydb1.test" would be allowed.
in the proxy, analyze the name of the database:
- if it contains a prefix XXX-XXX-NN., consider that to be the tenant name and strip it;
- otherwise, consider the entire string (possibly including periods) to be the database name.

Note: we would need to impose the restriction in point (1) on all clusters (including SH / dedicated), not just serverless, so that customers can migrate existing apps from one product to another without surprise.

knz commented 1 year ago

Should we choose to go this way, we'd make the reserved prefix configurable (via a cluster setting).

bdarnell commented 1 year ago

Does postgres allow dots in database names? This seems like it might introduce ambiguity with the notation for database.schema.table. Or is the problem that you could quote things in sql so that would work but there is no such quoting for the connection string?

I would prefer a simpler rule like "no dots" instead of a more complex one that only matches XXX-XXX-NN (the latter rule also constrains our ability to evolve the tenant name scheme), if we can get away with it from a backwards-compatibility perspective.

Does the introduction of SNI-based routing give us a path to deprecating and removing this parsing of the database name? IIRC putting the tenant ID in the database name was the last choice, if neither SNI nor the options parameter were supported.

knz commented 1 year ago

Does postgres allow dots in database names?

Yes. CREATE DATABASE "my.db"

This seems like it might introduce ambiguity with the notation for database.schema.table

No. Double quotes are required in that case. SELECT * FROM "my.db".schema.table

That said if a db name contains periods, it's possible to specify it in the conn URL without quoting: postgres://../my.db. This works with regular pg client/server.

if we can get away with it from a backwards-compatibility perspective.

That's what we started with. The db names can already contain periods, that cat is out of the bag.

Does the introduction of SNI-based routing give us a path to deprecating and removing this parsing of the database name?

For serverless, yes. But generally we've wanted to offer a non-SNI option as well, and the other alternative, via the options status param (postgres://.../..?options=-ccluster=XXX-XXX-NN), is not supported by all drivers.

IIRC putting the tenant ID in the database name was the last choice, if neither SNI nor the options parameter were supported.

Correct, but we do want a solution for #92580 that works with non-TLS configurations as well, and with drivers that don't support options.

knz commented 1 year ago

I would prefer a simpler rule like "no dots" instead of a more complex one that only matches XXX-XXX-NN (the latter rule also constrains our ability to evolve the tenant name scheme)

My current thinking is that the reserved prefix would be a configurable regular expression. Its default value (used in CC and SH clusters using default) would be something a bit more generic, something like ([a-zA-Z]+-){2,}\d+\. which also includes xx-xx-xx-xx-123.. A SH customer who really wants to use a special prefix in their DB name could then customize this to their liking.

andy-kimball commented 1 year ago

in SQL, forbid the creation of database names starting with the structure XXX-XXX-NN. (two words and a number followed by a period)

This is not quite right, because tenant names are not required to have two words in them. This is only the form of the default cluster name that we generate, but the user can override it with whatever they like.

The pattern that we’d have to prohibit would instead be: “XXX-NN.”

jaylim-crl commented 1 year ago

The pattern that we’d have to prohibit would instead be: “XXX-NN.”

This is the regex that we use for cluster names:

https://github.com/cockroachdb/cockroach/blob/3910ac409195ab6ab2ec3e9f6eda553048afa23f/pkg/ccl/sqlproxyccl/proxy_handler.go#L48-L55

XXX would be the cluster name, and NN will always be a number, i.e. tenant ID.

knz commented 1 year ago

That regexp is too inclusive. It would forbid the (imho, legitimate) use of test123.mydb as a db name.

That's why i insisted on having the hyphen and a minimum of two words ((\l[\l\d]*-){2,}...) which, IIUC, includes all current serverless cluster names.

Also we will need to talk about that number at the end if we start considering using tenant names instead of IDs

knz commented 1 year ago

Oh and so the user can customize their cluster name...

We really have made a mess of this. The complete disregard for PG identifier rules in that decision process is... Unsettling.

knz commented 1 year ago

Okay at this point I'm starting to think we should push for a change in the serverless connection URLs to mandate a prefix before the cluster name. For example the URL for cluster name foo would be e.g. postgres://.../cluster:foo.db.name/

Then we'd reserve the db namespace that starts with cluster:xxx\..

Then we'd go to our serverless users who use the db name variant to add that prefix in their apps.

After that prefix becomes mandatory (in the db name position) we will become able to recognize composite db names reliably, as it will be clear when the cluster name prefix is not included.

jeffswenson commented 1 year ago

Currently the sqlproxy is implemented so that if SNI information is provided, SNI is only used if the sqlproxy is unable to extract a tenant from the option or the DB.

I think we can sidestep most of the pain by changing the precedence:

If SNI is provided and the DNS name matches the cluster's dns zone (e.g. something like *.7tc.cockroachlabs.cloud), then the SNI must match a tenant or it is rejected as not found.
If SNI does not match the zone or is absent, then the option is used.
Only if SNI and the option are missing, will the sql proxy attempt to parse the name from the DB. At this point the tenant name in the db name is a fall back for a fall back, so I think it is fine if the behavior is a little weird. If someone actually runs into this, they can work around it by creating a DB with no "." and using the new DB name in the connection string.

knz commented 1 year ago

Do you mean by "the behavior is a little weird" that IF the fall back ends up using the db name, we will always pick up the cluster name before the period, but there can still be a db name with a period after that?

jeffswenson commented 1 year ago

Do you mean by "the behavior is a little weird" that IF the fall back ends up using the db name, we will always pick up the cluster name before the period, but there can still be a db name with a period after that?

Yep!

If someone selects a database name that starts with a tenant name and the client does not support SNI it can lead to a confusing error message. The user may see a message saying that the tenant does not exist or an authentication failure if the tenant does exist.

knz commented 1 year ago

I think I'm ok with that. I'll modify #92619 accordingly.

andy-kimball commented 1 year ago

If SNI is provided and the DNS name matches the cluster's dns zone

Note that these will not match in the case where a tenant cluster has been moved from one host to another. For example:

dim-dog-100.7tc.cockroachlabs.cloud

The 7tc does not need to match the current host cluster short-id if that tenant has moved. Instead, our check would need to match against regex's more like this:

*.\.[^-]+\.cockroachlabs.cloud
*.\.[^-]+\.gcp-us-east1.cockroachlabs.cloud

In other words, we're looking for an SNI name that contains a host short-id component, because that implies that it also contains a tenant name.

There are a couple of other possible misunderstandings here that I want to ensure are clear:

While the user can customize their Serverless cluster name, they are not in control of their tenant name. For example, say that they picked the Serverless cluster name of "mycluster". Their tenant name would be "mycluster-100", where 100 is their tenant ID. So we still have the option of recognizing tenant names by using a pattern like "XXX-NN". What I was objecting to above is using the pattern "XXX-YYY-NN", which is not guaranteed.
Once we support moving tenant clusters from host to host, it won't be enough to only have the tenant name (e.g. dim-dog-100) in the database name. Instead, we need the full routing ID (e.g. dim-dog-100.7tc). As laid out in the RFC, this means that we could see this in the future:
```
postgresql://andyk@dim-dog-100.7tc.cockroachlabs.cloud/dim-dog-100.7tc.defaultdb
```

The non-ambiguous way to parse the database name in this case would be to assume that the database name is always the part after the last period (i.e. not that the cluster name is always the part before the first period). Database names with dots would therefore result in a cluster routing error (if SNI and PG option are not present). We could give users a workaround by allowing quotes to be used around the database name (e.g. dim-dog-100.7tc."my.db"), or we could change the expected format for the routing ID when used within a database name (there's no backwards-compat problem here b/c we don't yet support routing IDs in database names).

For example, we could say that when a routing ID is used in a DNS name, it's in the form dim-dog-100.7tc, but when it's used in a PG option or database name, it's dim-dog-100-7tc. For example:

postgresql://andyk@dim-dog-100.7tc.cockroachlabs.cloud/dim-dog-100-7tc.defaultdb

However, I expect users to get confused by a difference like that, so there's a usability tradeoff here. We were hoping to publish cluster routing IDs in the Cloud dashboard, so that customers can conveniently use this and so that Support can easily look up clusters. If the routing ID takes a different form depending on where it's used, it's going to get confusing...

knz commented 1 year ago

The non-ambiguous way to parse the database name in this case would be to assume that the database name is always the part after the last period

This is not the only way to do this.

Could we instead consider constraining routing IDs to always match a regexp we'd decide in advance? We could then make this an optional part of the cluster name specification.

andy-kimball commented 1 year ago

Can you give more details on what you mean?

Today, the regex for a routing ID would roughly be:

XXX-NN.YYY

Constraining it further would be problematic, as we've already rolled them out, and they're part of important user-visible and aesthetically-important connection strings. Also, remember that we have today's tenant name that we need to continue supporting for backwards-compat, which has this form: XXX-NN. The Routing ID just adds an additional part to the tenant name in order to let us transparently move tenants from host to host (see RFC for more details).

knz commented 1 year ago

What I'm requesting is to agree that any current and future routing ID has a structure we can recognize already today (i.e. we can describe today the structure of all future tenant name + routing combinations)

andy-kimball commented 1 year ago

The routing ID is described in the RFC I referenced (though note that we used the serverless cluster name rather than the constant string "serverless" as described there), with this structure:

<serverless-cluster-name>-<tenant-ID>.<host-cluster-short-id>

The only part under end-user control is the serverless-cluster-name, which is further constrained by this regex:

clusterNameRegex = regexp.MustCompile("^[a-z0-9][a-z0-9-]{4,98}[a-z0-9]$")

As far as we know today, this routing ID is sufficient for our present and future needs. Can you give a specific example of what you're proposing we change (while still preserving a high-level of backwards-compat)?

knz commented 1 year ago

What I propose is to reserve an open name space for any and all future extensions besides / in addition to the short ID, in a way that guarantees space for a composite database name using periods afterwards.

The problem I see is the following: again with this host ID we're looking at a closed namespace, and if we ever need to extend we will once again infringe on the space available for database names.

Let me replay this for you.

When this work started we had a regexp for cluster name + tenant ID.

This was already a closed namespace—it did not leave any room for extension:

the cluster name was user-configurable, so all possible values under the cluster name regexp were reserved for the end-user.
the tenant ID could be pretty much any integer.

Within that namespace, the only possible way to add extensions would have been to reserve certain tenant IDs as marker for extensions, and then play "under" that, for example by saying "if the tenant ID part starts with a zero, then the string before that should be interpreted as, say, extension name then hyphen then cluster name".

However, in the new RFC you didn't go this way. Instead, you carved another namespace: a period followed by the short ID. By doing so, you chipped once again away at the realm of possible database names containing periods. And the result is still a closed namespace: any short ID that matches the short ID regexp is a valid routing ID. Again there's no room for extensions.

So I predict the next time we will need to do something you will again suggest using a period or something similar.

This is not good namespace design.

The proper way to do this is the following:

carve an extensible, forward-compatible namespace with a reserved "slot", typically with a reserved prefix and suffix and a constraining regexp for the parts in the middle, and tell users "you can't use that, it's for us". We could, but did not, choose a CC serverless routing prefix with structure "starts with crdb:, ends with a period, no period in the middle".
inside that slot, allocate a different prefix per version of the protocol, for example the version number itself. For example, we could, but did not, say that crdb:v1 starts the v1 format which includes only the cluster name and tenant ID, separated with a hyphen (crdb:v1-clustrername-tenantID.) Then, had we done that, we could then have said that crdb:v2 starts another format including the routing ID within the same namespace, with another hyphen (either crdb:v2-routingID-tenantID or crdb:v2-tenantID-routingID).

The customer for the "outer" format is for the other layers of the stack, which are not and need not be aware of our infrastructure. The customer for the "inner" format is our own infrastructure.

As it stands, your standpoint so far has been "everything up to the last period in the database name is reserved as routing information" and that is unacceptable. We need something better.

I would be OK with a format that's a little bit more complex such as "either one or two periods, as long as the routing ID has some structure" but that needs to be the end of that complexity, and so we must ensure that format has space for extensions. It does not seem to currently have space for extensions.

knz commented 1 year ago

Let's continue this convo under the suggestion from Jeff. @JeffSwenson could you assist here?

If SNI is provided and the DNS name matches the cluster's dns zone (e.g. something like *.7tc.cockroachlabs.cloud), then the SNI must match a tenant or it is rejected as not found.

If SNI does not match the zone or is absent, then the option is used.

Only if SNI and the option are missing, will the sql proxy attempt to parse the name from the DB. At this point the tenant name in the db name is a fall back for a fall back, so I think it is fine if the behavior is a little weird. If someone actually runs into this, they can work around it by creating a DB with no "." and using the new DB name in the connection string.

Can we spell out the rules and the regular expressions a bit more precisely?

What is the precise lexical structure of the short routing ID? I can't find that in the RFC. Is it always 3 characters long? Is there a leading character or prefix? (there should be one for future extensibility, as per discussion above)
How does the proxy know what "the cluster's DNS zone" is for rule (1)? Is there a parameter? I can't see this in the proxy source code. If there is no parameter, we could possibly match a prefix of the SNI name with the routing regexp. For example, we could say that the regexp to match routing in SNI is always \l[\l\d-]+-\d+\. followed by the regexp for the short routing ID, once we agree on one.
for rule (2) we need to clamp down on what "the option" means. The current logic uses an option named --cluster, but the RFC refers to --routing. Which one is it? If we consider them both, in which order of precedence should they be considered?
for rule (3) we now have two regular expressions to match at the start of a database name:
- \l[\l\d-]+-\d+\., which is just clustername-tenantID. (I don't like it, but it is what it is. I won't litigate that one further.)
- serverless-\d+\.<routing ID>\., which is the new format suggested by the RFC (and this still needs a precise regular expression for the routing ID, as per my question above). This one is actually pretty good, because it has both a prefix (serverless-) and a fixed "envelope" (just two periods).
My understanding is that IF there's no routing information as per rules (1) and (2) THEN we attempt these two regular expressions; if they succeed, we strip that prefix and keep the remainder of the db name (including periods!). If they fail, we keep the entire db name (including periods). That would work.

Do we want to carve an additional reserved namespace, as per the discussion above? I would like to ensure it's precise enough that we don't need to greedily absorb more than 1-2 periods at the beginning of the db space.

jeffswenson commented 1 year ago

How does the proxy know what "the cluster's DNS zone" is for rule (1)? Is there a parameter? I can't see this in the proxy source code.

It doesn't know this today. We would need to provide the proxy with a list of DNS suffixes that enable SNI. The SNI enable list would contain all DNS suffixes that may be routed to the cluster.

Alternatively, we could supply a list of DNS suffixes that disable SNI. We would use the disable list to ignore SNI strings that contain the old "freetier" dns names.

serverless-\d+.., which is the new format suggested by the RFC (and this still needs a precise regular expression for the routing ID, as per my question above). This one is actually pretty good, because it has both a prefix (serverless-) and a fixed "envelope" (just two periods).

We ended up dropping the Serverless prefix and replaced the prefix with the cluster's name. Example oiled-husky-37.6s4p. The format of this string is <cluster-name>-<tenant-id>.<host-cluster-id>.

Do we want to carve an additional reserved namespace, as per the discussion above? I would like to ensure it's precise enough that we don't need to greedily absorb more than 1-2 periods at the beginning of the db space.

If we introduce namespace now, we will still need to support thousands of production clusters that do not use the new namespace. I see two parts of this we need to be look out for:

Parsing ambiguity

Looking at the string "foo-bar-12.abcd.bannana". It is unclear if it should be parsed as:

db="bannana" host_cluster="abcd" cluster_name="foo-bar" tenant_id="12"
db="abcd.bannana" cluster_name="foo-bar" tenant_id="12"

@andy-kimball I'm skeptical anyone knows that they can route using the routing id. All of the documentation I can find refers to the -. format. I think we should check if anyone is using. If it is unused we should remove support from the sqlproxy for parsing a routing id from the database name. The format is fundamentally ambiguous.

If we remove the extra '.' in the routing ID, then the cluster and database names are unambiguous. The segment up to the first '.' is the cluster name and the remainder is the database name.

We could adjust the control plane to avoid the creation of non-unique (cluster-name, tenant-id) pairs. Name conflicts are unlikely, and we can avoid them by incrementing the tenant id until a non-conflicting id is found. For the small handful of tenants that exist with ambiguous names, we can avoid migrating them to the same host cluster, or if it looks like the cluster is unused, we could rename the cluster's tenant ID on migration.

Error Messages

If we see the string "foo-12.bar" and tenant "foo-12" does not exist, we don't know which error should be returned to the client. It may be they forgot the tenant and "foo-12.bar" is the database name, or it may be they mistyped the tenant and foo-12 does not exist. I think we can handle this scenario by including the parsed representation in the error message. E.g. return an error like: 'Tenant "foo-12" does not exist. No SNI or cluster option was found. Parsed the tenant from the database parameter "foo-12.bar"'

jeffswenson commented 1 year ago

It looks like parsing <cluster-name>-<tenant-id>.<host-cluster>.<database> was never implemented: https://github.com/cockroachdb/cockroach/blob/4b3f5822219a5b0155725ef2ede3ec33f46dfc48/pkg/ccl/sqlproxyccl/proxy_handler.go#L765. So we don't even need to worry about breaking anyone.

knz commented 1 year ago

was never implemented:

The way I understand it is that the RFC proposal was accepted and the work to modify sqlproxy is currently queued. Are you suggesting the discussion at hand will preempt that work?

jaylim-crl commented 1 year ago

So we don't even need to worry about breaking anyone.

Yes, that was never implemented. Today, we can only route by<cluster-name>-<tenant-id>.<database>. If we do implement the RFC proposal + tenant migration, we need to make that change in sqlproxy.

andy-kimball commented 1 year ago

If we remove the extra '.' in the routing ID, then the cluster and database names are unambiguous. The segment up to the first '.' is the cluster name and the remainder is the database name.

It's ambiguous just with a single period. Because the tenant name is optional in the database name, there's no way to tell whether foo-100.bar is a cluster name followed by database name, or just a database name with a period.

Therefore, we're already in a place where there's some ambiguity, and need to continue supporting existing connection strings. However, as @JeffSwenson said, we haven't implemented parsing of the <cluster-name>-<tenant-id>.<host-cluster>.<database> format yet, so we don't have backwards-compatibility concerns there.

I'm very concerned about the aesthetics of the various proposals (I think that's just as important as unambiguity), but as Jeff said, embedding the tenant name in a database name is a "fallback for a fallback", so aesthetics are less of a concern there. I'd be OK with a format like:

crdb:dim-dog-100.5xj.my.db

If the database name starts with crdb:, then we'd expect that to be followed by a routing ID. Any parts beyond that would be the database name. We might also consider using cluster: as the prefix, to match the --cluster option.

Here are a couple more answers:

The current logic uses an option named --cluster, but the RFC refers to --routing

Today, it's --cluster, but I was proposing we accept --routing as well. I'm backing off that now, though - I think --cluster is fine, and we should just not bother accepting --routing.

serverless-\d+.., which is the new format suggested by the RFC

The RFC suggested we move to a constant serverless string, not in order to avoid ambiguity, but b/c we wanted to support customers renaming their Serverless clusters. However, later I realized that we could still support renaming even if we use the cluster name in the connection string, by temporarily routing the old name and the new name to the same place. It's much friendlier to embed the cluster name in the connection string, and it also gives us a small security boost, since bad actors need to know the cluster name in order to "wake up" a cluster. So, when we actually implemented the RFC, we decided to use the cluster name rather than serverless.

As it stands, your standpoint so far has been "everything up to the last period in the database name is reserved as routing information" and that is unacceptable. We need something better.

Note this is exactly how we parse qualified names in SQL - you need to use quotes to disambiguate when you use dots in names, e.g.:

CREATE TABLE myschema."my.table"

In this case, everything after the last period (outside of quotes) is the table name, and not using quotes will cause ambiguities. I think it was perfectly reasonable to use similar rules when we decided to qualify database names in PG connection strings. Using a prefix to disambiguate rather than using SQL qualified name rules is just your personal preference and not the only "acceptable" solution.

knz commented 1 year ago

this is exactly how we parse qualified names in SQL - you need to use quotes to disambiguate when you use dots in names

"This is exactly how" is factually incorrect. The SQL dialect has the benefit of opt-in quotes to disambiguate. The connection URL does not have that luxury.

I'll reproduce jeff and i's argument in my next comment.

andy-kimball commented 1 year ago

The connection URL does not have that luxury.

We chose to extend what's acceptable in the connection URL. We have no requirement to be exactly compatible there in every edge case. I think your underlying assumption is incorrect - that we have to exactly replicate the semantics of PG connection strings in our Serverless offering. I disagree that this is one of our requirements.

knz commented 1 year ago

Here's what Jeff and I discussed

Requirements on the shape of the solution

all of us here agree we want to preserve compatibility with existing serverless deployments, including connection strings embedded in client apps. All points below should be evaluated with this requirement in mind.
in addition to the above, we're adding the requirement that there must always be some way for a user to connect to a database containing periods in its name, since those db names can be created in SQL ("if one can create a db with a given name, then one must also be able to connect to it").
we're also adding the requirement that we should reserve today a mechanism for future extensions to our routing protocol, in a way that will guarantee 1) we don't require currently developed client apps to change their URLs later 2) we don't create a situation that impede our other requirements in this list.
we also would like to limit the number of documentation differences between serverless and dedicated/SH, including how connection strings are constructed in the multi-tenant case (#92580). We should not aim for completely divergent mechanisms in SH/dedicated.
in the dedicated/SH multi-tenant case (NOT serverless), we want to:
- support routing to a "default" tenant, i.e. it must be possible to connect without routing details, at least to provide backward-compatibility with apps developed for previous versions.
- support routing based on the name alone, without the ID, so that we can use tenant renaming to redirect client app traffic from one tenant to another (this will come handy in C2C replication failover).
in the SH case (not CC dedicated/serverless), we will continue to support customers who cannot customize their DNS to use SNI. For those, a non-SNI routing method must continue to be available.

Some useful properties we can take advantage of

in CC serverless, regarding the ergonomics of routing:
- we already planned to make SNI the main way to provide routing information. The number of client drivers that don't support SNI was already small in 2016 and we're expecting they will die out.
- for all non-SNI cases, the --cluster option is our main documented fallback.
- A database-based routing is a fallback of a fallback. We only need to consider the database name for routing if neither SNI nor cluster option is present in the URL.
regarding cluster names:
- in CC serverless, cluster names match the regexp ^[a-z0-9][a-z0-9-]{4,98}[a-z0-9]$, i.e. they can't contain a period and never start or end with a hyphen, with a min of 6 characters and a max of 100.
- in SH/dedicated, as per #93269, tenant names match a similar regexp ^[a-z0-9]([a-z0-9-]{0,98}[a-z0-9])?$ (i.e. same overall structure, but minimum of 1 character instead of 6)
regarding routing to the right host cluster:
- we expect the main mechanism for host cluster routing to be DNS (different host clusters will have different proxy instances)
- in the future use case where a tenant is migrated from one host cluster to another, the SQL proxy will use the tenant directory to resolve the the <cluster_name>-<tenant_id>.<original_host_id> to a specific tenant cluster within the host. The tenant directory stores the mapping in the k8s api server and the api server state is maintained by the Cockroach Cloud control plane.
- no current serverless user embeds a routing ID in their connection URLs so as of this writing we still have flexibility on where/how we support introducing the routing ID in a URL, should we want/need to support a mechanism other than SNI.

Proposal

In serverless proxy, we would change the routing logic as follows:

First look at provided pg options.
- If --cluster is present, use that. Expect it to match the structure <clustername>-<tenantid>(\.<routingID>)?. Routing ID is optional in that case. Its separator is . which can never occur inside a cluster name. Don't look at anything else. The client-provided database name is preserved as-is.
  
  Note: the specific syntax to embed the routing ID in --cluster does not need to be finalized at this time. We can just say that --cluster has way more extension points than the db name because there is no prior use.
- Otherwise, fall through.
Then look at provided SNI hostname. Parse it with the regexp <clustername>-<tenantid>(\.<routingID>)?
- If the regexp matches, use that for routing and stop here. The client-provided database name is also preserved as-is.
- Otherwise, fall through. This will incidentally also include connections to free-tier.gcp.xxx (that don't also include a --cluster option) because free tier URLs do not contain a number after the free-tier prefix.
Then (fall back without option nor SNI) consider the database name. At this point, because this is a fallthrough case, we need routing details. In that case:
1. Extract everything up to the first period in the database name. Strip that prefix - the remainder (including any additional period) will remain as client-provided database name.
2. Inside that extracted prefix, inspect the structure:
  - If it matches the regexp for <clustername>-<tenantid>, use that (this is the free tier compatibility case).
  - for now refuse anything else with an error. We could (but choose not to, for now) extend this to support the routing ID in there, as detailed in the appendix later below.

Examples for serverless:

URL	Resulting Configuration	Notes
`postgres://10.20.30.40/foo.bar?options=--cluster=sometenant-100`	tenant=`sometenant-100` dbname=`foo.bar`	Cluster option only
`postgres://10.20.30.40/foo.bar?options=--cluster=sometenant-100.xyz`	tenant=`sometenant-100` host=`xyz` dbname=`foo.bar`	Cluster option only
`postgres://10.20.30.40/blah-100.baz?options=--cluster=sometenant-100`	tenant=`sometenant-100` dbname=`blah-100.baz`	Cluster option prevails over db name
`postgres://free-tier.gcp/blah-100.baz?options=--cluster=sometenant-100`	tenant=`sometenant-100` dbname=`blah-100.baz`	Cluster option prevails over db name
`postgres://free-tier.gcp/blah-100.baz?options=--cluster=sometenant-100.xyz`	tenant=`sometenant-100` host=`xyz` dbname=`blah-100.baz`	Cluster option prevails over db name
`postgres://mytenant-100.gcp.xxx/blah-100.baz?options=--cluster=othertenant-100`	tenant=`othertenant-100` dbname=`blah-100.baz`	Cluster option prevails over SNI
`postgres://mytenant-100.gcp.xxx/blah-100.baz`	tenant=`mytenant-100` dbname=`blah-100.baz`	SNI prevails over db name
`postgres://mytenant-100.xyz.gcp.xxx/blah-100.baz`	tenant=`mytenant-100` host=`xyz` dbname=`blah-100.baz`	SNI prevails over db name
`postgres://free-tier.gcp.xxx/mytenant-100.blah-100.baz`	tenant=`mytenant-100` dbname=`blah-100.baz`	Unambiguous mandatory routing prefix in db name
`postgres://free-tier.gcp.xxx/mytenant-100`	tenant=`mytenant-100` dbname=`defaultdb`	Unambiguous mandatory routing prefix in db name

In the dedicated/SH multitenant case, we would implement the routing logic as follows:

First look at provided pg options.
- If --cluster is present, use that. Expect it to match the structure <clustername>. Don't look at anything else. Client-provided dbname is unchanged.
- Otherwise, fall through.
Then, check the (NEW) server.pre-serve.sni-routing.suffix cluster setting.
- If empty, fall through to point 3 below.
- Otherwise, take its value and strip it from the right side of the SNI hostname. (There may be multiple names in the setting, to support DNS renames.) Then inspect the rest (the prefix) using a regexp for <clustername>. If it matches, use that. Then don't look at anything else. Client-provided dbname is unchanged.
- Otherwise (server.pre-serve.sni-routing.suffix is non-empty but does not match), look at (NEW) server.pre-serve.sni-routing.fallback.enabled. If disabled, stop here with an error. Otherwise, fall through.
Then (fall back without option nor SNI) consider the database name field.
- If it starts with cluster:<clustername>/, strip that prefix and use the cluster name enclosed therein.
- Otherwise, use the default tenant, and keep the db name field as-is (including any and all periods).

To support the last rule above, we would also modify the SQL dialect to reject the creation of any DB with a name that matches ^cluster:[^/]*(/$)

We find this restriction to be much less likely to be backward-incompatible with existing single-tenant deployments than blocking the entire <clustername>\. prefix namespace in db names.

This will also provide us a mechanism for extension, just like the separation using / in --cluster.

Examples for dedicated/SH:

URL	Resulting Configuration	Notes
`postgres://10.20.30.40/foo.bar?options=--cluster=sometenant-100`	tenant=`sometenant-100` dbname=`foo.bar`	Cluster option only
`postgres://10.20.30.40/foo.bar?options=--cluster=sometenant-100.xyz`	connection error (routing ID not supported)	Cluster option only
`postgres://10.20.30.40/blah-100.baz?options=--cluster=sometenant-100`	tenant=`sometenant-100` dbname=`blah-100.baz`	Cluster option prevails over db name
`postgres://mytenant-100.gcp.xxx/blah-100.baz?options=--cluster=othertenant-100`	tenant=`othertenant-100` dbname=`blah-100.baz`	Cluster option prevails over SNI
`postgres://mytenant-100.gcp.xxx/blah-100.baz`	tenant=`mytenant-100` dbname=`blah-100.baz`	SNI prevails over db name (if `server.pre-serve.sni-routing.suffix` is `gcp.xxx`)
`postgres://mytenant-100.xyz.gcp.xxx/blah-100.baz`	tenant=`mytenant-100` dbname=`blah-100.baz`	SNI prevails over db name (if `server.pre-serve.sni-routing.suffix` is `xyz.gcp.xxx`, otherwise error or fallback)
`postgres://someaddress/cluster:mytenant-100/blah-100.baz`	tenant=`mytenant-100` dbname=`blah-100.baz`	Unambiguous presence of routing prefix in db name.
`postgres://someaddress/cluster:mytenant-100`	tenant=`mytenant-100` dbname=`defaultdb`	Unambiguous presence of routing prefix in db name.
`postgres://someaddress/blah-100.baz`	tenant=`<default>` dbname=`blah-100.baz`	Unambiguous absence of routing prefix in db name.

Appendix

Serverless-only, optional host routing ID in database name.

How to find a host routing ID in the fallback of the fallback in the database name? Remember in that case (we're in the fallback) the routing info must be present so we don't have to support the case where it's not included.

In that case, separate with additional punctuation. This is possible because pg db names can contain extra punctuation. Some candidates:

postgres://../cluster:<routingid>:<clustername>-<tenantid>/<dbname> (compatible with the proposal above))
postgres://../<routingid>:<clustername>-<tenantid>.<dbname> (using just a colon)
postgres://../<clustername>-<tenantid>:<routingid>.<dbname>

andy-kimball commented 1 year ago

Overall, this proposal would be acceptable to me. Can you put it in a Google doc, so that it's easy to make comments on specific details?

knz commented 1 year ago

Sure here you are https://docs.google.com/document/d/1RPLnZBcj4Q_jmi85CBwOZcpCjXFojnnnYk0_Y24Zn94/edit

knz commented 1 year ago

PR with finalized RFC #93537

cockroachdb / cockroach