opt: support option to error instead of visiting remote regions when querying multi-region database #83819

Closed: rytaft closed this issue 1 year ago

rytaft commented 2 years ago

Today, the optimizer makes some effort to create a query plan that avoids visiting remote regions when possible. However, there is no guarantee that a given query will not visit a remote region. For some latency-sensitive applications, this may not be acceptable; in these cases, it would often be better to return an error than to incur the high latency of a cross-region hop.

We should create a session setting that causes the optimizer and/or execution engine to error if a query would visit a remote region. We should make the error message as friendly as possible to help the developer fix their query or data so that it does not visit a remote region. For example, if a query cannot use locality optimized search (LOS), we could include a hint suggesting that they add a LIMIT to the query (so it can use LOS) or include crdb_region in the WHERE clause. If a query does use locality optimized search but the data is not found locally, we can provide a hint to that effect as well.
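
For illustration, here is a sketch of how such a setting and error might surface to the developer. The setting name and error wording are placeholders, not a committed interface:

SET enforce_home_region = on;

SELECT * FROM customers WHERE name = 'alice';
-- ERROR: query has no home region
-- HINT: add a LIMIT so the query can use locality optimized search, or filter
-- on crdb_region in the WHERE clause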

This will help provide guard rails for latency-sensitive apps that cannot tolerate cross-region latency.

Jira issue: CRDB-17283
Epic: CRDB-18645

rytaft commented 2 years ago

cc @andy-kimball @vy-ton @awoods187 let me know if I missed anything in the above description.

andy-kimball commented 2 years ago

There's one other important requirement for a "regional mode": when we error, we want to include the correct home region in which the query should be run. Then the app knows where to redirect the operation. However, that only works when the query has a single home region. The optimizer should statically prove that the query can be run in a single home region, and then dynamically be able to determine what that home region should be. If it cannot prove that the query can be run in a single home region, it should return a compile-time error, letting the user know they can only run this query in "global mode". Then, at run-time, the execution engine should return a "query can only be run in home region 'us-west2'" error if the query is run in the wrong region.

Separately, is there another issue already filed for the ability to incorporate IN SAME REGION FK constraints into LOS?

rytaft commented 2 years ago

There's one other important requirement for a "regional mode": when we error, we want to include the correct home region in which the query should be run.

Hmm, this seems like it would be tricky in most cases. I believe that the only way we can statically determine the correct region is if (1) crdb_region is specified in the predicate, or (2) crdb_region is a computed column and the determinant column(s) are specified in the predicate. But in all other cases I don't think a redirect would be possible. For example, if we are using locality optimized search, we can only determine the home region by running the query to completion to find where the data is located.
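
As an illustration of case (2), a sketch where crdb_region is computed from another column, so a predicate on that column pins the region statically. The table, column values, and region names here are made up, assuming a multi-region database with regions us-east1 and europe-west1:

CREATE TABLE users (
    id      UUID   PRIMARY KEY DEFAULT gen_random_uuid(),
    country STRING NOT NULL,
    crdb_region crdb_internal_region NOT NULL AS (
        CASE WHEN country IN ('us', 'ca') THEN 'us-east1'
             ELSE 'europe-west1'
        END
    ) STORED
) LOCALITY REGIONAL BY ROW AS crdb_region;

-- The home region is statically determined once country is constrained:
SELECT * FROM users WHERE id = $1 AND country = 'us';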

Separately, is there another issue already filed for the ability to incorporate IN SAME REGION FK constraints into LOS?

Not sure I understand what this issue would be. I think LOS should already work correctly with IN SAME REGION.

andy-kimball commented 2 years ago

the only way we can statically determine the correct region

We only need to statically prove that there is a single home region. We would dynamically determine what that home region is. Here's a simple example:

SELECT * FROM customers WHERE cust_id = $1

By noticing that customers is an RBR table (with stale replicas), and that cust_id is a key column, we can statically prove that the query can be fulfilled from a single home region (where that customer is homed). Furthermore, we can statically determine that the value of $1 is the "homing key" that can be used to look up the home region using a query like this (and without leaving the current region, due to the stale replicas being present):

SELECT crdb_region FROM customers AS OF SYSTEM TIME follower_read_timestamp() WHERE cust_id = $1

Of course, if only really simple queries like this one could be statically proven, there's not a lot of value. But that's where IN SAME REGION comes in. As long as we know linked rows from other tables are in the same region, then we can also prove that the query has a single home region, e.g.:

SELECT * FROM customers NATURAL JOIN orders WHERE orders.cust_id = $1

This query is provably confined to a single co-located row grouping (not sure of the term we want to use here, I've heard micro-shard, granule, FK shard). The optimizer can follow the co-location FK links back up to the root "homing table", and ensure that at most one row is selected from that table (it's a partial generalization of the LOS logic). Queries like this are the common case, and I believe would allow 90% of queries that real apps would need to run (and expect to execute in a single home region).

Regarding LOS, today we fall back on searching all regions; I'm proposing that we can do much better when we can statically prove co-location:

  1. In "global mode", it can fall back to a single home region when that can be determined (I think a very common case).
  2. In "regional mode", it can error, but the error can return the name of the single home region, allowing the app to redirect to that region and run close to the data.
rytaft commented 2 years ago

I see, using AS OF SYSTEM TIME follower_read_timestamp() to get the value of crdb_region is a good idea. That makes sense to me.

I do think the optimizer is already smart enough to do almost everything else you are describing, even without IN SAME REGION (we support LOS for lookup joins too). However, fixing https://github.com/cockroachdb/cockroach/issues/69617 is probably needed to get us the rest of the way there.

andy-kimball commented 2 years ago

even without IN SAME REGION (we support LOS for lookup joins too)

Wouldn't there need to be a FK from orders(crdb_region, cust_id) => customers(crdb_region, cust_id) in order for this to work? If there were only a FK from orders(cust_id) => customers(cust_id), then we'd have to search every region, because the customer's orders might be anywhere. That's all IN SAME REGION is: syntactic sugar for that FK + cascading.

rytaft commented 2 years ago

Ah, yeah, you're right -- we can't use LOS in this case since cust_id would not be unique in the orders table (although we might be able to support it in the future if the user provides a LIMIT). The FK should make it possible to stay in the same region. To enforce that the optimizer chooses the correct plan, we probably also need to finish the work I started to use the distribution property in the coster.

Currently, we choose a merge join for this plan which visits every region:

CREATE TABLE customers (
    id     UUID   PRIMARY KEY DEFAULT gen_random_uuid(),
    name   STRING NOT NULL
) LOCALITY REGIONAL BY ROW;

CREATE TABLE orders (
    id      UUID   PRIMARY KEY DEFAULT gen_random_uuid(),
    cust_id UUID   NOT NULL,
    items   STRING NOT NULL,
    INDEX (cust_id),
    FOREIGN KEY (cust_id, crdb_region) REFERENCES customers (id, crdb_region) ON UPDATE CASCADE
) LOCALITY REGIONAL BY ROW;

-- ... insert some data ...

demo@127.0.0.1:26257/db> EXPLAIN SELECT * FROM customers c JOIN orders o ON c.id = o.cust_id AND (c.crdb_region = o.crdb_region) WHERE c.id = '69a1c2c2-5b18-459e-94d2-079dc53a4dd0'; 
                                                                                                                                                                           info
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • merge join
  │ estimated row count: 10
  │ equality: (crdb_region, cust_id) = (crdb_region, id)
  │ right cols are key
  │
  ├── • index join
  │   │ estimated row count: 10
  │   │ table: orders@orders_pkey
  │   │
  │   └── • scan
  │         estimated row count: 10 (0.10% of the table; stats collected 7 minutes ago)
  │         table: orders@orders_cust_id_idx
  │         spans: [/'europe-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'europe-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0'] [/'us-east1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'us-east1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0'] [/'us-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'us-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0']
  │
  └── • union all
      │ estimated row count: 1
      │ limit: 1
      │
      ├── • scan
      │     estimated row count: 1 (0.10% of the table; stats collected 7 minutes ago)
      │     table: customers@customers_pkey
      │     spans: [/'us-east1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'us-east1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0']
      │
      └── • scan
            estimated row count: 1 (0.10% of the table; stats collected 7 minutes ago)
            table: customers@customers_pkey
            spans: [/'europe-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'europe-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0'] [/'us-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'us-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0']
(30 rows)

but if I force a lookup join, we choose the correct plan:

demo@127.0.0.1:26257/db> EXPLAIN SELECT * FROM customers c INNER LOOKUP JOIN orders o ON c.id = o.cust_id AND (c.crdb_region = o.crdb_region) WHERE c.id = '69a1c2c2-5b18-459e-94d2-079dc53a4dd0';
                                                                                                                        info
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join
  │ estimated row count: 10
  │ table: orders@orders_pkey
  │ equality: (crdb_region, id) = (crdb_region, id)
  │ equality cols are key
  │
  └── • lookup join
      │ estimated row count: 10
      │ table: orders@orders_cust_id_idx
      │ equality: (crdb_region, id) = (crdb_region, cust_id)
      │ pred: cust_id = '69a1c2c2-5b18-459e-94d2-079dc53a4dd0'
      │
      └── • union all
          │ estimated row count: 1
          │ limit: 1
          │
          ├── • scan
          │     estimated row count: 1 (0.10% of the table; stats collected 4 minutes ago)
          │     table: customers@customers_pkey
          │     spans: [/'us-east1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'us-east1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0']
          │
          └── • scan
                estimated row count: 1 (0.10% of the table; stats collected 4 minutes ago)
                table: customers@customers_pkey
                spans: [/'europe-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'europe-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0'] [/'us-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0' - /'us-west1'/'69a1c2c2-5b18-459e-94d2-079dc53a4dd0']
(28 rows)

If we update the coster to use the distribution property, I think we should be able to get the second plan.

rytaft commented 2 years ago

@andy-kimball and I just chatted and agreed that this work can probably be broken into 3 phases that would each provide some incremental value:

  1. Create a mode where we return an error if we cannot prove that the query can run in a single region. This can be supported entirely in the optimizer, and would require giving the "distribute" operator a very large cost (similar to other hints where we give non-compliant plans a "hugeCost"). The execbuilder would then return an error if the chosen plan has a distribute operator. Note that in this first phase, we do not error in the execution engine, so if LOS fails to find a row locally, we may still visit remote regions.
  2. Update the above mode (or perhaps create a separate mode) so that we do error in the execution engine if LOS fails to find rows locally. Users will need to manually run a query such as SELECT crdb_region FROM customers AS OF SYSTEM TIME follower_read_timestamp() WHERE cust_id = $1 to find the correct region in which to run the query (see the sketch after this list).
  3. In cases where it should be possible to run the query in a single region, return the correct region to the user so they can automatically redirect their application to run there.
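
As an illustration of the phase 2 workflow from the application's side, here is a sketch using the customers table and UUID from the example above; the error wording is hypothetical:

-- In the wrong region, the query errors (wording illustrative):
SELECT * FROM customers WHERE id = '69a1c2c2-5b18-459e-94d2-079dc53a4dd0';
-- ERROR: query must be run in its home region

-- Find the home region without leaving the current region:
SELECT crdb_region FROM customers AS OF SYSTEM TIME follower_read_timestamp()
WHERE id = '69a1c2c2-5b18-459e-94d2-079dc53a4dd0';
-- Returns e.g. 'us-east1'; the app can then reconnect to that region and retry.
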
rytaft commented 2 years ago

Another thing to add here is that, initially, this mode where we return an error if a query attempts to visit remote regions will only be supported for databases with ZONE survivability. Later on, we can support a mode for databases with REGION survivability where we return an error if a query attempts to visit a region outside of the local super region.
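
For context, a super region groups a subset of a database's regions so that replicas, and thus quorum, stay within the group. A minimal sketch of the DDL, with illustrative names; the exact syntax and gating settings may vary by version:

SET enable_super_regions = 'on';
ALTER DATABASE db ADD SUPER REGION "us" VALUES "us-east1", "us-west1";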

msirek commented 2 years ago

Just to clarify the behavior of the new mode, the summary says:

error instead of visiting remote regions when querying multi-region database

What if a query involves some tables in a multi-region database and some tables not defined in a multi-region database? Should that error out too? What if none of the tables are in a multi-region database? Maybe it doesn't matter whether a table belongs to a multi-region database, just whether it's an RBR table or not. So if any accessed table is neither an RBR table nor a global table, do we immediately error out?

andy-kimball commented 2 years ago

What if a query involves some tables in a multi-region database and some tables not defined in a multi-region database?

There are 2 different errors:

  1. "This query does not have a home region." This is a permanent error that will always occur regardless of which region the query runs in. This means the optimizer cannot determine a single home region where the query can run.
  2. "This query must be run in home region 'us-west2'." This error is reported when the optimizer can determine a home region, but the current region is not that region.

These rules imply the answer to your question. Say the query touches two different databases. It references a global table in the MR database and an RBT table in the SR database. In theory, this query can run in the SR database's region, as long as the MR database is also in that region (because consistent replicas would be guaranteed to exist in that home region for both tables). In practice, however, we might just always report error #1 if we find a cross-database query. It's such a rare edge case that I don't think it's important to support it.

Maybe it doesn't matter if a table belongs to a multi-region database, just whether it's an RBR table or not.

This idea applies to RBT tables as well as RBR/global tables. If an RBT table is referenced outside its home region (i.e. the primary region for a MR database), then this mode should raise an error containing the RBT table's home region, so the app can retry there.
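
A sketch of the RBT case being described: the table's home region is fixed at DDL time, so the error can name it directly. The table name and error wording are hypothetical:

CREATE TABLE settings (
    key   STRING PRIMARY KEY,
    value STRING NOT NULL
) LOCALITY REGIONAL BY TABLE IN "us-west2";

-- Run from a gateway outside us-west2, this mode would raise something like:
-- "query must be run in home region 'us-west2'"
SELECT value FROM settings WHERE key = 'theme';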

ajwerner commented 2 years ago

Update the above mode (or perhaps create a separate mode) so that we do error in the execution engine if LOS fails to find rows locally. Users will need to manually run a query such as SELECT crdb_region FROM customers AS OF SYSTEM TIME follower_read_timestamp() WHERE cust_id = $1 to find the correct region in which to run the query.

Did you consider AS OF SYSTEM TIME with_max_staleness(...) instead of follower_read_timestamp()? That would have both better freshness and better availability.

andy-kimball commented 2 years ago

Did you consider AS OF SYSTEM TIME with_max_staleness(...)

My understanding is that this only works for point queries. We need something that always works.

ajwerner commented 2 years ago

Correct. Ack. I was focusing too much on the individual example query.

andy-kimball commented 2 years ago

Actually, I was thinking about supporting stale reads in general. You're right that we could use with_max_staleness for the specific case of looking up the home region. That should always be a single row.
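
Side by side, the two flavors of the single-row home-region lookup (the staleness bound is illustrative):

-- Exact staleness: always served by a follower replica, but several seconds stale.
SELECT crdb_region FROM customers AS OF SYSTEM TIME follower_read_timestamp()
WHERE cust_id = $1;

-- Bounded staleness: fresher when the local replica allows it, but limited to
-- single-row point lookups in implicit (single-statement) transactions.
SELECT crdb_region FROM customers AS OF SYSTEM TIME with_max_staleness('10s')
WHERE cust_id = $1;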

msirek commented 2 years ago

By noticing that customers is an RBR table (with stale replicas), and that cust_id is a key column, we can statically prove...

When would an RBR table not have stale replicas? Or are you just saying the RBR table will always have stale replicas to read from? I'm assuming it's OK that the stale replica is just the nearest replica, and not necessarily in the gateway region. In other words, I'm trying to figure out if we need to prove that the nearest stale replica is in the gateway region before issuing the lookup query: SELECT crdb_region FROM customers AS OF SYSTEM TIME follower_read_timestamp() WHERE cust_id = $1. Maybe there's no easy way to tell in the SQL layer where the nearest stale replica is located?

Edit: I guess since it's in the error case that we want to run this query, there's no need to guarantee local stale replicas.

@rytaft , I noticed the note:

Note that in this first phase, we do not error in the execution engine, so if LOS fails to find a row locally, we may still visit remote regions.

Is this the scope I should be pursuing, only erroring out during planning?

andy-kimball commented 2 years ago

When would an RBR table not have stale replicas?

This question is answered in this document, which you should definitely read if you have not already, since it sets up all the context for this work. The TL;DR is that we'll add syntax to specify that a particular RBR table should not use stale replicas.

msirek commented 2 years ago

would require giving the "distribute" operator a very large cost (similar to other hints where we give non-compliant plans a "hugeCost")

@rytaft Currently, checking support for a distributed plan is done in the function checkSupportForPlanNode, when dispatching the plan. Are you suggesting we mirror those checks, which examine plan nodes like scanNode, during planning, so that if the corresponding memo expression (e.g. ScanExpr) supports distribution, we give it a cost of hugeCost? Maybe I misunderstood what you meant by the "distribute" operator. I was thinking we might instead just detect when we have a regional table that could be read locally, and add hugeCost in the non-local cases, for example: https://gist.github.com/msirek/23835c40ac4d51d07cdff754d60df750 https://gist.github.com/msirek/43e40e010eeb80b983db96892e22b7e6

Would we want to force plans to be local (not distributed) when the session flag is set (or a flag in the plan is set) instead of erroring out distributed execution?

Also, I don't quite understand the reason behind:

this mode where we return an error if a query attempts to visit remote regions will only be supported for databases with ZONE survivability

I can definitely add checks in the code for this, but am curious why we shouldn't error out with REGION survivability. Does this mean that if a table is any type of regional table in a database with REGION survivability, scanning that table in the new mode should never error out, even if we access remote regions? Also, is this referring only to the execution-time checks in phase 2 of the feature, or also to the phase 1 compile-time checks?

For queries that use bounded staleness (func (b *Builder) boundedStaleness() bool), I'm wondering if this condition is sufficient to access only rows in a local replica (and therefore avoid erroring out). I'll look into this a bit, but if someone knows, please let me know. Thanks.

rytaft commented 2 years ago

Maybe I misunderstood what you meant by "distribute" operator.

I'm referring to this: https://github.com/cockroachdb/cockroach/blob/687beb1050a668d550129a29bc11dee86bda146b/pkg/sql/opt/ops/enforcer.opt#L29-L39

Would we want to force plans to be local (not distributed) when the session flag is set (or a flag in the plan is set) instead of erroring out distributed execution?

I don't think so -- this wouldn't actually solve the problem because although the SQL layer is local, the KV api would still need to visit remote nodes. We'd still have non-local latency.

am curious why we shouldn't error out with REGION survivability.

We can never guarantee local latency with REGION survivability because (1) writes must always visit at least one remote region, and (2) reads will visit a remote region if the home region is unavailable.
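
For reference, survivability is a per-database setting, which is why the new mode can be gated on it (sketch; the database name is illustrative):

ALTER DATABASE db SURVIVE ZONE FAILURE;   -- quorum can be satisfied within one region
ALTER DATABASE db SURVIVE REGION FAILURE; -- quorum must span regions, so writes
                                          -- always incur cross-region latency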

For queries that use bounded staleness (func (b *Builder) boundedStaleness() bool), I'm wondering if this condition is sufficient to access only rows in a local replica (and therefore avoid erroring out).

This is usually sufficient, but not guaranteed. There are cases where we would need to visit the leaseholder for conflict resolution.

msirek commented 2 years ago

We can never guarantee local latency with REGION survivability because...

OK, I take this to mean we should always error out when scanning these tables.

Would we want to force plans to be local (not distributed) when the session flag is set (or a flag in the plan is set) instead of erroring out distributed execution?

I don't think so -- this wouldn't actually solve the problem because although the SQL layer is local, the KV api would still need to visit remote nodes. We'd still have non-local latency.

I meant in the case where the scans could be local (e.g. SELECT WHERE crdb_region = xxx), but for whatever reason we choose to run the query on all nodes (perhaps due to a subsequent operation, like an aggregation), then force the query to run locally. Maybe this could never happen though.

For queries that use bounded staleness (func (b *Builder) boundedStaleness() bool), I'm wondering if this condition is sufficient to access only rows in a local replica (and therefore avoid erroring out).

This is usually sufficient, but not guaranteed. There are cases where we would need to visit the leaseholder for conflict resolution.

OK, I think maybe the guaranteed case is instead follower_read_timestamp(), i.e., something which deals only with timestamps older than the current closed timestamp. But even in this case I guess we need to check for ZONE survivability. I assume this also works for non-regional tables in a multi-region database with ZONE survivability. Assuming we want to handle this case, the execution planner may choose distributed execution (especially if there are no scan constraints). So, would we want to error out the distributed execution, or force it to be local? If the table were large, forcing local execution, say on an aggregation, could be slow. But if the user is setting the session setting, maybe we can assume they know what they're doing and are only submitting OLTP-style queries (or scans against small tables). Or, we could choose not to treat follower_read_timestamp() queries as having local latency.

rytaft commented 2 years ago

@msirek and I discussed these questions offline. I think we're on the same page now, but let me know if there are other questions.

msirek commented 2 years ago

It's unclear whether erroring out on remote mutations is supposed to be part of phase 1. Since doing this with execution-time checks would be a more complete solution, I'm not going to include it in phase 1 unless someone has a different opinion.

msirek commented 2 years ago

@andy-kimball Phase 1 requirements include:

initially, this mode where we return an error if a query attempts to visit remote regions will only be supported for databases with ZONE survivability.

Would this mean that, for databases with REGION survivability, we never error out reading from RBR, RBT, and GLOBAL tables (and document this feature limitation), or that we should always error out on these tables?

rytaft commented 2 years ago

I've split phase 1 of this issue into https://github.com/cockroachdb/cockroach/issues/86228, so we can close that issue when https://github.com/cockroachdb/cockroach/pull/85704 merges.

msirek commented 1 year ago

Should this issue be trying to control how full scans are done? For example:

CREATE DATABASE db PRIMARY REGION "us" REGIONS "ca", "eu";
create table t1 (a int, b int, c int not null, primary key(a)) locality regional by row;
create table t2 (a int, b int, c int not null, primary key(a)) locality regional by row;

explain select * from t1, t2 where t1.a=t2.a limit 1;
                                    info
----------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • limit
  │ count: 1
  │
  └── • lookup join
      │ table: t1@t1_pkey
      │ equality cols are key
      │ lookup condition: (crdb_region = 'us') AND (a = a)
      │ remote lookup condition: (crdb_region IN ('ca', 'eu')) AND (a = a)
      │
      └── • scan
            missing stats
            table: t2@t2_pkey
            spans: FULL SCAN (SOFT LIMIT)

Should we be trying to scan t2 ranges which are local first? The scan on t2 could be a union going after crdb_region = 'us' rows first, with a soft limit. A full scan on t2 I assume should be statically errored out.
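
For illustration, the t2 scan could in principle be rewritten along these lines against the schema above (whether execution would actually stop before reaching the remote branch is exactly the open question):

(SELECT * FROM t2 WHERE crdb_region = 'us' LIMIT 1)
UNION ALL
(SELECT * FROM t2 WHERE crdb_region IN ('ca', 'eu') LIMIT 1)
LIMIT 1;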

Also, I assume this issue isn't handling any work on PLACEMENT RESTRICTED or IN SAME REGION. Or is it?

msirek commented 1 year ago

It looks like some more work is needed to make sure locality-optimized join avoids reading from remote regions when not necessary. Joining two REGIONAL BY ROW tables with LIMIT 1 doesn't seem to read only local rows. Is the problem that we rely on soft limits to avoid reading into remote regions in the first batch?

cockroach demo --multitenant=false --insecure --http-port=8086 --global --nodes=9 --no-example-database
CREATE DATABASE db PRIMARY REGION "us-east1" REGIONS "us-west1", "europe-west1";
use db;
create table t1 (a int, b int, c int not null, primary key(a)) locality regional by row;
create table t2 (a int, b int, c int not null, primary key(a)) locality regional by row;

insert into t1 (crdb_region, a, b, c) values ('us-east1', 1, 1, 1);
insert into t1 (crdb_region, a, b, c) values ('europe-west1', 2, 1, 1);
insert into t2 (crdb_region, a, b, c) values ('us-east1', 1, 1, 1);
insert into t2 (crdb_region, a, b, c) values ('europe-west1', 2, 1, 1);
explain select * from t1, t2 where t1.a=t2.a limit 1;
                                            info
--------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • limit
  │ count: 1
  │
  └── • lookup join
      │ table: t2@t2_pkey
      │ equality cols are key
      │ lookup condition: (crdb_region = 'us-east1') AND (a = a)
      │ remote lookup condition: (crdb_region IN ('europe-west1', 'us-west1')) AND (a = a)
      │
      └── • scan
            missing stats
            table: t1@t1_pkey
            spans: FULL SCAN (SOFT LIMIT)

explain analyze (debug) select * from t1, t2 where t1.a=t2.a limit 1;
-- plan.txt:
planning time: 2ms
execution time: 293ms
distribution: full
vectorized: true
rows read from KV: 4 (108 B, 8 gRPC calls)
cumulative time spent in KV: 231ms
maximum memory usage: 110 KiB
network usage: 864 B (6 messages)
regions: europe-west1, us-east1, us-west1

• limit
│ columns: (a int, b int, c int, a int, b int, c int)
│ count: (1)[int]
│
└── • project
    │ columns: (a int, b int, c int, a int, b int, c int)
    │
    └── • lookup join (inner)
        │ columns: (a int, b int, c int, a int, b int, c int, crdb_region crdb_internal_region)
        │ nodes: n2, n3, n6, n7
        │ regions: europe-west1, us-east1, us-west1
        │ actual row count: 2
        │ vectorized batch count: 2
        │ KV time: 220ms
        │ KV contention time: 0µs
        │ KV rows read: 2
        │ KV bytes read: 36 B
        │ KV gRPC calls: 4
        │ estimated max memory allocated: 90 KiB
        │ MVCC step count (ext/int): 0/0
        │ MVCC seek count (ext/int): 6/6
        │ estimated row count: 1,000 (missing stats)
        │ table: t2@t2_pkey
        │ equality cols are key
        │ lookup condition: ((((crdb_region)[crdb_internal_region] = ('us-east1')[crdb_internal_region])[bool]) AND (((a)[int] = (a)[int])[bool]))[bool]
        │ remote lookup condition: ((((crdb_region)[crdb_internal_region] IN ((('europe-west1')[crdb_internal_region], ('us-west1')[crdb_internal_region]))[tuple{crdb_internal_region, crdb_internal_region}])[bool]) AND (((a)[int] = (a)[int])[bool]))[bool]
        │
        └── • scan
              columns: (a int, b int, c int)
              nodes: n2, n3, n6, n7
              regions: europe-west1, us-east1, us-west1
              actual row count: 2
              vectorized batch count: 2
              KV time: 10ms
              KV contention time: 0µs
              KV rows read: 2
              KV bytes read: 72 B
              KV gRPC calls: 4
              estimated max memory allocated: 80 KiB
              MVCC step count (ext/int): 2/2
              MVCC seek count (ext/int): 14/14
              estimated row count: 1,000 (missing stats)
              table: t1@t1_pkey
              spans: FULL SCAN (SOFT LIMIT)

A rewritten query like the following might prevent reads into remote regions if execution could stop when the first SetOp branch hits the limit:

select * from (
  (select * from t1, t2 where t2.crdb_region = 'us-east1' and t1.crdb_region = 'us-east1' and t1.a = t2.a limit 1)
  union all
  (select * from t1, t2 where t2.crdb_region = 'us-east1' and t1.crdb_region <> 'us-east1' and t1.a = t2.a limit 1)
  union all
  (select * from t1, t2 where t2.crdb_region <> 'us-east1' and t1.a = t2.a limit 1)
)
limit 1;
msirek commented 1 year ago

Phase 3 options

Mark dynamic "query has no home region" errors as retryable, and do one of the following:

The first option would be ideal, but a transaction is not allowed to modify its historical timestamp if any rows have been read. The second option would be a little messy because there's no guarantee the query run in the internal executor would be optimized the same, and for prepared statements we'd need a means of finding the proper prepared SQL to run and placeholders to copy over. There are more chances for problems with this approach and may require a new API for calling the internal executor directly on an already-prepared statement.

Perhaps the first approach could be done with the addition of automatic savepoints which could be used to roll back any trace of rows read in the transaction, so its historical timestamp could be modified.

@andy-kimball You mentioned last week perhaps it's not so important to report the correct home region back to the user. I'll continue to look for solutions to the above, but please let me know if you think it's not worth the effort and if we should just drop phase 3 (or if you have other thoughts). Thanks.