cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.92k stars 3.78k forks source link

bounded staleness: v22.1.5 returns error - cannot use bounded staleness for DISTRIBUTE #85288

Closed sheaffej closed 2 years ago

sheaffej commented 2 years ago

Describe the problem

When performing a bounded staleness read on a table where a majority of its voting replicas are down or network isolated away, with v22.1.5 an error is returned, compared to v21.2.13 where the query succeeds.

In v21.2.13, I can perform a bounded staleness read on rides (in a surviving region):

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
                   id                  |     start_time      |      end_time
---------------------------------------+---------------------+----------------------
  00000000-0000-4000-8000-000000000000 | 2018-12-21 03:04:05 | 2018-12-23 08:04:05
(1 row)

But in v22.1.5, I get this error:

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
ERROR: unimplemented: cannot use bounded staleness for DISTRIBUTE
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/67562/v22.1

To Reproduce

Create a multi-region cluster, set the database to ZONE survival goal, and the rides table as a REGIONAL table. Wait until any rebalancing has completed, then fail all nodes in the databases's primary region. Connect to surviving node, and attempt a bounded staleness read.

Create the cluster:

# Change based on your username
CLUSTER=j4-movr-mr-demo

LIFETIME="12h0m0s"
MACHINE="n2-standard-4"
# RELEASE="v21.2.13"
RELEASE="v22.1.5"
ZONES="us-west3-b,us-west1-b,us-east4-b,us-east1-b"
NODES=12

roachprod create ${CLUSTER} \
   -n ${NODES} \
   --gce-machine-type=${MACHINE} \
   --gce-zones=${ZONES} \
   --geo \
   --lifetime=${LIFETIME}

roachprod stage ${CLUSTER} release ${RELEASE}
roachprod start ${CLUSTER}:1-12 --args "--max-offset 250ms"
roachprod adminurl ${CLUSTER}:1 --open
roachprod status ${CLUSTER}

# Install an enterprise license and set any preferred cluster settings
echo "
SET CLUSTER SETTING cluster.organization = 'J4 CRL Demo';
SET CLUSTER SETTING enterprise.license = '---->get-your-own-key<----';
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '4g'; 
SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '4g';
SET CLUSTER SETTING server.time_until_store_dead = '1m15s';
" | roachprod sql $CLUSTER:1

Create the database and tables

DROP DATABASE IF EXISTS movr_demo;
CREATE DATABASE movr_demo;

ALTER DATABASE movr_demo SET PRIMARY REGION = "us-west1"; 
ALTER DATABASE movr_demo ADD REGION "us-west3"; 
ALTER DATABASE movr_demo ADD REGION "us-east4";
ALTER DATABASE movr_demo ADD REGION "us-east1";

-- Just being explicit here
ALTER DATABASE movr_demo SURVIVE ZONE FAILURE;

CREATE TABLE movr_demo.public.rides
(
    id uuid PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
    city STRING,
    vehicle_city STRING,
    rider_id uuid,
    vehicle_id uuid,
    start_address STRING,
    end_address STRING,
    start_time timestamp,
    end_time timestamp,
    revenue DECIMAL
)
;

IMPORT INTO movr_demo.public.rides
CSV DATA (
    'workload:///csv/movr/rides?infer-crdb-region-column=true&multi-region=false&num-histories=1000&num-promo-codes=1000&num-ranges=9&num-rides=100000&num-users=1000&num-vehicles=5000&row-end=100000&row-start=0&seed=1&survive=az&version=1.0.0'
) WITH "nullif" = 'NULL'
;

-- Just being explicit here
ALTER TABLE movr_demo.public.rides SET LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;

After the cluster has settled with replica movement, fail the 3 primary region nodes

roachprod stop $CLUSTER:4-6

Observe that the ranges become unavailable.

Connect to a surviving node (e.g. us-east1):

roachprod sql $CLUSTER:10

Then run the bounded staleness query. This rides ID is always present in this generated data set, so if you ran the steps exactly above, this query should succeed.

select id, start_time, end_time 
from rides as of system time with_max_staleness('1h') 
where id = '00000000-0000-4000-8000-000000000000';

In v21.2.13, the query succeeds. In v22.1.5, the query returns the error:

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
ERROR: unimplemented: cannot use bounded staleness for DISTRIBUTE
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/67562/v22.1

Jira issue: CRDB-18155

rytaft commented 2 years ago

Thanks for the info, @sheaffej! This should be pretty easy to fix -- PR coming shortly.