cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvserver: lease preferences can be confusing #106107

Open erikgrinaker opened 1 year ago

erikgrinaker commented 1 year ago

I'm not sure if this is a docs issue or a UX issue, but opening here to get some thoughts. Opened a corresponding docs issue in DOC-8259.

I find lease preferences to be a bit confusing. I got two things immediately wrong:

  1. Lease preferences are not ordered by priority. If I specify [[+rack=0], [+rack=1]], I would expect leases to be placed in rack=0 when possible, and only if rack=0 can't be satisfied (e.g. because all nodes are unavailable) should they be placed in rack=1. However, leases are placed in either of rack=0 or rack=1.

  2. It is sufficient for a store to satisfy any preference. If I specify [[+rack=0], [-rack=2]] I would expect leases to only be placed in rack=0, but I found a bunch of leases also with rack=1. This happens because any constraint is sufficient, and rack=1 satisfies the -rack=2 constraint. I should have specified [[+rack=0, -rack=2]] instead.

In both of these cases, I misinterpreted the meaning of the lease preference list. It's basically an OR of ANDs: if a store satisfies ALL constraints in ANY preference, it may get a lease.
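
To make the OR-of-ANDs reading concrete, here is a minimal Python sketch. It is an illustrative model, not CockroachDB's implementation: `satisfies` and `may_hold_lease` are hypothetical names, and a store's attributes are modeled as a plain set of strings.

```python
# Illustrative model only; not CockroachDB's implementation.

def satisfies(store_attrs, constraint):
    """Check one constraint: '+rack=0' requires the attribute, '-rack=2' prohibits it."""
    sign, attr = constraint[0], constraint[1:]
    present = attr in store_attrs
    return present if sign == "+" else not present


def may_hold_lease(store_attrs, lease_preferences):
    """OR of ANDs: ANY preference whose constraints ALL hold is enough."""
    return any(
        all(satisfies(store_attrs, c) for c in preference)
        for preference in lease_preferences
    )


rack1 = {"rack=1"}
# Two separate preferences: rack=1 qualifies through the second one (-rack=2).
print(may_hold_lease(rack1, [["+rack=0"], ["-rack=2"]]))  # True
# One preference with both constraints: rack=1 fails +rack=0, so it does not qualify.
print(may_hold_lease(rack1, [["+rack=0", "-rack=2"]]))    # False
```

This only models whether a store is eligible at all; it says nothing about how the preferences are ordered relative to each other.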

I guess we should document the current policy regardless, but is this the most useful or intuitive policy structure?

Jira issue: CRDB-29406

kvoli commented 1 year ago

> 1. Lease preferences are not ordered by priority.

They should be ordered by priority? Could you share some more info on where you saw this not happening?

> If I specify [[+rack=0], [+rack=1]], I would expect leases to be placed in rack=0 when possible, and only if rack=0 can't be satisfied (e.g. because all nodes are unavailable) should they be placed in rack=1. However, leases are placed in either of rack=0 or rack=1.

What you might be running into is the case where there are no existing voters on stores with attr rack=0 but there is at least one on rack=1. The allocator won't suggest a rebalance in order to satisfy a lease preference. This could be better documented.

> It is sufficient for a store to satisfy any preference.

They are ordered: the first preference that can be satisfied is used. Probably a similar situation to the above?

erikgrinaker commented 1 year ago

Consider a 5-node cluster with RF=5 across three racks:

$ roachprod create local -n 5
$ roachprod start local --racks 3

> create database kv;
> alter database kv configure zone using num_replicas=5, constraints='{"+rack=0": 2, "+rack=1": 2, "+rack=2": 1}';

$ ./cockroach workload init kv --splits 1000

> > 1. Lease preferences are not ordered by priority.

> They should be ordered by priority? Could you share some more info on where you saw this not happening?

I guess it depends on what you mean by priority.

If I specify [[+rack=0], [+rack=1]], leases will be evenly distributed across rack=0 and rack=1 (4 nodes). I would naïvely expect them all to be on rack=0 (n1,n4), because the first rule should take priority. This happens because both rack=0 and rack=1 satisfy a preference.

What priority really means here is that the first preference that applies to a node is used. This is not the same thing. If I set [[+rack=0], [-rack=0]] then they will be placed on rack=0 because +rack=0 takes priority over -rack=0 (or any other tag that might be used).

> > If I specify [[+rack=0], [+rack=1]], I would expect leases to be placed in rack=0 when possible, and only if rack=0 can't be satisfied (e.g. because all nodes are unavailable) should they be placed in rack=1. However, leases are placed in either of rack=0 or rack=1.

> What you might be running into is the case where there are no existing voters on stores with attr rack=0 but there is at least one on rack=1. The allocator won't suggest a rebalance in order to satisfy a lease preference. This could be better documented.

No, we had voters across all nodes. See above. We won't move a lease from rack=1 to rack=0, even though rack=0 "takes priority", because rack=1 satisfies a preference.

> > It is sufficient for a store to satisfy any preference.

> They are ordered: the first preference that can be satisfied is used. Probably a similar situation to the above?

Yes, but scoped to each individual node, not globally.

kvoli commented 1 year ago

> If I specify [[+rack=0], [+rack=1]], leases will be evenly distributed across rack=0 and rack=1 (4 nodes). I would naïvely expect them all to be on rack=0 (n1,n4), because the first rule should take priority.

This is what I expect too. This is what the code explicitly does when selecting a “preferred” leaseholder:

https://github.com/cockroachdb/cockroach/blob/5a53114bf5ae9e5ec672d43acf599f145bf22341/pkg/kv/kvserver/allocator/allocatorimpl/allocator.go#L2811-L2818

I wouldn't expect them to be evenly distributed across rack=0 and rack=1. Unless I'm missing something, that is a bug.
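
A minimal Python sketch of the ordered selection described here, for illustration only. It is not the code at the link above: `satisfies` and `preferred_leaseholders` are hypothetical names, attributes are modeled as plain sets of strings, and the rack layout is an assumption (n1 and n4 on rack=0 come from the thread; the placement of n2, n3, and n5 is a guess).

```python
# Illustrative model only; the names below are hypothetical, not CockroachDB's API.

def satisfies(store_attrs, constraint):
    """'+attr' requires the attribute; '-attr' prohibits it."""
    sign, attr = constraint[0], constraint[1:]
    return (attr in store_attrs) == (sign == "+")


def preferred_leaseholders(voters, lease_preferences):
    """Return the voters matching the first preference that any voter satisfies."""
    for preference in lease_preferences:
        matching = sorted(
            node for node, attrs in voters.items()
            if all(satisfies(attrs, c) for c in preference)
        )
        if matching:
            return matching  # later preferences are never consulted
    return []  # no preference can be satisfied by the existing voters


# Assumed layout: n1 and n4 on rack=0 as stated in the thread; the rest is a guess.
voters = {
    "n1": {"rack=0"}, "n2": {"rack=1"}, "n3": {"rack=2"},
    "n4": {"rack=0"}, "n5": {"rack=1"},
}
prefs = [["+rack=0"], ["+rack=1"]]

print(preferred_leaseholders(voters, prefs))  # ['n1', 'n4']

# With no voter on rack=0, the second preference takes over.
no_rack0 = {n: a for n, a in voters.items() if "rack=0" not in a}
print(preferred_leaseholders(no_rack0, prefs))  # ['n2', 'n5']
```

Under this model, candidates are limited to stores that already hold a voter, so a range with no voter on rack=0 falls through to rack=1 rather than triggering a rebalance.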

I ran a slightly modified script on https://github.com/cockroachdb/cockroach/commits/5a53114bf5ae9e5ec672d43acf599f145bf22341 (release-23.1) and didn't see this behavior.

script

```bash
#!/bin/bash
export cluster=local
roachprod create $cluster -n 5
roachprod start $cluster --racks 3
roachprod sql $cluster:1 -- -e "create database kv;"
roachprod sql $cluster:1 -- -e "alter database kv configure zone using num_replicas=5, constraints='{"+rack=0": 2, "+rack=1": 2, "+rack=2": 1}', lease_preferences='[[+rack=0], [+rack=1]]';"
./cockroach workload init kv --splits 1000
```

```sql
root@localhost:26257/defaultdb> show zone configuration from database kv;
    target    |                       raw_config_sql
--------------+------------------------------------------------------------
  DATABASE kv | ALTER DATABASE kv CONFIGURE ZONE USING
              |     range_min_bytes = 134217728,
              |     range_max_bytes = 536870912,
              |     gc.ttlseconds = 14400,
              |     num_replicas = 5,
              |     constraints = '{+rack=0: 2, +rack=1: 2, +rack=2: 1}',
              |     lease_preferences = '[[+rack=0], [+rack=1]]'
(1 row)

Time: 4ms total (execution 3ms / network 0ms)
```

![image](https://github.com/cockroachdb/cockroach/assets/39606633/a355d285-8c5b-40d9-ad63-e11284218e19)

All the leases end up on rack=0. I'll try out some other versions.


erikgrinaker commented 1 year ago

Maybe it doesn't work when leases are initially balanced across racks or something. Or maybe I messed something up. Will retry when I'm back.