cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: enforce region affinity #121437

Open srosenberg opened 5 months ago

srosenberg commented 5 months ago

Roachtests can optionally specify regions (and availability zones) via ClusterSpec, e.g., ClusterSpec.GCE.Zones. If regions are specified, roachprod uses them when provisioning the corresponding cluster. The specified regions are treated as fixed: only the availability zones may change during provisioning, never the regions. For example, a transient cluster-provisioning error may be retried in a different availability zone [1]. The primary reason for disallowing changes to specified regions is egress: a number of roachtests may end up importing data from, or exporting data to, regional cloud buckets [2].

When regions are unspecified (via ClusterSpec), a cloud-specific default is chosen. The defaults correspond to the regions of the buckets (in each cloud) used for import/export. However, a default may be at odds with the availability of other resources [3], [4]; e.g., GCE t2a instances are not available in us-east1. Consequently, switching to another region may incur unwanted egress for the corresponding roachtest. Thus, we should consider how best to enforce "region affinity" in this case. Perhaps this could be a heuristic based on the size and type of a roachtest; e.g., large backup/restore tests should stay within the default regions unless otherwise specified (via ClusterSpec). Since input/output buckets aren't part of the spec, inferring them will be challenging. On the other hand, making regions a required part of the spec for every roachtest seems rather inflexible.
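One possible shape for a size/type heuristic is sketched below. Everything in it is hypothetical: `testInfo`, `needsAffinity`, `pickRegions`, the name keywords, and the node threshold are illustrative stand-ins (the real ClusterSpec records neither bucket usage nor such a policy), and availability is supplied by the caller. The sketch also assumes us-east1 is the GCE default region, as the t2a example suggests.

```go
package main

import (
	"fmt"
	"strings"
)

// testInfo is a hypothetical summary of a roachtest. The real ClusterSpec
// does not record which buckets a test reads from or writes to.
type testInfo struct {
	Name    string
	Nodes   int
	Family  string   // machine family, e.g. "t2a"
	Regions []string // regions from the spec; empty if unspecified
}

// affinityMinNodes is a hypothetical size threshold for "large" tests.
const affinityMinNodes = 4

// needsAffinity is a hypothetical heuristic: large tests whose names suggest
// bulk I/O against regional buckets should stay in the default region.
func needsAffinity(t testInfo) bool {
	if t.Nodes < affinityMinNodes {
		return false
	}
	for _, kw := range []string{"backup", "restore", "import"} {
		if strings.Contains(t.Name, kw) {
			return true
		}
	}
	return false
}

// pickRegions chooses regions for a test: explicit spec regions win; otherwise
// the default region is used if the machine family is available there; only
// tests without affinity may fall back to an alternative region.
func pickRegions(t testInfo, def string, alts []string, available func(region, family string) bool) ([]string, error) {
	if len(t.Regions) > 0 {
		return t.Regions, nil
	}
	if available(def, t.Family) {
		return []string{def}, nil
	}
	if needsAffinity(t) {
		return nil, fmt.Errorf("%s: %s unavailable in default region %s; not switching regions to avoid egress", t.Name, t.Family, def)
	}
	for _, r := range alts {
		if available(r, t.Family) {
			return []string{r}, nil
		}
	}
	return nil, fmt.Errorf("%s: %s unavailable in any candidate region", t.Name, t.Family)
}

func main() {
	// Hypothetical availability table; only the t2a/us-east1 gap comes from the issue.
	avail := func(region, family string) bool {
		return !(region == "us-east1" && family == "t2a")
	}
	alts := []string{"us-central1"}
	for _, t := range []testInfo{
		{Name: "kv0/arm64", Nodes: 3, Family: "t2a"},
		{Name: "restore/tpce/arm64", Nodes: 10, Family: "t2a"},
		{Name: "backup/pinned", Nodes: 10, Family: "t2a", Regions: []string{"us-central1"}},
	} {
		regions, err := pickRegions(t, "us-east1", alts, avail)
		fmt.Println(t.Name, regions, err)
	}
}
```

Explicit ClusterSpec regions always win; the default region is abandoned only for tests the heuristic deems small, while large bulk-I/O tests fail fast instead of silently incurring egress. Failing rather than falling back is one design choice; emitting a warning and proceeding would be another.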

[1] https://github.com/cockroachdb/cockroach/pull/120714
[2] https://github.com/cockroachdb/cockroach/issues/111371
[3] https://github.com/cockroachdb/cockroach/pull/117661
[4] https://github.com/cockroachdb/cockroach/issues/114523

Jira issue: CRDB-37257

blathers-crl[bot] commented 5 months ago

cc @cockroachdb/test-eng