Roachtests can optionally specify regions (and availability zones) via `ClusterSpec`, e.g., `ClusterSpec.GCE.Zones`. If regions are specified, roachprod uses them when provisioning the corresponding cluster. Specified regions are treated as fixed: only the availability zone may change during provisioning, never the region. E.g., a transient cluster provisioning error may be retried in a different availability zone [1]. The primary reason for not letting specified regions change is egress: a number of roachtests import/export data from regional cloud buckets [2].
When regions are unspecified (via `ClusterSpec`), a cloud-specific default is chosen. The defaults correspond to the regional buckets (in each cloud) used for import/export. However, a default may be at odds with the availability of other resources [3], [4]. (E.g., GCE `t2a` instances are not available in `us-east1`.) Consequently, switching to another region may result in unwanted egress for the corresponding roachtest. Thus, we should consider how best to enforce "region affinity" in this case. Perhaps this could be a heuristic based on the size and the type of a roachtest; e.g., large backup/restore tests should stay within the region defaults unless otherwise specified (via `ClusterSpec`). Since input/output buckets aren't part of the spec, inferring those will be challenging. Making regions a required part of the spec for every roachtest seems rather inflexible.
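One possible shape for such a heuristic, sketched in Go purely for discussion. Everything here is an assumption: `testKind`, `candidate`, `allowRegionSwitch`, and the size threshold are hypothetical and do not exist in roachtest; in particular, the approximate data size would have to be supplied or inferred somehow, since buckets are not part of the spec.

```go
package main

// testKind is a hypothetical coarse classification of a roachtest.
type testKind int

const (
	kindOther testKind = iota
	kindBackupRestore
)

// candidate holds the (hypothetical) inputs to the region-affinity heuristic.
type candidate struct {
	Kind         testKind
	ApproxDataGB int      // assumed to be known or estimated; not part of ClusterSpec today
	SpecRegions  []string // regions explicitly requested via the spec, if any
}

// allowRegionSwitch reports whether provisioning may move the test away from
// the cloud-specific default region when resources are unavailable there.
//   - Explicitly specified regions are never changed.
//   - Large backup/restore tests stay in the default region to avoid egress.
//   - Everything else may switch.
func allowRegionSwitch(c candidate, largeGB int) bool {
	if len(c.SpecRegions) > 0 {
		return false
	}
	if c.Kind == kindBackupRestore && c.ApproxDataGB >= largeGB {
		return false
	}
	return true
}
```

The threshold is left as a parameter because what counts as "large" is exactly the open question; a per-cloud value may be more appropriate.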
[1] https://github.com/cockroachdb/cockroach/pull/120714
[2] https://github.com/cockroachdb/cockroach/issues/111371
[3] https://github.com/cockroachdb/cockroach/pull/117661
[4] https://github.com/cockroachdb/cockroach/issues/114523
Jira issue: CRDB-37257