roachtest: Reuse clusters across test families

cockroachdb / cockroach

roachtest: Reuse clusters across test families #23915

Closed (bdarnell closed this issue 6 years ago)

bdarnell commented 6 years ago

Our jepsen test suite currently runs 70 permutations of test+nemesis. In porting this to roachtest, these would ideally each be their own testSpec, but since each testSpec gets a new cluster, cluster setup time would dominate (both VM creation and installing Java and other dependencies that our other tests don't use). We should be able to share one cluster across all jepsen tests to amortize this cost.

The workaround is to make the entire jepsen test suite a single "test", but this has limitations for error reporting.

petermattis commented 6 years ago

Since cluster creation and destruction is now part of the infrastructure (and outside of the test's control), it seems feasible to reuse clusters between tests when they have the same specification.
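A minimal sketch of spec-keyed reuse, to make the idea concrete. Every name here (`clusterSpec`, `clusterPool`, `acquire`, `release`) is hypothetical and not taken from roachtest; the sketch only assumes that a cluster specification can be compared for equality.

```go
package main

import "fmt"

// clusterSpec is a hypothetical, comparable description of a cluster.
// Because it is a plain struct of comparable fields, it can be a map key.
type clusterSpec struct {
	Nodes int
	CPUs  int
}

// cluster stands in for a provisioned set of VMs.
type cluster struct {
	Name string
	Spec clusterSpec
}

// clusterPool hands out an idle cluster whose spec matches the request
// and creates a new one only when no idle match exists.
type clusterPool struct {
	idle    map[clusterSpec][]*cluster
	created int // number of clusters actually created
}

func newClusterPool() *clusterPool {
	return &clusterPool{idle: make(map[clusterSpec][]*cluster)}
}

// acquire returns an idle cluster with an identical spec, or a new one.
func (p *clusterPool) acquire(spec clusterSpec) *cluster {
	if free := p.idle[spec]; len(free) > 0 {
		c := free[len(free)-1]
		p.idle[spec] = free[:len(free)-1]
		return c
	}
	p.created++
	return &cluster{Name: fmt.Sprintf("cluster-%d", p.created), Spec: spec}
}

// release marks a cluster as idle so that a later test can reuse it.
func (p *clusterPool) release(c *cluster) {
	p.idle[c.Spec] = append(p.idle[c.Spec], c)
}

func main() {
	pool := newClusterPool()
	spec := clusterSpec{Nodes: 6, CPUs: 4}

	a := pool.acquire(spec) // first test: cluster is created
	pool.release(a)
	b := pool.acquire(spec) // second test, same spec: cluster is reused

	fmt.Println(a == b, pool.created) // prints "true 1"
}
```

The sketch leaves out the hard parts: wiping state between tests and deciding when an idle cluster should finally be destroyed.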

petermattis commented 6 years ago

Looking into this more, it isn't just the cluster that we want to share across all of the jepsen tests, but also a bunch of shared setup for that cluster (e.g. installing software). I need to think about this a bit more, but reusing a cluster between tests when they have the same cluster specification seems insufficient. Or perhaps the cluster specification also needs to provide a hook for one-time initialization.

bdarnell commented 6 years ago

Yeah, I was thinking the cluster spec would gain some argument that would select an initialization function. Or we could bake all of this initialization into a disk image with packer. If we had pre-built disk images, would we still want to reuse clusters that share a disk image, or would that get the cost low enough that we'd just create a new cluster each time? (Our current setup runs 40 jepsen configurations, each of which runs for ~7 minutes.)
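One way the "argument that selects an initialization function" could look, as a rough sketch. The `Init` field, `initRegistry`, and `ensureInit` are assumptions made up for illustration, not roachtest's API, and the "jepsen" entry only counts calls instead of installing anything.

```go
package main

import "fmt"

// cluster stands in for a provisioned cluster and remembers which
// one-time initializers have already run on it.
type cluster struct {
	Name        string
	initialized map[string]bool
}

// initFunc performs expensive one-time setup (e.g. installing packages).
type initFunc func(c *cluster) error

// installCount records how often the "jepsen" initializer has run.
var installCount int

// initRegistry maps a name used in the spec to an initializer.
// The "jepsen" entry is a placeholder for the real setup work.
var initRegistry = map[string]initFunc{
	"jepsen": func(c *cluster) error {
		installCount++
		return nil
	},
}

// clusterSpec is a hypothetical spec with an optional initializer name.
type clusterSpec struct {
	Nodes int
	Init  string
}

// ensureInit runs the initializer named by the spec at most once per cluster.
func ensureInit(c *cluster, spec clusterSpec) error {
	if spec.Init == "" || c.initialized[spec.Init] {
		return nil // nothing to do, or already done on this cluster
	}
	fn, ok := initRegistry[spec.Init]
	if !ok {
		return fmt.Errorf("unknown init function %q", spec.Init)
	}
	if err := fn(c); err != nil {
		return err
	}
	if c.initialized == nil {
		c.initialized = make(map[string]bool)
	}
	c.initialized[spec.Init] = true
	return nil
}

func main() {
	c := &cluster{Name: "shared"}
	spec := clusterSpec{Nodes: 6, Init: "jepsen"}

	// Two tests with the same spec share the cluster; setup runs once.
	for i := 0; i < 2; i++ {
		if err := ensureInit(c, spec); err != nil {
			fmt.Println("init failed:", err)
			return
		}
	}
	fmt.Println("initializer runs:", installCount)
}
```

Because the initializer name is part of the spec, two specs match for reuse only if they also agree on initialization.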

petermattis commented 6 years ago

I've been thinking about how to reuse clusters between tests, and I don't particularly like an approach where I have to match up node specs and initialization functions to clusters and determine when a cluster can be destroyed vs. reused.

As an alternative, we could introduce a concept of sub-tests similar to Go's testing sub-tests. All of the sub-tests would reuse the cluster for the parent test and would be run sequentially. There would be a lot less magic involved with this approach.