bdarnell closed this 6 years ago
Since cluster creation and destruction are now part of the infrastructure (and outside of the test's control), it seems feasible to reuse clusters between tests when they have the same specification.
Looking into this more, it isn't just the cluster that we want to share across all of the jepsen tests, but also a bunch of shared setup for that cluster (e.g. installing software). I need to think about this a bit more, but reusing a cluster between tests just because they have the same cluster specification seems insufficient. Or perhaps the cluster specification also needs to provide a hook for one-time initialization.
Yeah, I was thinking the cluster spec would gain some argument that would select an initialization function. Or we could bake all of this initialization into a disk image with Packer. If we had pre-built disk images, would we still want to reuse clusters that share a disk image, or would that get the cost low enough that we'd just create a new cluster each time? (Our current setup runs 40 jepsen configurations, each of which runs for ~7 minutes.)
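To make the "spec selects an initialization function" idea concrete, here is a minimal sketch. Every name in it (`clusterSpec`, `pool`, `acquire`, the `"jepsen"` init key) is hypothetical and not part of roachtest; it only illustrates keying reuse on the node spec plus the selected init hook, so that the one-time setup runs once per distinct spec.

```go
package main

import "fmt"

// clusterSpec is a hypothetical spec: node count plus the name of a
// one-time initialization function. It is comparable, so it can be a map key.
type clusterSpec struct {
	nodes int
	init  string
}

// cluster is a stand-in for a provisioned cluster.
type cluster struct {
	spec      clusterSpec
	initCount int // how many times the one-time setup ran
}

// pool hands out clusters, reusing one when the full spec matches.
type pool struct {
	inits    map[string]func(*cluster)
	clusters map[clusterSpec]*cluster
	created  int
}

func newPool(inits map[string]func(*cluster)) *pool {
	return &pool{inits: inits, clusters: map[clusterSpec]*cluster{}}
}

// acquire returns an existing cluster for spec, or creates one and runs
// the init function selected by spec.init exactly once.
func (p *pool) acquire(spec clusterSpec) *cluster {
	if c, ok := p.clusters[spec]; ok {
		return c
	}
	c := &cluster{spec: spec}
	p.created++
	if fn, ok := p.inits[spec.init]; ok {
		fn(c)
	}
	p.clusters[spec] = c
	return c
}

func main() {
	p := newPool(map[string]func(*cluster){
		// Placeholder for installing Java and other jepsen dependencies.
		"jepsen": func(c *cluster) { c.initCount++ },
	})
	spec := clusterSpec{nodes: 6, init: "jepsen"}
	a := p.acquire(spec)
	b := p.acquire(spec)                       // same spec: reused
	other := p.acquire(clusterSpec{nodes: 6}) // no init hook: separate cluster
	fmt.Println(a == b, a == other, p.created, a.initCount)
}
```

The sketch deliberately leaves out the hard part, which is deciding when a pooled cluster can be destroyed rather than kept around for a later test.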
I've been thinking about how to reuse clusters between tests, and I don't particularly like an approach where I have to match up node specs and initialization functions to clusters and determine when a cluster can be destroyed vs. reused.
As an alternative, we could introduce a concept of sub-tests similar to Go's `testing` sub-tests. All of the sub-tests would reuse the cluster for the parent test and would be run sequentially. There would be a lot less magic involved with this approach.
Our jepsen test suite currently runs 70 permutations of test+nemesis. In porting this to roachtest, these would ideally each be their own `testSpec`, but since each `testSpec` gets a new cluster, this would cause the cluster setup time to dominate (including both VM creation and installing Java and other dependencies that aren't used by our other tests). We should be able to have a cluster that is shared across all jepsen tests to amortize this cost.
The workaround is to make the entire jepsen test suite a single "test", but this has limitations for error reporting.