Provide a tool for scalable correctness testing

alexmiller-apple commented 5 years ago

A small number of tests can be run with ctest as a basic sanity check, but serious development requires a cluster of machines able to run tens of thousands of tests to verify the correctness of large, complex changes. It would be nice to provide tooling that allows others to run simulation tests in a distributed fashion.

thoughtpolice commented 5 years ago

Note for bystanders: as I posted on the forums recently, I have a tool for doing simulation runs on Kubernetes (which might scale to some extent): https://github.com/thoughtpolice/foundationdb-k8s

The idea is that every simulation test we'd run with ctest is treated as a Kubernetes job with a specific set of parameters. From these, a set of k8s YAML manifests is generated, which you can then run on your cluster. Currently these simulation runs use a hardwired Docker image built from Nix packages written upstream, but in theory you could use any Docker image with a packaged fdbserver binary.
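To make the job-per-test idea concrete, here is a minimal sketch (not the actual code from foundationdb-k8s) that turns a list of simulation test files into Kubernetes `batch/v1` Job objects, one per test, using `completions` for the number of runs and `parallelism` for the number of concurrent pods. The image name, the `/tests` path inside the image, and the `fdbserver -r simulation -f <file>` invocation are assumptions; check them against your own image and `fdbserver --help`. The output is JSON, which `kubectl` accepts just as well as YAML.

```python
import json
import re

# Hypothetical image: any image with an fdbserver binary and the test files
# available under /tests would work.
DEFAULT_IMAGE = "example.com/fdb/fdbserver-sim:latest"


def job_name(test_file: str, max_len: int = 63) -> str:
    """Turn a test path such as 'fast/AtomicOps.txt' into a valid Job name
    (lowercase letters, digits and '-', at most 63 characters)."""
    stem = re.sub(r"\.(txt|toml)$", "", test_file)
    slug = re.sub(r"[^a-z0-9-]+", "-", stem.lower()).strip("-")
    return ("sim-" + slug)[:max_len].rstrip("-")


def simulation_job(test_file: str, runs: int, parallelism: int,
                   image: str = DEFAULT_IMAGE) -> dict:
    """One batch/v1 Job that runs a single simulation test `runs` times."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name(test_file)},
        "spec": {
            "completions": runs,         # successful runs required
            "parallelism": parallelism,  # pods running at the same time
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "fdbserver",
                        "image": image,
                        # Assumed invocation. No seed is passed, on the assumption
                        # that fdbserver picks a fresh random seed for each run.
                        "command": ["fdbserver", "-r", "simulation",
                                    "-f", f"/tests/{test_file}"],
                    }],
                },
            },
        },
    }


def manifests(test_files: list[str], runs: int, parallelism: int) -> str:
    """All Jobs wrapped in a v1 List and serialized as JSON."""
    jobs = [simulation_job(t, runs, parallelism) for t in test_files]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": jobs}, indent=2)


if __name__ == "__main__":
    # Illustrative test list; a real run would enumerate the repository's tests/ tree.
    print(manifests(["fast/AtomicOps.txt", "fast/CycleTest.txt"],
                    runs=1000, parallelism=50))
```

Piping the output into `kubectl apply -f -` would create one Job per test. Using `completions` instead of one Job per run keeps the object count at one per test, which matters when each test is meant to run tens of thousands of times. With the default `backoffLimit` of 6, a Job is marked failed after a handful of failed pods, which is arguably what you want when a failure means a bug was found.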

Side note: Personally, I think Nix is good, obviously, and an easy way to package Docker images for this. Also, the FoundationDB source code does not need any extensive patches to build with Nix, just a custom set of "build rules". That means it should be possible to modify my tools to accept any source working directory as the FDB source and build packages from it. Then most of the Nix use is abstracted away, you could probably integrate it into a build pipeline, and it could become a more general tool for building FDB Docker images.

Here's my main issue right now: actually having the hardware to run all the tests to any level of confidence. I'm afraid I don't have a cluster big enough to do this in a reasonable amount of time. Even my Threadripper (32 cores, 64GB of RAM) is fairly constrained in its ability to chew through large tests in a reasonable time, so any individual who wants the necessary equipment is looking at a very large investment. There's always a cost to deploying distributed systems, but I also somewhat doubt I'd get anywhere soon even with a custom set of bash scripts when there are 80+ tests to run 100,000+ times each, or whatever the numbers are. It also requires intimate knowledge of the tests, plus some good old-fashioned elbow grease and guesswork, to get a good ratio of parallel jobs, resource use, total simulation runs and launched pods. I guess I could ask for a big GCP node limit bump and just try to grind through 50k total tests across everything as a first approach, to make sure it even works...

But how do I "optimize" a scalable testing harness like this if it costs me something like $20 and 5 hours to run it every time? That's extremely prohibitive for the run-edit-compile cycle. So the thing is, even if you have the k8s manifests, a scalable correctness tool is only as useful as the hardware available to run it. In this case it's going to be a sizeable investment if you want tests to finish in any reasonable time. What even is a reasonable time to expect things to complete in? 8 hours? 24? A day and a half? If I double the node count, should that cut the time in half? I guess this is all sort of the point of this ticket, but it's hard to answer without a lot of gear...
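The "if I double the node count, does it halve the time?" question can at least be approximated with back-of-the-envelope arithmetic, because each simulation run is an independent, single-process job. A rough sketch, where the 60-second average run time is a made-up placeholder rather than a measured figure, and the model ignores scheduling overhead, image pulls and slow stragglers:

```python
def core_hours(num_tests: int, runs_per_test: int, avg_run_seconds: float) -> float:
    """Total CPU time for the whole batch, assuming one core per simulation run."""
    return num_tests * runs_per_test * avg_run_seconds / 3600


def wall_clock_hours(num_tests: int, runs_per_test: int,
                     avg_run_seconds: float, concurrent_slots: int) -> float:
    """Idealized wall-clock time: independent runs spread evenly over
    `concurrent_slots` cores, ignoring scheduling overhead and stragglers."""
    return core_hours(num_tests, runs_per_test, avg_run_seconds) / concurrent_slots


if __name__ == "__main__":
    # 80 tests x 100,000 runs each; 60 s per run is a placeholder, not a measurement.
    for slots in (32, 1000, 2000):
        hours = wall_clock_hours(80, 100_000, 60, slots)
        print(f"{slots:>5} slots: {hours:8.1f} h")
    print(f"total: {core_hours(80, 100_000, 60):,.0f} core-hours")
```

With those inputs it prints about 4,167 hours for 32 slots (one run per Threadripper core), about 133 hours for 1,000 slots and about 67 hours for 2,000, out of roughly 133,333 core-hours in total. In this idealized model, doubling the slots halves the wall-clock time while the total core-hours, and therefore roughly the cloud bill, stay the same; in practice, per-node overhead and the slowest runs erode that somewhat.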

If the goal is to make it easier for third-party contributors to use any of this infrastructure, for example to increase confidence in the PRs people submit, it needs to be available to them to some extent. So the problem isn't just building a scalable tool: presumably Apple can already do that internally, and if we don't care about third parties doing it, it doesn't really matter, in a sense, whether it's available publicly. But there's a cost to everything, I guess (it's probably much easier for you to throw it over the fence to a pre-built build farm someone else made earlier).

At the moment, if Kubernetes were to end up as the primary way to do this, I would probably need Apple developers to work with my k8s manifests to test them and iron out bugs, since they can test at a reasonable scale.

Some other, related thoughts: FoundationDB is open source, and if the goal is for it to be an open community project, it would be nice if any infrastructure for this testing were openly available and managed, and if access to it were equal for Apple developers and the community. This doesn't just mean that people could, e.g., look at or hack on any deployment scripts, or that the nodes run openly available software. It also means the core Apple developers would rely on it for their own core workflows, just like any third party does when they develop: you submit PRs and ask a bot to do large-scale batch testing on your behalf, like anyone else. After all, I can't test my own PRs before submitting them on a shadowy build farm locked in a vault 20 feet underground somewhere in Cupertino, and I can't even run my own harness to the same extent y'all can. But if you put everyone on an equal footing (again, assuming you can get a Big Cluster), then I think that's the ideal case. In that scenario, where these systems are first-class developer tools, I'd be more than happy to try to provide the primary infrastructure for the k8s manifests, for example, so PRs could be tested reliably as large-scale batch jobs.

It's possible you (or anyone?) might be able to acquire some hardware for these tests, or broker a deal with some provider for promotions, etc.; Packet.net does promotions like this. On the other hand, Apple isn't exactly strapped for cash and is the primary developer here, so perhaps asking for freebies isn't too nice. And I'm not a core developer, so asking Packet for something like 8 Xeons seems a little excessive when all I have to put on the "Why do you want this?" form is "to burn a bunch of CPU cycles watching my database try to confuse itself". Maybe Apple could just buy the FDB team a GKE cluster from Google(!!!), separate from any of your other infrastructure, and FoundationDB "the project" could use that for everything.

dumblob commented 4 years ago

jzhou77 commented 3 years ago

The swarm testing framework is now open-sourced: https://github.com/foundationdb/fdb-joshua

apple/foundationdb, issue #1564