Here is a list of plausible benchmarking tool features that have not yet been implemented.
One major reason they haven't been implemented is that we lack enough operational experience with this very immature tooling to know whether these features are things we actually need or want yet. But somebody thought of them, so I'm collecting them here so the ideas don't get lost.
What is the Problem Being Solved?
Right now we don't really have a good way to get performance measurements for the operation of contracts or any other code running in a vat or vats on chain, short of examining the operation of the production chain itself. Clearly it's neither practical nor wise to just try things out in production, which makes performance engineering of our overall system challenging. We want to fix this by providing tooling for developers to write performance benchmarks, execute them in a mostly realistic environment, and measure their performance, all in support of a normal code-test-debug-try-again development lifecycle, except focused on performance engineering.
Description of the design
To this end, we've identified two different strategies:
Swingset-runner is capable of running arbitrary swingsets and can be adapted to run realistic benchmarks by adding code to emulate the bridge device and other chain-specific machinery of the kind that cosmic-swingset provides. Moreover, swingset-runner already has support for benchmark orchestration and data collection. Since its means of dynamically loading code is by launching vats, the benchmark itself has to be executed from within the swingset via a driver vat that implements the benchmark logic. Note that this is quite different from how Ava tests are written, even though we strongly suspect that quite a few existing tests are likely to be the seeds of benchmarks of the functionality those tests exercise; such tests would need substantial adaptation to be run from inside a vat. On the other hand, the resulting performance simulation should be quite accurate. We call this approach the "inside view".
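As a rough illustration of the inside-view shape, here is a minimal sketch of a driver vat. The method names (setup, runBenchmarkRound), the vat names, and the assumption that swingset-runner drives rounds by messaging the driver vat are illustrative only; the actual protocol is whatever swingset-runner and the work in PR #8239 define.

```js
// Hypothetical driver vat for an inside-view benchmark.
// Names and the round-driving protocol are illustrative, not the actual
// swingset-runner API.
import { E, Far } from '@endo/far';

export function buildRootObject() {
  let target; // presence for the contract/vat functionality under measurement

  return Far('benchmarkDriver', {
    // One-time setup: acquire references to whatever the benchmark exercises.
    async setup(vats, devices) {
      target = await E(vats.workload).getTargetFacet();
    },

    // Executed once per benchmark round by the orchestrator; the work done
    // here is what gets timed and measured.
    async runBenchmarkRound(round) {
      await E(target).doTheOperationBeingMeasured(round);
    },
  });
}
```

In this shape, the orchestration (how many rounds to run, what data to record) lives in swingset-runner itself; only the body of each round is benchmark-specific.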
Tests written using Ava are capable of driving a swingset from the outside, but Ava itself is not really architected to be a benchmark driver (though we have made a preliminary step in that direction: see #7960). However, a simple benchmark driver framework inspired by Ava, but specifically intended for implementing benchmarks rather than correctness tests, should in principle be relatively straightforward to construct. This framework would take care of setting up all the basic chain infrastructure (e.g., by executing the chain bootstrap that gets the vats and devices constituting the basic Agoric ecosystem up and running), leaving benchmark authors to implement only the parts of the benchmark that involve the specific functionality being measured. The framework would also take care of measuring timing and other resource usage, then collecting and recording this data, in much the same manner as swingset-runner already does. The principal benefits of this strategy are speed and simplicity from the perspective of benchmark authors. We call this approach the "outside view".
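To make the outside-view ergonomics concrete, here is a sketch of what a benchmark definition in such an Ava-inspired framework might look like. The package name, the `bench` object, and the setup/executeRound hooks are assumptions about a framework that is still being designed, not an existing API.

```js
// Hypothetical outside-view benchmark definition; the framework API shown
// here (package name, `bench`, setup/executeRound hooks) is assumed.
import { bench } from '@agoric/benchmark-driver'; // hypothetical package

bench.addBenchmark('exercise the functionality being measured', {
  // Runs once, after the framework has bootstrapped the chain environment
  // (vats, devices, core contracts); returns whatever state the rounds need.
  async setup(context) {
    const wallet = await context.provisionSmartWallet('agoric1benchmarker');
    return { wallet };
  },

  // Runs once per round; the framework times each round and records the
  // timing and resource-usage data alongside its other measurements.
  async executeRound({ wallet }, round) {
    await wallet.executeOffer({
      id: `bench-round-${round}`,
      // ...offer spec for the specific operation being measured...
    });
  },
});

await bench.run();
```

The point of the sketch is the division of labor: everything before `setup` (chain bootstrap, vat and device wiring) and everything around `executeRound` (timing, data collection, reporting) would be the framework's responsibility, mirroring what swingset-runner already provides for the inside view.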
In principle these two approaches are complementary, though it seems likely that one or the other will become the dominant form (I'd bet on that being the outside view approach due to developer convenience, though I personally like the inside view approach more).
Other considerations
This issue is an epic to track our work on these frameworks. Note that as of this writing, substantial development work on both fronts has already happened (in particular, the first pass at the inside view has already landed in the form of PR #8239). This issue is backfilling the informal plan that we have already been following, so that it can be properly tracked and monitored in our project management system.