We've got a pretty good amount of benchmarking code, but there are still quite a
number of open questions about how it's all going to get tied together. I'd like to
keep an overview in this ticket, break the individual bits out into other tickets,
and reference them here. Please comment and we can make edits as we find consensus.
The (likely to be renamed) bspawn command is the main entry point for running
benchmarks. It handles cluster creation+teardown, agent creation+teardown,
running benchmarks, aggregating results from all the agents (and the cluster if
necessary), and storing or publishing them.
Rough Order of Operations for bspawn:
1. Generate a run_uuid. The run_uuid will be associated with the output of all
   the different pieces of this benchmark run. Each agent's output, the output
   of cluster creation, agent creation, and any stats from the cluster may be
   stored separately, but as long as they include the run_uuid, it will be
   possible to correlate all data from a single run (see the sketch after this list). #204
2. Create the cluster. Cluster creation is managed by the pilosactl create
   command, but its output should include the run_uuid, the configuration
   parameters given, and information about the actual cluster created
   (e.g. hostnames, detailed hardware info). Information about the version
   of pilosa running and any build parameters should also be included. #171
3. Create agents. This is similar to cluster creation and should report similar
   information. #205
4. Run the various benchmarks specified in the bspawn config file. There may
   be a way to specify whether groups of benchmarks should be run in series or
   in parallel (a possible config shape is sketched after this list). The format
   of benchmark output (#168) should be specified well enough that it can be
   consumed automatically by further tools (e.g. visualization, anomaly
   detection, alerting).
5. Store all the output somewhere; this will probably have some configurability. #203
6. Tear down the cluster, cluster infrastructure, and agent infrastructure, with
   the option to leave any of it in place (as an optimization for further use,
   to verify data, etc.).
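To make the run_uuid correlation concrete, here's a minimal Go sketch. All type and field names below (RunInfo, BenchResult, etc.) are illustrative assumptions, not the actual bspawn types: cluster/agent creation and each benchmark emit separate records, but every record carries the same run_uuid so downstream tools can join them later.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// newRunUUID generates a random identifier for a single bspawn invocation.
// A real implementation might use a proper UUID library instead.
func newRunUUID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// RunInfo is the kind of metadata cluster/agent creation could emit:
// the run_uuid, the configuration requested, and what was actually built.
type RunInfo struct {
	RunUUID   string            `json:"run_uuid"`
	Config    map[string]string `json:"config"`         // parameters given to pilosactl create
	Hosts     []string          `json:"hosts"`          // actual hostnames created
	Hardware  []string          `json:"hardware"`       // detailed hardware info per host
	Version   string            `json:"pilosa_version"` // version and build parameters
	CreatedAt time.Time         `json:"created_at"`
}

// BenchResult is one agent's output for one benchmark. Because it carries the
// same run_uuid, it can be correlated with RunInfo and other agents' results.
type BenchResult struct {
	RunUUID   string                 `json:"run_uuid"`
	AgentNum  int                    `json:"agent_num"`
	Benchmark string                 `json:"benchmark"`
	Stats     map[string]interface{} `json:"stats"`
}

func main() {
	runUUID := newRunUUID()
	info := RunInfo{RunUUID: runUUID, Hosts: []string{"pilosa0", "pilosa1"}, CreatedAt: time.Now()}
	res := BenchResult{RunUUID: runUUID, AgentNum: 0, Benchmark: "example-benchmark",
		Stats: map[string]interface{}{"duration_ns": 123456789}}

	// Each record could be stored in a different place; the shared run_uuid ties them together.
	for _, rec := range []interface{}{info, res} {
		out, err := json.Marshal(rec)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```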
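For the series/parallel question, one possible config shape, expressed here as Go structs plus JSON purely for illustration (the actual bspawn config format and the benchmark names used below are not settled), is a list of benchmark groups where each group declares how its members run:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BenchmarkSpec names a benchmark and its arguments (placeholder fields).
type BenchmarkSpec struct {
	Name string   `json:"name"`
	Args []string `json:"args,omitempty"`
}

// BenchmarkGroup runs its members concurrently when Parallel is true,
// otherwise one after another.
type BenchmarkGroup struct {
	Parallel   bool            `json:"parallel"`
	Benchmarks []BenchmarkSpec `json:"benchmarks"`
}

// SpawnConfig is a hypothetical top-level bspawn config.
type SpawnConfig struct {
	Agents int              `json:"agents"`
	Groups []BenchmarkGroup `json:"groups"`
}

func main() {
	raw := `{
  "agents": 4,
  "groups": [
    {"parallel": false, "benchmarks": [{"name": "import", "args": ["-n", "1000000"]}]},
    {"parallel": true,  "benchmarks": [{"name": "set-bits"}, {"name": "random-query"}]}
  ]
}`
	var cfg SpawnConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```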
Some general notes:
- Start the cluster and agents and get results over ssh (unless on localhost); whatever cluster creation methodology is used will have to provide keys and ports (see the sketch below). #202
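As a rough illustration of the ssh piece, here is a minimal sketch using golang.org/x/crypto/ssh, assuming key-based auth; the host, user, key path, and command in main are placeholders that would be supplied by whatever created the cluster.

```go
package main

import (
	"fmt"
	"log"
	"os"

	"golang.org/x/crypto/ssh"
)

// runRemote runs cmd on addr over ssh using key-based auth and returns its combined output.
func runRemote(addr, user, keyPath, cmd string) (string, error) {
	key, err := os.ReadFile(keyPath)
	if err != nil {
		return "", err
	}
	signer, err := ssh.ParsePrivateKey(key)
	if err != nil {
		return "", err
	}
	cfg := &ssh.ClientConfig{
		User: user,
		Auth: []ssh.AuthMethod{ssh.PublicKeys(signer)},
		// Skipping host key verification is only reasonable for throwaway benchmark hosts.
		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
	}
	client, err := ssh.Dial("tcp", addr, cfg)
	if err != nil {
		return "", err
	}
	defer client.Close()

	session, err := client.NewSession()
	if err != nil {
		return "", err
	}
	defer session.Close()

	out, err := session.CombinedOutput(cmd)
	return string(out), err
}

func main() {
	// Placeholder values; the cluster-creation step would supply the real ones.
	out, err := runRemote("10.0.0.5:22", "ubuntu", "/home/me/.ssh/id_rsa", "uname -a")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```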
@codysoyland commented on Tue Mar 14 2017
Hey @jaffee - should we move this overview doc to the tools repo? Thanks!