FeatureBaseDB / tools

Tools for development and ops
BSD 3-Clause "New" or "Revised" License

Benchmarking Overview #10

Closed jaffee closed 7 years ago

jaffee commented 7 years ago

@jaffee commented on Thu Dec 08 2016

We've got a pretty good amount of benchmarking code, but there are still quite a number of open questions about how it's all going to get tied together. I'd like to keep an overview in this ticket, break out the individual bits into other tickets, and reference them here. Please comment and we can make edits as we find consensus.

The (likely to be renamed) bspawn command is the main entry point for running benchmarks. It handles cluster creation+teardown, agent creation+teardown, running benchmarks, aggregating results from all the agents (and the cluster if necessary), and storing or publishing them.

Rough Order of Operations for bspawn:

  1. Generate a run_uuid. The run_uuid will be associated with the output of all the different pieces of this benchmark run. Each agent's output, the output of cluster creation, agent creation, and any stats from the cluster may be stored separately, but as long as they include the run_uuid, it will be possible to correlate all data from a single run. #204
  2. Create the cluster. Cluster creation is managed by the pilosactl create command, but its output should include the run_uuid, the configuration parameters given, and information about the actual cluster created, e.g. hostnames and detailed hardware info. Information about the version of pilosa running and any build parameters should also be included. #171
  3. Create agents. This is similar to cluster creation, and should report similar information. #205
  4. Run the various benchmarks specified in the bspawn config file. There may be a way to specify whether groups of benchmarks should be run in series or in parallel. The format #168 for the output of benchmarks should be specified well enough that it can be consumed automatically by further tools (e.g. visualization, anomaly detection, alerting).
  5. Store all the output somewhere - this will probably have some configurability. #203
  6. Tear down cluster, cluster infrastructure, and agent infrastructure - with the option to leave any of it in place (as an optimization for further use, to verify data, etc.)

Some general notes:


@codysoyland commented on Tue Mar 14 2017

Hey @jaffee - should we move this overview doc to the tools repo? Thanks!

jaffee commented 7 years ago

good call @codysoyland - done!