elastic / hey-apm

Basic load generation for apm-server built on hey
Apache License 2.0

Feature: identify statistically significant changes #186

Open axw opened 4 years ago

axw commented 4 years ago

Similar to benchstat, it would be useful to be able to compare all of the benchmarks run for two different apm-server builds, and identify statistically significant changes.

The way benchstat works is by:

  1. Removing outliers using the interquartile range method
  2. Using a Mann-Whitney U test or a two-sample Welch t-test to calculate a p-value, which indicates whether the difference between the two sets of samples is statistically significant.
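To make the two steps concrete, here is a minimal Go sketch of the idea, not benchstat's actual code. It assumes the common 1.5×IQR fences and linearly interpolated quartiles, and it stops at the Welch t statistic and degrees of freedom; turning those into a p-value needs the Student's t CDF, which is left out. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the q-th quantile (0 <= q <= 1) of an already sorted
// slice, using linear interpolation between the two closest ranks.
func quantile(sorted []float64, q float64) float64 {
	if len(sorted) == 0 {
		return math.NaN()
	}
	pos := q * float64(len(sorted)-1)
	lo := int(math.Floor(pos))
	hi := int(math.Ceil(pos))
	frac := pos - float64(lo)
	return sorted[lo]*(1-frac) + sorted[hi]*frac
}

// removeOutliers returns a sorted copy of values without the samples that
// fall outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed fence width).
func removeOutliers(values []float64) []float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	q1, q3 := quantile(sorted, 0.25), quantile(sorted, 0.75)
	iqr := q3 - q1
	lower, upper := q1-1.5*iqr, q3+1.5*iqr
	var kept []float64
	for _, v := range sorted {
		if v >= lower && v <= upper {
			kept = append(kept, v)
		}
	}
	return kept
}

// meanVar returns the sample mean and the unbiased sample variance.
// It expects at least two samples.
func meanVar(xs []float64) (mean, variance float64) {
	n := float64(len(xs))
	for _, x := range xs {
		mean += x
	}
	mean /= n
	for _, x := range xs {
		d := x - mean
		variance += d * d
	}
	variance /= n - 1
	return mean, variance
}

// welchT returns the Welch t statistic and the Welch-Satterthwaite
// degrees of freedom for two independent samples.
func welchT(a, b []float64) (t, df float64) {
	ma, va := meanVar(a)
	mb, vb := meanVar(b)
	na, nb := float64(len(a)), float64(len(b))
	sa, sb := va/na, vb/nb
	t = (ma - mb) / math.Sqrt(sa+sb)
	df = (sa + sb) * (sa + sb) / (sa*sa/(na-1) + sb*sb/(nb-1))
	return t, df
}

func main() {
	// Made-up samples for two builds, purely for illustration.
	buildA := removeOutliers([]float64{10, 11, 12, 11, 10, 100})
	buildB := removeOutliers([]float64{13, 14, 13, 15, 14, 13})
	t, df := welchT(buildA, buildB)
	fmt.Printf("kept A=%v B=%v t=%.3f df=%.2f\n", buildA, buildB, t, df)
}
```

The outlier filter runs per build and per metric before the test, so a single anomalous run cannot dominate the variance estimate.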

We could do this by creating two transforms, each grouping by apm-server build and benchmark name. Between them, the transforms would:

  1. compute the interquartile range (i.e. percentiles or boxplot agg) for each metric (e.g. events_indexed, allocations)
  2. produce non-outlier values for each metric using scripted_metric aggs and the output of the IQR transform

Then, given two apm-server builds, we can run the t_test aggregation on the non-outlier values for each benchmark/metric combination.
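As a rough illustration of that last step, the Go sketch below builds a search request body that uses the `t_test` aggregation with `"type": "heteroscedastic"` (the unequal-variance, Welch-style variant). The field names `build` and `benchmark`, the aggregation name `significance`, the helper `tTestRequest`, and the example build and benchmark values are all assumptions made for this example; the real names depend on how the transform output is mapped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tTestRequest builds a search body that compares one metric between two
// builds for a single benchmark. The "build" and "benchmark" field names
// are hypothetical and must match the transform's destination index.
func tTestRequest(benchmark, metric, buildA, buildB string) map[string]any {
	// Each population reads the same metric field, filtered to one build.
	population := func(build string) map[string]any {
		return map[string]any{
			"field": metric,
			"filter": map[string]any{
				"term": map[string]any{"build": build},
			},
		}
	}
	return map[string]any{
		"size": 0, // only the aggregation result is needed, not the hits
		"query": map[string]any{
			"term": map[string]any{"benchmark": benchmark},
		},
		"aggs": map[string]any{
			"significance": map[string]any{
				"t_test": map[string]any{
					"a":    population(buildA),
					"b":    population(buildB),
					"type": "heteroscedastic",
				},
			},
		},
	}
}

func main() {
	// Placeholder benchmark name and build identifiers.
	req := tTestRequest("BenchmarkExample", "events_indexed", "build-a", "build-b")
	body, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

One request per benchmark/metric pair keeps the sketch simple; several `t_test` aggregations could also be sent in a single request, one per metric.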