cloud-bulldozer / k8s-netperf

Running Networking Performance Tests against K8s
Apache License 2.0

[RFE] Confidence score in the end result #42

Closed jtaleric closed 1 year ago

jtaleric commented 1 year ago

Report a confidence score when a run uses more than one workload driver and more than one sample.

Determine confidence by:

Confidence could be reported as high or low, depending on whether the % difference of the above falls within a specific threshold.

smalleni commented 1 year ago

I would like us to take a more statistically sound approach by moving from point estimates to interval estimates. Confidence intervals are widely known, understood and adopted, and we can report them to the user instead of a made-up boolean like high/low.

The right way to do this is through inferential statistics, that is, estimating a population from samples. When we run a network performance test on a given setup, we generate samples of results that we believe represent the "population" of all OpenShift clusters that have the same config.

A confidence interval can be calculated by adding a margin of error to, and subtracting it from, the sample mean.

The margin of error is the Z or T critical value multiplied by the standard error (standard deviation/√n). For the 95% confidence interval that is widely used in statistics, the significance level (alpha) is 1 - 0.95 = 0.05. The two-sided T critical value for that alpha, with n - 1 degrees of freedom, is multiplied by the standard error to get the margin of error. Adding and subtracting that margin from the sample mean gives an interval in which the population mean is expected to lie.
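To make the calculation concrete, here is a minimal Python sketch of a 95% confidence interval for the mean. It is an illustration only, not how k8s-netperf does it: the function name `confidence_interval_95`, the small lookup table `T_CRIT_95` and the sample values are all assumptions made up for this example. The table holds standard two-sided 95% Student's t critical values, hardcoded because the Python standard library has no t-distribution.

```python
import math
import statistics

# Two-sided 95% Student's t critical values keyed by degrees of freedom (n - 1).
# Standard table values; this sketch only covers 2 to 10 samples.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262,
}


def confidence_interval_95(samples):
    """Return (mean, lower, upper) for a 95% confidence interval of the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least 2 samples to estimate a standard deviation")
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only has critical values for up to 10 samples")

    mean = statistics.mean(samples)
    # Standard error = sample standard deviation / sqrt(n)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    # Margin of error = T critical value * standard error
    margin = T_CRIT_95[df] * std_err
    return mean, mean - margin, mean + margin


# Hypothetical throughput samples in Mb/s (not real test results).
samples = [4920.0, 4930.0, 4925.0, 4935.0, 4915.0]
mean, low, high = confidence_interval_95(samples)
print(f"mean={mean:.3f} Mb/s, 95% CI={low:.3f}-{high:.3f} Mb/s")
```

With a statistics library available, the hardcoded table could be replaced by a proper t-distribution quantile function, which would also remove the limit on the number of samples.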

In other words, using a 95% confidence interval when running 5 samples of the network test, the result could look like this: Sample Average: 4925.900000 (Mb/s). Confidence Interval (calculated by adding and subtracting the margin of error): 4917.188517-4934.611483 (Mb/s). This gives a band that is expected to contain the true mean throughput: if someone were to repeat the test many times, about 95% of the intervals built this way would contain it.
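As a quick check of the figures quoted above, the margin of error can be recovered from the interval endpoints alone, without assuming which critical value was used:

```latex
\begin{aligned}
\text{margin of error} &= \frac{4934.611483 - 4917.188517}{2} = 8.711483 \ \text{Mb/s} \\
\bar{x} - \text{margin of error} &= 4925.9 - 8.711483 = 4917.188517 \ \text{Mb/s} \\
\bar{x} + \text{margin of error} &= 4925.9 + 8.711483 = 4934.611483 \ \text{Mb/s}
\end{aligned}
```

The margin is about 0.18% of the sample average, so in this example the interval is narrow relative to the measured throughput.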

A quick reference on confidence intervals for estimating population mean: http://www.stat.yale.edu/Courses/1997-98/101/confint.htm