cloud-bulldozer / benchmark-wrapper

Python Library to run benchmarks
https://benchmark-wrapper.readthedocs.io
Apache License 2.0

Can the Grafana dashboard bandwidth/latency percentiles from the fio ES index be used to compare different runs? #360

Open inevity opened 2 years ago

inevity commented 2 years ago

We know percentiles often help find outliers such as a node failure or a bug. But when testing a normal, healthy system, can we judge that one run is better than another by looking only at the metric percentiles? Or what would you suggest for finding which run or which benchmark is better using Grafana, without using the benchmark-comparison repo?

And what was the original purpose of the fio percentiles in the link below?

The question refers to the fio dashboard.json at https://github.com/cloud-bulldozer/arsenal/blob/master/fio-distributed/grafana/6.3.0/dashboard.json.

bengland2 commented 2 years ago

@inevity response-time percentiles IMHO are a better indication of customer experience than throughput -- you can have the best throughput in the world, but if the app frequently sees a 5-second response time from storage, it may be a failure. We want to look at both throughput and response time, and mean response time is sort of irrelevant -- what response time can the application count on? However, response-time percentiles are a much noisier metric, so you would have to be less sensitive to changes in this metric in order to use it. Make sense?
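The tail-latency point above can be sketched numerically. This is a minimal illustration with synthetic data, not actual fio output: a workload whose median is a healthy ~2 ms can still hide 5-second stalls that only the high percentiles (and an inflated mean) reveal.

```python
# Hedged sketch (synthetic data, not real fio samples): why response-time
# percentiles say more about user experience than the mean or median alone.
import random
import statistics

random.seed(42)
# Hypothetical latency samples in milliseconds: mostly ~2 ms completions
# plus a rare 1% tail of ~5-second stalls (e.g. a flapping node).
latencies_ms = [random.gauss(2.0, 0.5) for _ in range(9900)]
latencies_ms += [random.uniform(4000, 6000) for _ in range(100)]
latencies_ms.sort()

def pct(sorted_data, p):
    """Nearest-rank percentile on pre-sorted data."""
    k = max(0, min(len(sorted_data) - 1, round(p / 100 * (len(sorted_data) - 1))))
    return sorted_data[k]

print(f"mean  = {statistics.fmean(latencies_ms):8.1f} ms")  # inflated by the 1% tail
print(f"p50   = {pct(latencies_ms, 50):8.1f} ms")           # looks perfectly healthy
print(f"p99.9 = {pct(latencies_ms, 99.9):8.1f} ms")         # exposes the 5 s stalls
```

Here the median stays near 2 ms while p99.9 jumps into the seconds, which is exactly the "what response time can the application count on?" question the mean cannot answer.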

Also, I have to look at the arsenal/ link and see if that is up to date, I wasn't using it (maybe I should, I think it involves "jsonnet").

jtaleric commented 2 years ago

For comparisons, I suggest a text-based tool such as benchmark-comparison (aka Touchstone).

Regarding dashboards, please use https://github.com/cloud-bulldozer/performance-dashboards

inevity commented 2 years ago

The arsenal/ link is not up to date, but the fio percentiles are the same as in the ycsb template from cloud-bulldozer/performance-dashboards, which uses jsonnet. Nowadays, performance-dashboards does not have a fio template.

This is why I asked you for the Red Hat link, to learn what method they use.

If I understand correctly, the performance dashboard is meant for daily monitoring, not for benchmark comparison.

I did find online that percentiles can be compared with confidence intervals, but that needs extra work, so I'll just omit it.
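For reference, the confidence-interval approach mentioned above is not much extra work in practice. A minimal sketch, using a percentile bootstrap on the p99 of each run; all names and sample data here are hypothetical, not part of benchmark-wrapper:

```python
# Hedged sketch of comparing percentiles with confidence intervals:
# a percentile-bootstrap interval for the 99th-percentile latency of each run.
# Sample data is synthetic; this is not benchmark-wrapper code.
import random

def p99(samples):
    s = sorted(samples)
    return s[min(len(s) - 1, round(0.99 * (len(s) - 1)))]

def bootstrap_ci(samples, stat, n_resamples=1000, alpha=0.05):
    """Resample with replacement, then take quantiles of the statistic."""
    stats = sorted(
        stat(random.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

random.seed(7)
run_a = [random.gauss(2.0, 0.5) for _ in range(5000)]  # baseline run
run_b = [random.gauss(2.3, 0.5) for _ in range(5000)]  # slightly slower run

ci_a = bootstrap_ci(run_a, p99)
ci_b = bootstrap_ci(run_b, p99)
print(f"run A p99 CI: {ci_a[0]:.2f}..{ci_a[1]:.2f} ms")
print(f"run B p99 CI: {ci_b[0]:.2f}..{ci_b[1]:.2f} ms")
# Non-overlapping intervals suggest a real difference between the runs.
```

If the two intervals overlap, the percentile difference may just be noise, which matches the point about percentiles being a noisy metric.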

bengland2 commented 2 years ago

@inevity, I posted the JSON for the OCS fio grafana dashboard here, hope this helps. It depends on data sources and variables so you have to fill those in as well. Let me know if you need details or have questions.

inevity> If I understand correctly, the performance dashboard is meant for daily monitoring, not for benchmark comparison.

That's correct, the performance-dashboard is for looking at the results of a single test, not necessarily comparing two tests.

inevity> I did find online that percentiles can be compared with confidence intervals, but that needs extra work, so I'll just omit it.

So if you do this, you risk false positives (i.e. pulling the fire alarm for a regression when there was no regression) or false negatives (i.e. concluding that there is no significant difference when in fact there is). But if you monitor the standard deviation in your samples, you can get a rough idea of whether this is a problem by comparing the sum of these deviations to the difference in means. If the sum is << the difference in means, then you are probably safe to conclude that the runs really are different. It's just not a very precise method.
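That heuristic can be written down in a few lines. This is a rough screen with made-up numbers, not the project's actual comparison logic, and the `factor` threshold is an arbitrary choice standing in for "<<":

```python
# Hedged sketch of the heuristic above (hypothetical data and threshold,
# not a proper statistical test): flag a difference between two runs only
# when the gap between the means dwarfs the combined sample noise.
import statistics

def probably_different(run_a, run_b, factor=3.0):
    """Crude screen: True when |mean gap| >> sum of standard deviations."""
    noise = statistics.stdev(run_a) + statistics.stdev(run_b)
    gap = abs(statistics.fmean(run_a) - statistics.fmean(run_b))
    return gap > factor * noise

baseline  = [101, 99, 100, 102, 98]    # e.g. p99 latency in ms, per sample
candidate = [130, 128, 131, 129, 132]  # clearly slower run

print(probably_different(baseline, candidate))  # True: ~30 ms gap >> noise
print(probably_different(baseline, baseline))   # False: zero gap
```

It trades precision for simplicity, as noted above; a t-test or bootstrap interval would be the more rigorous follow-up when this screen is inconclusive.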

inevity commented 2 years ago

Thank you for your explanation.