Elasticsearch result analysis programs

bengland2 commented 2 years ago

Do not merge yet. While this PR tweaks documents generated by fs-drift to include host/pod name, its real significance is that it adds programs that analyze Elasticsearch results for storage benchmarks without relying on Grafana, using common Elasticsearch code (much like run_snafu does). fs-drift is used as an example of how this can work.

One issue: I had to set PYTHONPATH to get snafu/fs_drift_wrapper/analyze_test_results.py to work :-( I don't yet know why.

rsevilla87 commented 2 years ago

Hey @bengland2, I think having these result analysis scripts are out of the scope of this repository. I'm aware that there're other scripts kind of similar to this (fio_analyzer.py), however fio_analyzer is linked to the actual workload.

Given these facts, I can propose two alternatives to address this:

Move analysis scripts to a different repository
Link the analysis feature to the actual workloads, so a user can be able to invoke them using the same entrypoint (run_snafu).

Does it makes sense to you?

bengland2 commented 2 years ago

Am scheduling a meeting with Raul Sevilla for next week to discuss how to resolve the above, maybe change this PR to fit better into the benchmark result analysis strategy for cloud-bulldozer. I'm aware of touchstone, will take a closer look at what was done there. Am also aware of the fio Grafana dashboard but I do have some concerns and questions about it and how to generalize it to other benchmarks.

bengland2 commented 2 years ago

In the meantime, why is PYTHONPATH necessary to run these *_wrapper/analyze_test_results.py analysis programs (but not the query_result_uuids.py program)? For example:

[bengland@localhost benchmark-wrapper]$ PYTHONPATH=$PWD python3 snafu/fio_wrapper/analyze_test_results.py a450a061-8b99-5cda-958a-c9dcc62a8ec7 
Elasticsearch server at https://search-perfscale-dev-chmf5l4sh66lvxbnadi4bznl3a.us-west-2.es.amazonaws.com, verify_certs True
  op-type,      block-size,  sample,      pod,           process,        iops, bw (KiB/s), elapsed-time (sec), %deviation
      read,   256,              1,   10.128.0.131,                1,      94.3, 24146.000,    600.0
...

but without PYTHONPATH it fails to find the module, even though it is right there in the subdirectory:

[bengland@localhost benchmark-wrapper]$ python3 snafu/fio_wrapper/analyze_test_results.py a450a061-8b99-5cda-958a-c9dcc62a8ec7 
Traceback (most recent call last):
  File "/home/bengland/openshift/ripsaw/fs-drift/benchmark-wrapper/snafu/fio_wrapper/analyze_test_results.py", line 15, in <module>
    from snafu.utils.fetch_es_test_results import connect_es, next_result
ModuleNotFoundError: No module named 'snafu'

However, query_result_uuids.py works fine without PYTHONPATH:

[bengland@localhost benchmark-wrapper]$ python3 snafu/utils/query_result_uuids.py ripsaw-fio-results timestamp_end 2021-09-01T00:00:00Z
Elasticsearch server at https://search-perfscale-dev-chmf5l4sh66lvxbnadi4bznl3a.us-west-2.es.amazonaws.com, verify_certs True
...

bengland2 commented 2 years ago

However, in the python3.6 CI test, query_test_results.py fails with the same error:

ERROR snafu/utils/query_result_uuids.py - ModuleNotFoundError: No module name...
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!

Why is it being run in the first place?

bengland2 commented 2 years ago

I reproduced CI errors on my laptop by just running "tox", nice! Will work on that.

bengland2 commented 2 years ago

I've evaluated snafu/fio_wrapper/fio_analyzer.py , which is what I think Raul was referring to above. This module was intended to give result consumers (like Grafana fio dashboard) a way to display aggregate fio results for the cluster, something my primitive CLI analysis tools will not do. The basic idea is sound - run_snafu can calculate an average and standard deviation across samples and post these to a separate index, since it has the results for all the fio pods and is run for each sample. I'm rethinking whether I could do something like this for smallfile and fs-drift. If so, then maybe we don't need separate analysis programs that output to stdout at all - Kibana could show the results to us, or eventually a dashboard.

However, there is one big difference between fio and the other two. fio-client's run_snafu.py has access to all the per-pod results (returned from the fio-server pods). smallfile pods run independently (with redis synchronization). So another step would have to be written after redis synchronization to do this calculation by querying ES (perhaps in smallfile-client-1) after all pods have completed each sample. fs-drift may have the same problem. But in principle we could do the same calculations that are done in *_wrapper/analyze_test_results.py above and just output them to an ES index instead of to stdout. This would enable a smallfile or fs-drift dashboard.

The query_result_uuids.py program could arguably be replaced by a Kibana discover display on a result index with a specified time window, this would perhaps eliminate the need for it.

How does snafu.utils.fetch_es_test_results.py compare to what is in touchstone, er benchmark-comparison? I don't see the es.scroll() function being called in benchmark-comparison, does that mean it assumes that all results will fit within a single es.search() call? Also, I don't see anything in benchmark-comparison to allow disabling cert verification, some of my users were complaining that they couldn't access ES from my CLI otherwise.

bengland2 commented 2 years ago

the reason PYTHONPATH=$PWD was required to run the analysis programs and uuid query programs was that these are not callable code, they are scripts, but they import functions from snafu.utils.fetch_es_test_results, but the scripts are located in a package (directory with a init.py). When I copy the programs up to benchmark-wrapper directory and run them from there, they work fine without PYTHONPATH env. var. The next commit migrates almost all of the code in query_result_uuids.py to fetch_es_test_results.py, so what's left of it is trivial and can be moved wherever. As for *_wrapper/analyze_test_results.py, once we decide what happens to that code we can figure out how to resolve this issue.

inevity commented 2 years ago

Since we want that Users can run Benchmark-wrapper in a traditional bare-metal environment or with the use of benchmark-operator ran in a containerized environment such as Kubernetes, we should put es result analysis and es result visualision to the other repo. Of couse we can put common analsys lib in the benchmark-wrapper. I just run the benchmark-wrapper fio in a bare-metal environment, look at the wrapper code ,just need little change to full support bare-metal environment such as need drop-cache on bare-metal host. Any code involved with the pod/host/etc should move to the benchmark-wrapper/future bare-metal benchmark-wrapper.

As for the grafana dashboard,i have create the current fio result analysis dashboard.json accroding the previous fio dashboard.json which is not applicable on the current es result. But the grafana metric is all based date histogram. Now wan to visualization the other dim such as iodepth. So can you share the link you share? @bengland2

bengland2 commented 2 years ago

@inevity not certain which link you are asking for, but grafana dashboard for OCS fio is available internal to Red Hat here. If you need access to the dashboard and you are outside Red Hat let me know.

inevity commented 2 years ago

@inevity not certain which link you are asking for, but grafana dashboard for OCS fio is available internal to Red Hat here. If you need access to the dashboard and you are outside Red Hat let me know.

Yes, outside. Thank you.

bengland2 commented 2 years ago

@inevity, about your comment from 24 days ago, drop-cache functionality is supported today

bengland2 commented 2 years ago

@inevity sorry didn't see your ask earlier, too much going on! As for fio dashboard, I can't export that as is outside redhat today because in its raw form it isn't really source code, it's a whole website, but the closest I can get to something exportable is the JSON description of it, which is here Maybe some portion of this would be useful? What we really need to do is make fio dashboard into jsonnet form, which allows it to be incorporated as source code and rebuilt into the dashboard on the user's site. Here is a repo that talks about how to do this and has some examples. Specifically here is an example that seems relatively straightforward, would like to see someone try this with fio. What do you think?

bengland2 commented 2 years ago

after talking to Raul, I decided to restructure this so that the benchmark-wrapper creates a new index with the post-processed results of the run. I only run this step in client-1 for smallfile and fs-drift, and it gets the results from all pods just the way these programs do it. This will enable benchmark-comparison repo to operate on the results also. Also I can make a smallfile dashboard to display the results, similar to the fio dashboard. Perhaps I'll try to use jsonnet to do it, so we can have source code for the dashboard (i.e. make the dashboard usable by other sites outside of Red Hat).

bengland2 commented 2 years ago

Sorry this is taking so long. Change in design:

Instead of somehow trying to wedge the result analysis into client-1, I'm going to wait until all clients are complete and then I can basically run the *_wrapper/analyze_test_results.py program previously in this PR, only instead of printing results it will generate a new elasticsearch index, smallfile-analyzed-results (like fio-analyzed-results).
To wait until all workload pods are done, I can just add a task to the end of benchmark-operator/roles/my-benchmark/tasks/main.yml since at that point all workload pods are in either Completed or Failed state, right? If the latter, no results will be generated. This is what roles/benchmark_state/tasks/completed.yml does in benchmark-operator, right?
to enable benchmark-comparison to do a statistically valid comparison of results, I'll post the samples in the analyzed index (in a separate document field), not just the mean and std. dev. This way, we can add future code to benchmark-comparison that can do statistical test that the user specifies, like t-test or u-test or any other test that anyone can think of.

cloud-bulldozer / benchmark-wrapper

Elasticsearch result analysis programs #355