distributed-system-analysis / pbench

A benchmarking and performance analysis framework
http://distributed-system-analysis.github.io/pbench/
GNU General Public License v3.0

Generate charts for Fio benchmark #3549

Closed: MVarshini closed this issue 1 year ago

MVarshini commented 1 year ago

PBENCH-1214

Generate charts for Fio benchmark

MVarshini commented 1 year ago

@dbutenhof I was pointing to the latest version of pquisby and didn't check in that file.

Since the disk-job value is unknown, <>_d-<>_j-<>_iod is left empty in the response.

dbutenhof commented 1 year ago

> @dbutenhof I was pointing to the latest version of pquisby and didn't check in that file.

Hmm. So 0.0.25 is working for you? Because it wasn't for me, on 2 of the three fio datasets, which makes me nervous. Was there something else in your test environment that didn't get onto the PR branch, aside from the requirements.txt change? 😦

> Since the disk-job value is unknown, <>_d-<>_j-<>_iod is left empty in the response.

That's rather ugly, but it sounds like maybe that's some of the extra information Soumya was hoping to get from the server? (Although I'm not sure we know how to find it.)

sousinha97 commented 1 year ago

@dbutenhof, in order to analyse Fio benchmark data, knowing the following is apparently important (I am not an expert on FIO):

  1. Number of disks used
  2. IODepth
  3. Numjobs

We are adding these to the graphs; if it's not possible to fetch this data, we can remove it from the graphs. (It would have been great if we were able to extract this data, though, as it saves the user the effort of going back to their runs and manually checking the values of these fields.) Would love to know your views on this; accordingly, we can plan and modify the graphs.

I guess these values are provided while running the fio command; if not, default values are taken into consideration.

dbutenhof commented 1 year ago

> @dbutenhof, in order to analyse Fio benchmark data, knowing the following is apparently important (I am not an expert on FIO):

We're definitely not FIO experts, either. We've pieced together how to run it in trivial configurations, but the data in the result.csv that Quisby consumes was determined years ago as part of a complicated Pbench Agent post-processing step we've barely touched.

>   1. Number of disks used

I assume we could figure out how to extract this information from the more detailed benchmark logs ... but I'm not even sure if it's global configuration or private to each iteration. (Or, for that matter, whether the Pbench "iteration" concept, which is how the Agent organizes the raw data, directly corresponds to the fio "job" configuration...)

Each fio "job" seems to be targeted to a specific filesystem path, which would define the disk used. Since the result.csv lists each Pbench Agent "iteration" (which I think equates directly to a fio "job" in the standard benchmark wrapper, but don't quote me on that), I suspect that each job is a single disk.

But a lot of these constraints are due to the design of the pbench-fio wrapper script. Right now, Quisby relies on the post-processed result.csv file, which means we can only visualize/compare the output of pbench-fio. We also have people who run fio directly (for example, using pbench-user-benchmark), often to take advantage of the massive flexibility of the fio command configuration that's not supported by the Pbench Agent wrapper. Right now we have no way to visualize/compare those runs, as we don't even know it's "fio" and we don't have the Agent post-processing.

>   2. IODepth

This appears to be a global configuration setting on the fio job file, and we can extract it from a fio-generated JSON output file or from the raw text job description file.
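If fio's JSON output is available, pulling iodepth out of it might look like the sketch below (an assumption, not a confirmed part of the Pbench flow: it presumes the run was captured with `fio --output-format=json`, and the `read_iodepth` helper name and file path are hypothetical):

```python
import json

def read_iodepth(fio_json_path):
    """Read iodepth from a fio JSON output file (fio --output-format=json).

    Checks the "global options" block first, then falls back to per-job
    options; fio's documented default iodepth is 1 when unspecified.
    """
    with open(fio_json_path) as f:
        data = json.load(f)
    # "global options" is present when the job file has a [global] section.
    global_opts = data.get("global options", {})
    if "iodepth" in global_opts:
        return int(global_opts["iodepth"])
    for job in data.get("jobs", []):
        opts = job.get("job options", {})
        if "iodepth" in opts:
            return int(opts["iodepth"])
    return 1  # fio default
```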

>   3. Numjobs

As I mentioned above, I'm not entirely certain how the pbench-fio wrapper maps the fio "job" concept into Pbench Agent "iterations". I suspect (without much proof at this point) that it's one-to-one. However, a fio configuration file can apparently define multiple "jobs". We can read the list from the input file or fio's output to get it.
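Reading the job list from fio's output could be as simple as counting the entries in the top-level "jobs" array of a fio JSON output file; a sketch, again assuming JSON output was captured (the helper name is hypothetical):

```python
import json

def count_jobs(fio_json_path):
    """Count the fio jobs reported in a fio JSON output file.

    Each entry in the top-level "jobs" array is one job fio actually ran,
    so multiple job sections in the configuration each contribute entries.
    """
    with open(fio_json_path) as f:
        return len(json.load(f).get("jobs", []))
```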

> We are adding these to the graphs; if it's not possible to fetch this data, we can remove it from the graphs. (It would have been great if we were able to extract this data, though, as it saves the user the effort of going back to their runs and manually checking the values of these fields.) Would love to know your views on this; accordingly, we can plan and modify the graphs.

> I guess these values are provided while running the fio command; if not, default values are taken into consideration.

We can figure this all out, but first we need to figure out the relative importance of figuring it out compared to the other stuff we need to do. 😄

FYI: I've summarized this at PBENCH-1274.

webbnh commented 1 year ago

Just poking around a randomly-selected FIO result, I found the generate-benchmark-summary.cmd file at the top level, which contains:

/opt/pbench-agent/bench-scripts/postprocess/generate-benchmark-summary "fio" "--block-sizes=4,1024 --iodepth=8 --numjobs=10 --ramptime=10 --runtime=60 --samples=5 --targets=/fio --test-types=read,write --clients=192.168.122.211" "/var/lib/pbench-agent/fio__2023.09.06T12.05.46"

It looks like the values for iodepth and numjobs are there, and, extending Dave's suggestion above, I expect that the number of disks can be gleaned from the targets value there.
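As a rough illustration, extracting those values from the recorded command line might look like this (a sketch only: the `parse_fio_options` helper is hypothetical, and it assumes the layout shown above, where argv[2] is a single quoted string of --key=value options):

```python
import shlex

def parse_fio_options(cmd_line):
    """Extract iodepth, numjobs, and a disk count from the command
    recorded in generate-benchmark-summary.cmd.

    Assumes argv[1] is the benchmark name and argv[2] is one quoted
    string of --key=value pbench-fio options, as in the example above.
    """
    argv = shlex.split(cmd_line)
    opts = dict(
        tok[2:].split("=", 1)
        for tok in argv[2].split()
        if tok.startswith("--") and "=" in tok
    )
    targets = opts.get("targets")
    return {
        "iodepth": int(opts.get("iodepth", 1)),
        "numjobs": int(opts.get("numjobs", 1)),
        # Each comma-separated target is presumed to be one device/path.
        "num_disks": len(targets.split(",")) if targets else None,
    }
```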

If we really want to dig, we have the fio.job files for each iteration, each of which contains a global section with the iodepth and a job-<dev> section for each device (which we can count to get the number of devices) each of which contains the number of jobs.
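That deeper dig could be sketched like this, assuming the fio.job files are INI-style with a [global] section holding the iodepth and one job-<dev> section per device (the `summarize_fio_job` helper name is hypothetical):

```python
import configparser

def summarize_fio_job(job_path):
    """Summarize a fio.job file: global iodepth, device count, and
    total numjobs across the per-device job sections.

    Assumes the layout described above; fio job files are INI-like,
    and allow_no_value handles bare flags such as group_reporting.
    """
    cfg = configparser.ConfigParser(allow_no_value=True, strict=False)
    cfg.read(job_path)
    # Every non-[global] section is presumed to be a job-<dev> section.
    job_sections = [s for s in cfg.sections() if s != "global"]
    total_jobs = sum(
        int(cfg.get(s, "numjobs", fallback="1")) for s in job_sections
    )
    return {
        "iodepth": int(cfg.get("global", "iodepth", fallback="1")),
        "num_devices": len(job_sections),
        "numjobs": total_jobs,
    }
```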

However, if (for the moment) we want to restrict ourselves to the information that we already have, I think that the Pbench Dashboard can glean the number of disks from the Pbench Server metadata, dataset.metalog.iterations/<iteration>.dev (i.e., in the metadata.log file, under each [iterations/...] section, there is a dev entry which contains the same value as the targets in the command above). And, I think Quisby can deduce the number of jobs by counting the iops_sec:client_hostname:* columns in the result.csv file. However, I don't see any way to get the I/O depth from the information that we already have on hand.
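To illustrate those two already-available sources, here is a sketch (helper names are hypothetical, and the metadata.log and result.csv layouts are assumed from the description above):

```python
import configparser
import csv

def disks_from_metadata(metadata_log_path):
    """Collect the dev entry from each [iterations/...] section of a
    Pbench Server metadata.log file."""
    cfg = configparser.ConfigParser(strict=False)
    cfg.read(metadata_log_path)
    return {
        cfg.get(s, "dev")
        for s in cfg.sections()
        if s.startswith("iterations/") and cfg.has_option(s, "dev")
    }

def jobs_from_result_csv(result_csv_path):
    """Deduce the job count by counting iops_sec:client_hostname:*
    columns in the result.csv header."""
    with open(result_csv_path, newline="") as f:
        header = next(csv.reader(f))
    return sum(1 for col in header if col.startswith("iops_sec:client_hostname:"))
```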

[I've added this information in a reply to PBENCH-1274.]

sousinha97 commented 1 year ago

@dbutenhof @webbnh thanks for your input. Yes, this process requires a bit of discussion; we can surely look into other major issues first and keep this in the backlog as an optimisation task. I will contact Varshini and remove this feature for now.

dbutenhof commented 1 year ago

> No match for argument: rsyslog-mmjsonparse
> Error: Unable to find a match: rsyslog-mmjsonparse

That's distressing. I'm not sure whether this is a transient failure in accessing repos or a real change in the dependencies. I'm going to re-trigger the build to see if it happens again.