unpredictable value units, parsing nightmare

glenntanner3 commented 7 years ago

As a human, I find the auto-scaling of values into the easiest-to-read unit nice; however, trying to script and parse the results is a nightmare. Would it be possible to add a switch to define which unit to use, or to force the smallest unit? That way, a script parsing the results would not have to detect and interpret the unit; it could instead use the unit as a key. For example: IOPS=67.6k -> IOPS=67600; lat (msec) -> lat (usec); BW=264MiB/s -> BW=270336KiB/s
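
For anyone who still has to post-process the human-readable output, here is a minimal Python sketch of the kind of normalization being asked for. The suffix tables are assumptions taken only from the examples in this comment (decimal `k` for IOPS, binary `MiB/s` for bandwidth, `msec`/`usec` for latency); they are not fio's full unit table, and the function names are hypothetical.

```python
import re
from decimal import Decimal

# Assumed multipliers, based only on the examples in this issue.
COUNT_SUFFIX = {"": 1, "k": 1000, "M": 1000**2, "G": 1000**3}
KIB_PER_UNIT = {"KiB/s": 1, "MiB/s": 1024, "GiB/s": 1024**2}
USEC_PER_UNIT = {"usec": 1, "msec": 1000, "sec": 1000**2}

_VALUE = re.compile(r"^([0-9]*\.?[0-9]+)(.*)$")


def split_value(text):
    """Split '67.6k' into (Decimal('67.6'), 'k')."""
    match = _VALUE.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse value: {text!r}")
    return Decimal(match.group(1)), match.group(2).strip()


def iops_to_count(text):
    """Convert a scaled IOPS figure such as '67.6k' to a plain integer."""
    number, suffix = split_value(text)
    return int(number * COUNT_SUFFIX[suffix])


def bw_to_kib_per_sec(text):
    """Convert a bandwidth such as '264MiB/s' to KiB/s."""
    number, unit = split_value(text)
    return int(number * KIB_PER_UNIT[unit])


def lat_to_usec(value, unit):
    """Convert a latency value in the given unit to microseconds."""
    return Decimal(str(value)) * USEC_PER_UNIT[unit]


print(iops_to_count("67.6k"))
print(bw_to_kib_per_sec("264MiB/s"))
print(lat_to_usec("1.5", "msec"))
```

`Decimal` is used instead of `float` so that values like 67.6 scale exactly.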

sitsofe commented 7 years ago

@glenntanner3 I think you might be trying to scrape output meant for humans rather than using the output designed for machines. See http://fio.readthedocs.io/en/latest/fio_man.html#cmdoption-output-format for a list of the formats and https://www.youtube.com/watch?v=vm1GJMp0QN4#t=17m48s for the issue you avoided but others sadly did not.

If this helps can you close this issue?

glenntanner3 commented 7 years ago

@sitsofe Great video, SED was too funny; however, even terse will still adjust the units. I'm working around it, but the code to convert it is so ugly, and while manually reviewing I also overlooked usec and msec and made some mistakes in reports. Therefore, being able to disable this conversion is what I'm really looking for. Thank you for your suggestion.

szaydel commented 7 years ago

I think I have now started four times on producing a CSV output that would work for me and the way our company handles this data. Each time something gets in the way, and when I come back to it a couple of months later I have forgotten everything from last time, try to remember it, and start working on it again, only to repeat the same behaviour. Maybe one day I will actually complete this effort. @glenntanner3: is this something you would benefit from?

glenntanner3 commented 7 years ago

@szaydel Perhaps. Are you speaking of just making an output CSV converter, or of actually updating FIO to produce non-human-readable output? You know, I really would be fine with only the output file having the scriptable values. If I find the time I might just try to make the modification to FIO myself. It annoys me that much.

szaydel commented 7 years ago

@glenntanner3: I was specifically talking about adding an option to fio to output CSV. Arguably CSV is still quite human-readable, but that's certainly debatable.

glenntanner3 commented 7 years ago

CSV would be great, again most important is that it uses a single value unit. Leave it to the script to convert as desired.

sitsofe commented 7 years ago

Wait, why don't you use the terse output format - http://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-output-format , http://fio.readthedocs.io/en/latest/fio_doc.html#terse-output? I'd recommend json over it myself but why is that insufficient?

szaydel commented 7 years ago

I think it is pretty much a requirement to do a baseline unit, probably bytes when producing CSV output. I find that having units is rarely beneficial with machine parsable output. Pretty much any time you have a column of bytes a vector operation easily transforms that data to whatever unit you want to operate on.
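
A minimal Python sketch of that idea: keep the column in bytes and convert the whole column in one step when a different unit is wanted. The helper name and the sample values are hypothetical.

```python
# Divisors for binary units, relative to bytes.
DIVISORS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def bytes_to(values, unit="MiB"):
    """Convert a whole column of byte counts to the requested unit."""
    divisor = DIVISORS[unit]
    return [value / divisor for value in values]


# Hypothetical column of byte counts.
column = [1048576, 2097152, 536870912]
print(bytes_to(column, "MiB"))
```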

szaydel commented 7 years ago

@sitsofe: I find that I operate more often than not in SQL-like fashion and so having CSV for me is most convenient. This does not necessarily mean others are working in the same way. I often will process data with R, and R makes it trivial to work with CSV and SQL and allows for rapidly moving CSV to SQL.

sitsofe commented 7 years ago

@szaydel Well OK but I still don't follow about the need for an additional CSV mode for fio. When using the terse output format (that outputs CSV) aren't the units fixed?

sitsofe commented 7 years ago

(Also R seems to have libraries that make converting JSON to a data frame straightforward - https://stackoverflow.com/questions/2617600/importing-data-from-a-json-file-into-r )

sitsofe commented 7 years ago

@glenntanner3 - how have you found fio's terse output?

szaydel commented 7 years ago

Terse output suffers from one issue in my opinion. The output is effectively CSV, correct, but it is missing column headings, so one has to make sure one packs these along in some form, otherwise it takes a while to figure out what each field is. A proper CSV file is assumed to have a header generally, so that you can just drop the file into anything and query by column name with minimal effort. I am sure I am coming across as a lazy bum right now. :)

Version changes of the terse format also introduced changes to columns and that can cause minor issues when you don't have headings with the data, but do come back to the data at some later point. This happens to me sometimes, where I have to come back to data after a few years. I also realize this may sound strange.

sitsofe commented 7 years ago

@szaydel Re column headings - CSV has never really had a spec (and the requirement for headings has always been optional in all the CSV parsers I've seen). Wikipedia also suggests headings are optional - https://en.wikipedia.org/wiki/Comma-separated_values#Standardization . These days the fio CSV headings are listed in the HOWTO for v3 terse (which is the most common version). If you want self documenting data then you really should be going to the JSON output :-)

I thought that the different versions of the terse format had to be explicitly chosen and the version used is actually in one of the columns. Are there other changes you're thinking of? BTW this is starting to get off topic for the original issue and should probably be moved to the mailing list.

At any rate the "normal" output of fio is meant for human consumption and changes at will between different versions of fio. It is really unwise to scrape it for further processing purposes and it doesn't try to be easily parsable (as you found things like units change dynamically). The other output formats use a fixed unit and are far more amenable to machine parsing/processing and try not to change in incompatible ways between fio versions.
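
As a rough illustration of consuming the JSON output instead of scraping the normal output, here is a minimal Python sketch. The key names (`jobs`, `jobname`, `read`, `iops`, `bw`, `lat_ns`) are assumptions about the layout of recent fio JSON reports and may differ between fio versions, so check them against a report from your own build. The sample report is hypothetical, not real fio output.

```python
import json

# Hypothetical, heavily trimmed report; a real one has many more fields.
SAMPLE_REPORT = """
{
  "jobs": [
    {
      "jobname": "randread",
      "read": {"iops": 67600.0, "bw": 270336, "lat_ns": {"mean": 14500.0}}
    }
  ]
}
"""


def summarize(report_text):
    """Return one row per job with values in fixed, unscaled units."""
    report = json.loads(report_text)
    rows = []
    for job in report.get("jobs", []):
        read = job.get("read", {})
        rows.append({
            "jobname": job.get("jobname"),
            "read_iops": read.get("iops"),
            "read_bw": read.get("bw"),  # assumed to be KiB/s
            "read_lat_ns_mean": read.get("lat_ns", {}).get("mean"),
        })
    return rows


for row in summarize(SAMPLE_REPORT):
    print(row)
```

Because every value is looked up by key, a missing field shows up as `None` rather than shifting the other columns.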

glenntanner3 commented 7 years ago

@sitsofe From what I had found (Google), it seemed to me that terse still changed units. But I will give it a try. BTW I'm only parsing the output file, not the STDOUT output... but without a format type I suppose 'normal' is equivalent to STDOUT.

sitsofe commented 7 years ago

@glenntanner3 I'm not aware that terse changes units. It looks like it is using a fixed unit to me in show_ddir_status_terse() - https://github.com/axboe/fio/blob/07dff7d1d614b33e3a6d3e3ade38ce648b53a632/stat.c#L881 .

The "location" of the output (see http://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-output ) is separate from the "type" of the output (see http://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-output-format ). fio defaults to normal output on stdout but you can specify other combinations.
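
A minimal Python sketch of combining the two options when launching fio from a script. `--output` and `--output-format` are the options linked above; the helper name `build_fio_command` and the file names are hypothetical.

```python
def build_fio_command(job_file, output_format="json", output_path=None):
    """Build an fio argument list; the format and the location are independent."""
    command = ["fio", f"--output-format={output_format}"]
    if output_path is not None:
        # Without --output, fio writes the chosen format to stdout.
        command.append(f"--output={output_path}")
    command.append(job_file)
    return command


# Terse output written to a file, and JSON output left on stdout.
print(build_fio_command("job.fio", "terse", "result.txt"))
print(build_fio_command("job.fio", "json"))
```

The resulting list can be passed to `subprocess.run()` on a machine where fio is installed.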

szaydel commented 7 years ago

@sitsofe: Off-topic indeed. I don't disagree about JSON, but to each their own I suppose. I have no issues with JSON, but CSV is my preferred format. You are absolutely correct about lack of a standard, sans this bit: https://tools.ietf.org/html/rfc4180. If I have something thoughtful to add, I will either raise an issue here, or mailing list. Thanks for consuming and offering feedback.

sitsofe commented 7 years ago

@glenntanner3 PS if you find that the units are fixed with different output formats can you close this issue? Thanks!

glenntanner3 commented 7 years ago

OK, I can live with terse; a bit cumbersome, but good. It might be useful to note in the documentation that terse uses static values, and to add an in-page link to the list of outputs.

Bash command for the next person who wants to get the field number for terse output:

echo "terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util" | tr ';' '\n' | awk '{printf("%03d %s\n", NR, $0)}'
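
The same header string can also be used to label the fields of a terse line, which gets around the missing column headings mentioned earlier in the thread. A minimal Python sketch follows; only the first nine names of the header are included to keep it short (paste the full string from the command above for real use), and the sample terse line and function names are hypothetical.

```python
# First nine field names from the v3 terse header quoted above.
HEADER_PREFIX = (
    "terse_version_3;fio_version;jobname;groupid;error;"
    "read_kb;read_bandwidth;read_iops;read_runtime_ms"
)


def field_number(name, header=HEADER_PREFIX):
    """Return the 1-based field number of a name, as the awk one-liner prints it."""
    return header.split(";").index(name) + 1


def label_terse(line, header=HEADER_PREFIX):
    """Pair each value in a terse line with its field name.

    zip() stops at the shorter sequence, so values beyond the
    truncated header are dropped in this sketch.
    """
    names = header.split(";")
    values = line.strip().split(";")
    return dict(zip(names, values))


# Hypothetical start of a terse line, not real fio output.
sample_line = "3;fio-3.x;job1;0;0;1048576;270336;67600;3879"
record = label_terse(sample_line)
print(field_number("read_iops"), record["read_iops"])
```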

sitsofe commented 7 years ago

@glenntanner3 Re "note in documentation that terse uses static values and a in page link to the list of outputs": that's a good point. Would you like to submit a patch to https://github.com/axboe/fio/blob/master/HOWTO and https://github.com/axboe/fio/blob/master/fio.1 for this?

glenntanner3 commented 7 years ago

Sure thing

On Sep 9, 2017 00:45, "Sitsofe Wheeler" notifications@github.com wrote:

@glenntanner3 https://github.com/glenntanner3 Re "note in documentation that terse uses static values and a in page link to the list of outputs": that's a good point. Would you like to submit a patch to https://github.com/axboe/fio/blob/master/HOWTO and https://github.com/axboe/fio/blob/master/fio.1 for this?


axboe / fio

unpredictable value units, parsing nightmare #451