Bodigrim / tasty-bench

Featherlight benchmark framework, drop-in replacement for criterion and gauge.
https://hackage.haskell.org/package/tasty-bench
MIT License

Format of the CSV file #16

Closed: harendra-kumar closed this issue 3 years ago

harendra-kumar commented 3 years ago

Wanted to discuss a few minor details about the CSV file format.

  1. Would it be better to not have spaces in the names of the columns?
  2. Instead of "Mean", "cpuTime" may be a more informative choice.
  3. Also, why not keep the units of time as seconds instead of ps?
  4. Instead of "Copied", "gcBytesCopied" would be more informative.
  5. Can we store the whole series of measurements in the CSV file, instead of just one data point? As in gauge, we could have an iterations column reporting the number of iterations, with the rest of the columns reporting the raw data corresponding to that many iterations as usual. This would allow other tools to do any statistical analysis over the whole series of measurements.
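A hypothetical raw-data layout along these lines, where each benchmark contributes one row per batch of iterations (column names and values here are illustrative only, not the actual gauge or tasty-bench format):

```csv
Name,iters,cpuTime
bench/fib,1,0.000021
bench/fib,2,0.000040
bench/fib,4,0.000079
```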
Bodigrim commented 3 years ago
  1. Spaces do not have any special meaning in CSV, so I do not see a particular justification for avoiding them.
  2. Agreed.
  3. I do not see how seconds are inherently better than ms, μs, ns, or ps. Dumping integral picoseconds means that we do not have to deal with floating-point numbers (and their parsing).
  4. Agreed.
  5. Since tasty-bench both prints and reads CSV, it would be inconvenient to have more than one line per benchmark. I can think about exposing more internals, so that an external client could measure a benchmark with a given number of iterations directly, without the medium of a CSV file.
harendra-kumar commented 3 years ago
  1. Seconds are as arbitrary as any other unit, but when using ps we assume we will never need to represent time at a finer granularity. With seconds, although we have to deal with floating-point numbers, we do not bake in a precision. Also, seconds are the unit used by both gauge and criterion, so we would not have to change anything in our analysis tools.
  2. gauge provides a --csvraw option, which dumps per-sample data, vs the --csv option, which combines iterations into one single data point. Would something like that be a possible way to deal with this? I am not stuck on this, so it is just a suggestion and not a pain point.
Bodigrim commented 3 years ago
  1. I don't want to look stubborn, but switching from picoseconds to seconds is a breaking change. How important is it for you? I kinda feel that compatibility with the CSV format of criterion or gauge is a lost cause anyway.

  2. Generating two incompatible CSV reports is confusing: nothing prevents a user from generating --baseline with a hypothetical --csvraw instead of --csv.

    My resistance is partly caused by the current architecture, which makes dumping raw samples difficult. But I also think CSV is a poor format for interprocess communication and a likely source of future compatibility issues. I'd prefer to expose more internals, so that clients could roll their own statistical analysis communicating in-process.

harendra-kumar commented 3 years ago

> How important is it for you?

Not critical, but nice to have for compatibility. Until tasty-bench is tested and becomes reliable for our existing benchmarking infrastructure, we would like to keep gauge as a backup option. For that we will have to use separate handling in the analysis/reporting for both tools, or maybe preprocess the CSV file generated by tasty-bench to convert the column to seconds.
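One possible shape for such a preprocessing step, sketched in Haskell. The column layout assumed here (benchmark name first, mean in integral picoseconds second) is only an illustration; the real tasty-bench header may differ between versions, and the naive comma splitting ignores quoted fields:

```haskell
module Main where

import Data.List (intercalate)

-- Split a CSV line on commas (naive: assumes no quoted commas in names).
splitCsv :: String -> [String]
splitCsv s = case break (== ',') s of
  (field, "")       -> [field]
  (field, _ : rest) -> field : splitCsv rest

-- Convert integral picoseconds to fractional seconds.
psToSecs :: Integer -> Double
psToSecs ps = fromInteger ps / 1e12

-- Rewrite the second column of a data row from picoseconds to seconds,
-- so downstream tooling written for gauge/criterion output can consume it.
convertRow :: String -> String
convertRow line = case splitCsv line of
  (name : ps : rest) -> intercalate "," (name : show (psToSecs (read ps)) : rest)
  fields             -> intercalate "," fields

main :: IO ()
main = putStrLn (convertRow "bench/foo,1000000000000")
```

Running this prints `bench/foo,1.0`; a real script would also skip or rewrite the header row.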

Regarding (4), in general a persistent file, CSV or otherwise, is a good interface for such purposes. We may not want to perform the statistical analysis in real time; instead we may want to process the raw data offline at any time to produce different reports or presentations of the data. For that, some persistent format for storing the data is required.

The easiest possible way would be to store all the data points in the CSV. The internal tasty-bench analysis can just use the last two points for each benchmark to calculate its results. Anyway, as I said earlier, this is not critical for me as of now, but it is something to consider/discuss.

Bodigrim commented 3 years ago

I took a closer look at the CSV reports of criterion and gauge. It appears that the headers in --csv and --csvraw modes are incompatible. For example, the former (in both criterion and gauge) names a column Mean, while the latter uses time or cpuTime. If there were an appetite for a change, I'd rather conform to the more prevalent format; I guess --csvraw is rarely used without preprocessing.

I've added incantations to fake both formats (headers + measurements in seconds) at the bottom of https://github.com/Bodigrim/tasty-bench#comparison-against-baseline section.


Once you have a Benchmarkable object, you can take it apart and write your own driver to run the required metrics with the required number of iterations and report them in whatever format you need. If your goal is just to collect raw samples, you cannot really benefit much from defaultMain and the tasty framework.
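A minimal sketch of what such a hand-rolled driver could look like. Note that Benchmarkable is modelled here as a stand-in newtype wrapping a "run n iterations" action, and timing uses getCPUTime (picosecond-denominated CPU time from base); tasty-bench's actual internals and exports may differ, so treat this as an illustration of the idea rather than working integration code:

```haskell
import Data.Word (Word64)
import System.CPUTime (getCPUTime)

-- Stand-in for tasty-bench's Benchmarkable: an action that runs
-- the benchmarked computation the given number of times.
newtype Benchmarkable = Benchmarkable (Word64 -> IO ())

-- Time a single batch of n iterations; getCPUTime reports picoseconds.
measureBatch :: Benchmarkable -> Word64 -> IO Integer
measureBatch (Benchmarkable run) n = do
  t0 <- getCPUTime
  run n
  t1 <- getCPUTime
  pure (t1 - t0)

-- Collect raw samples over a series of batch sizes, leaving any
-- statistical analysis (or CSV dumping) entirely to the caller.
rawSamples :: Benchmarkable -> [Word64] -> IO [(Word64, Integer)]
rawSamples b = mapM (\n -> (,) n <$> measureBatch b n)

main :: IO ()
main = do
  let bench = Benchmarkable $ \n ->
        mapM_ (\i -> sum [1 .. i] `seq` pure ()) [1 .. n]
  samples <- rawSamples bench [1, 2, 4, 8]
  mapM_ print samples
```

The caller owns the iteration schedule and the output format, which is exactly the in-process flexibility discussed above.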

Bodigrim commented 3 years ago

Another option is to generalize csvReporter, so that clients could specify the desired format. Upd.: But this is problematic, because we would not be able to parse the data back to perform a comparison against the baseline.

Bodigrim commented 3 years ago

To sum up, matching --csvraw columns is a low priority, because that output is rarely consumed by humans without preprocessing. Matching --csv is more sensible, but because of a different statistical model we can only fake it, so there is not a huge win in synchronizing column names; it could create a dangerous illusion that CSV reports from different frameworks are comparable. The situation with streamly is very much unique, I believe; I have not seen any other Haskell project with such an extensive benchmark suite and harness.