Add benchmarks

CleanCut commented 2 years ago

We should add some benchmarks so that when we make changes we can do a before/after comparison. See criterion.
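To illustrate what a before/after comparison involves, here is a minimal std-only sketch (not criterion itself). It warms up, times many runs, and reports the mean per run. The `workload` function is a hypothetical stand-in for a real headtail code path. criterion automates the same idea and adds statistical analysis and outlier detection. It also stores each run's results so the next run can report the change relative to the previous one.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload standing in for a headtail code path:
/// count the newline characters in an in-memory buffer.
fn workload(input: &[u8]) -> usize {
    input.iter().filter(|&&b| b == b'\n').count()
}

/// Run `f` `warmup` times untimed, then time `samples` runs and return
/// the mean duration of one run. Panics if `samples` is 0.
fn mean_time<F: FnMut()>(mut f: F, warmup: usize, samples: u32) -> Duration {
    for _ in 0..warmup {
        f();
    }
    let start = Instant::now();
    for _ in 0..samples {
        f();
    }
    start.elapsed() / samples
}

fn main() {
    // Synthetic input: 100_000 short lines.
    let input: Vec<u8> = "some line of text\n".repeat(100_000).into_bytes();
    // black_box keeps the optimizer from discarding the measured work.
    let mean = mean_time(
        || {
            let _ = black_box(workload(black_box(&input)));
        },
        3,
        50,
    );
    println!("mean per run: {:?}", mean);
}
```

Timing the same workload before and after a change, on the same quiet machine, gives the kind of comparison proposed here. On a shared, noisy machine the run-to-run variance can easily swamp small differences.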

These benchmarks should not be run in CI, since CI is too noisy an environment to be useful for benchmarking (unless we somehow gain dedicated self-hosted runners for the purpose).

tbmreza commented 2 years ago

Since headtail is a utility whose name begs for it to be used alongside, if not instead of, head and tail, I think the "benchmark" readers would most expect in our introduction is how it fares against its OG coreutils counterparts.

For example, as a reader I want to know how

headtail somebigfile.txt

compares (after rigorous experiments) against

head somebigfile.txt && tail somebigfile.txt

Or, how does headtail somebigfile.txt -H 25 -T 0 compare against head somebigfile.txt -n 25?

(Unless I misunderstood "when we make changes we can do a before/after comparison. See criterion.", which to me sounds like using the micro-benchmarking tool to do regression testing.)

CleanCut commented 2 years ago

Those all sound good to me!

tbmreza commented 2 years ago

> Those all sound good to me!

Awesome. Working on it! 🤓

tbmreza commented 2 years ago

I found out about hyperfine while researching how such benchmarking could be done: https://github.com/uutils/coreutils/blob/main/src/uu/head/BENCHMARKING.md

There are a couple of options we could pursue, IMO:

- Reference that link and/or add a "Benchmarking" section to our README
- Compile a table of numbers (à la https://github.com/BurntSushi/ripgrep#quick-examples-comparing-tools)

====

This is what hyperfine outputs for me:

hyperfine --warmup 3 "head tests/files/input.txt && tail tests/files/input.txt" "target/release/headtail tests/files/input.txt"
Benchmark 1: head tests/files/input.txt && tail tests/files/input.txt
  Time (mean ± σ):       5.4 ms ±   0.8 ms    [User: 2.0 ms, System: 1.8 ms]
  Range (min … max):     4.9 ms …   8.8 ms    263 runs

Benchmark 2: target/release/headtail tests/files/input.txt
  Time (mean ± σ):       7.3 ms ±   0.9 ms    [User: 4.7 ms, System: 1.8 ms]
  Range (min … max):     6.6 ms …  10.9 ms    280 runs

Summary
  'head tests/files/input.txt && tail tests/files/input.txt' ran
    1.34 ± 0.25 times faster than 'target/release/headtail tests/files/input.txt'
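The summary line can be roughly reproduced from the displayed numbers. This sketch assumes hyperfine divides the slower mean by the faster one and propagates uncertainty with the standard first-order formula for a quotient of independent measurements, σ_r = r·√((σ_a/μ_a)² + (σ_b/μ_b)²). The function name `relative_speed` is illustrative.

```rust
/// Ratio of two mean run times (slow / fast) and its standard deviation,
/// using first-order error propagation for independent measurements.
fn relative_speed(mean_fast: f64, sd_fast: f64, mean_slow: f64, sd_slow: f64) -> (f64, f64) {
    let ratio = mean_slow / mean_fast;
    // Relative uncertainties add in quadrature for a quotient.
    let rel = ((sd_fast / mean_fast).powi(2) + (sd_slow / mean_slow).powi(2)).sqrt();
    (ratio, ratio * rel)
}

fn main() {
    // Rounded means and standard deviations (ms) from the output above.
    let (ratio, sd) = relative_speed(5.4, 0.8, 7.3, 0.9);
    println!("{ratio:.2} ± {sd:.2} times faster");
}
```

With the rounded inputs it gives about 1.35 ± 0.26, close to the reported 1.34 ± 0.25; the small gap presumably comes from hyperfine working with unrounded means and deviations.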

... which leads to this philosophical question: what is the selling point of headtail? Is being at least as fast as its GNU coreutils counterparts a non-goal? What do you think? @CleanCut

tbmreza commented 2 years ago

In any case, that kind of benchmark should be a separate GitHub issue (if it's considered valid at all). My next pull request will add criterion for before/after comparisons ✌️.

CleanCut commented 2 years ago

> ... which leads to this philosophical question: what is the selling point of headtail?

For me, it is purely the ability to head AND tail in a single, simultaneous operation.
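A single pass over the input is enough to produce both ends. Keep the first `head` lines as they arrive and hold the most recent `tail` lines in a bounded ring buffer (`VecDeque`). This is a minimal sketch, not headtail's actual implementation. The function `head_tail` is hypothetical, and so is its choice never to repeat a line when the input is shorter than head + tail.

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead};

/// Hypothetical single-pass head+tail: returns the first `head` lines and
/// the last `tail` lines of `reader`. A line already kept for the head is
/// never repeated in the tail (an assumed design choice).
fn head_tail<R: BufRead>(
    reader: R,
    head: usize,
    tail: usize,
) -> io::Result<(Vec<String>, Vec<String>)> {
    let mut first = Vec::with_capacity(head);
    // Ring buffer holding at most `tail` of the most recent lines.
    let mut last: VecDeque<String> = VecDeque::with_capacity(tail);
    for line in reader.lines() {
        let line = line?;
        if first.len() < head {
            first.push(line);
        } else if tail > 0 {
            if last.len() == tail {
                last.pop_front(); // drop the oldest line to stay bounded
            }
            last.push_back(line);
        }
    }
    Ok((first, last.into_iter().collect()))
}

fn main() -> io::Result<()> {
    // In-memory input; a real tool would pass a BufReader over a file or stdin.
    let input = "1\n2\n3\n4\n5\n6\n7\n8\n";
    let (first, last) = head_tail(input.as_bytes(), 2, 3)?;
    for line in first.iter().chain(last.iter()) {
        println!("{line}");
    }
    Ok(())
}
```

Memory stays proportional to `head + tail` lines rather than to the file size. This approach does read every line, whereas GNU tail can seek near the end of a regular file, so a single pass is not automatically faster than running head and tail separately.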

I'm not opposed to attempting to optimize performance as a secondary goal, but I sincerely doubt many people will ever be searching for a combined head and tail utility because either head or tail is too slow.

Actually, I see performance as more of a tertiary goal, with other useful features being the secondary goal (like following a file while tailing, which I didn't actually need but was fun to do).

CleanCut / headtail

Add benchmarks #19