CleanCut closed this issue 2 years ago
As a utility whose name begs to be used alongside, if not replace, head and tail, I think the "benchmark" that's more anticipated in our introduction would be how it fares against the OG coreutils counterparts.
For example, as a reader I want to know how
headtail somebigfile.txt
compares (after rigorous experiments) against
head somebigfile.txt && tail somebigfile.txt
or how
headtail somebigfile.txt -H 25 -T 0
compares against
head somebigfile.txt -n 25
(Unless I misunderstood "when we make changes we can do a before/after comparison. See criterion.", which to me sounds like using the micro-benchmarking tool to do regression testing.)
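A rough sketch of how such a command-level comparison could be timed, using only the Rust standard library. The helper name `mean_wall_time` is hypothetical, and the sketch assumes `sh` is available, `headtail` is on the PATH, and somebigfile.txt exists in the current directory. A dedicated benchmarking tool would add warmup runs and proper statistics, which this does not attempt.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `cmd` through `sh -c` `runs` times and returns
/// the mean wall-clock time per run. Shell startup cost is included in every
/// measurement, so only compare numbers produced by this same helper.
fn mean_wall_time(cmd: &str, runs: u32) -> io::Result<Duration> {
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        let status = Command::new("sh")
            .arg("-c")
            .arg(cmd)
            .stdout(Stdio::null()) // discard output so the terminal is not the bottleneck
            .status()?;
        total += start.elapsed();
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("`{cmd}` exited with {status}"),
            ));
        }
    }
    Ok(total / runs)
}

fn main() {
    // Commands taken from the comment above; both the binary and the file
    // are assumptions about the local environment.
    let candidates = [
        "head somebigfile.txt && tail somebigfile.txt",
        "headtail somebigfile.txt",
    ];
    for cmd in candidates {
        match mean_wall_time(cmd, 20) {
            Ok(mean) => println!("{cmd}: {mean:?} per run"),
            Err(err) => eprintln!("{cmd}: {err}"),
        }
    }
}
```

A plain mean hides outliers, so numbers from a sketch like this are only a first impression, not the "rigorous experiments" asked for.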
Those all sound good to me!
Awesome. Working on it! 🤓
I found out about hyperfine while researching how such benchmarking could be done. https://github.com/uutils/coreutils/blob/main/src/uu/head/BENCHMARKING.md
There are a couple of options that we could do imo.
====
This is what hyperfine outputs for me:
hyperfine --warmup 3 "head tests/files/input.txt && tail tests/files/input.txt" "target/release/headtail tests/files/input.txt"

Benchmark 1: head tests/files/input.txt && tail tests/files/input.txt
  Time (mean ± σ):     5.4 ms ± 0.8 ms    [User: 2.0 ms, System: 1.8 ms]
  Range (min … max):   4.9 ms … 8.8 ms    263 runs

Benchmark 2: target/release/headtail tests/files/input.txt
  Time (mean ± σ):     7.3 ms ± 0.9 ms    [User: 4.7 ms, System: 1.8 ms]
  Range (min … max):   6.6 ms … 10.9 ms    280 runs

Summary
  'head tests/files/input.txt && tail tests/files/input.txt' ran
    1.34 ± 0.25 times faster than 'target/release/headtail tests/files/input.txt'
... which leads to this philosophical question: what is the selling point of headtail? Is being at least as fast as GNU coreutils counterparts a non-goal? What do you think? @CleanCut
In any case, that meaning of "benchmark" should be a separate GitHub issue (if regarded as valid at all). My next pull request will add criterion for before/after change comparison ✌️.
... which leads to this philosophical question: what is the selling point of headtail?
For me, it is purely the ability to head AND tail in a single operation.
I'm not opposed to attempting to optimize performance as a secondary goal, but I sincerely doubt many people will ever be searching for a combined head and tail utility because either head or tail is too slow.
Actually, I see performance as more of a tertiary goal, with other useful features being the secondary goal (like following a file while tailing -- which I didn't actually need, but was fun to do).
We should add some benchmarks so that when we make changes we can do a before/after comparison. See criterion.
These benchmarks should not be run in CI, since CI is too noisy an environment to be useful for benchmarking (unless we somehow gain dedicated self-hosted runners for the purpose).
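To make the before/after idea concrete, here is a minimal sketch of what such a benchmark measures, written with only the standard library rather than criterion. It does not reproduce criterion's statistical analysis or saved baselines. The function `head_and_tail` is a hypothetical stand-in for the code under test, not headtail's actual implementation, and `mean_time` is likewise an illustrative helper.

```rust
use std::collections::VecDeque;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the code under test: returns the first `head`
/// and last `tail` lines of `input` without emitting any line twice.
fn head_and_tail(input: &str, head: usize, tail: usize) -> Vec<&str> {
    let mut out = Vec::with_capacity(head + tail);
    let mut last: VecDeque<&str> = VecDeque::with_capacity(tail + 1);
    for (i, line) in input.lines().enumerate() {
        if i < head {
            out.push(line);
        } else if tail > 0 {
            // Keep a sliding window of the most recent `tail` lines.
            if last.len() == tail {
                last.pop_front();
            }
            last.push_back(line);
        }
    }
    out.extend(last);
    out
}

/// Runs `f` `runs` times and returns the mean wall-clock time per run.
/// `black_box` keeps the optimizer from discarding the unused result.
fn mean_time<T>(runs: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..runs {
        black_box(f());
    }
    start.elapsed() / runs
}

fn main() {
    // Synthetic input; a real benchmark would use representative files.
    let input: String = (1..=100_000).map(|n| format!("line {n}\n")).collect();
    let mean = mean_time(100, || head_and_tail(black_box(&input), 10, 10));
    println!("head_and_tail(10, 10): {mean:?} per run");
}
```

Running this on the same quiet machine before and after a change gives a crude comparison; criterion would additionally store the earlier run and report whether the difference is statistically significant.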