It would be nice, but probably too complicated for `tasty-bench` to accomplish. Is there a Haskell library (snippet / function / whatever) that can guess the form of a function given a list of tabulated values? Is it feasible to distinguish O(n) from O(n log n)?
I was thinking about implementing something in the spirit of `bcompare`, which mysteriously disappeared starting from criterion-1.0.0.0. It could be a good step moving away from absolute numbers in benchmarks.
We can phrase it differently: is the given function distinguishable from a given asymptote, up to a constant multiple, within some realistic range of inputs? Of course, a pathological example like f(n) + (εn)! can always be invented such that up to a large enough n it looks like f and then suddenly explodes. But in such cases we may as well not have a benchmark in the first place.
> It would be nice, but probably too complicated for `tasty-bench` to accomplish. Is there a Haskell library (snippet / function / whatever) that can guess the form of a function given a list of tabulated values? Is it feasible to distinguish O(n) from O(n log n)?
I am not aware of any ready-made solution in Haskell (although a quick search reveals prior art in other settings). But how hard can it be?
This can be flaky, since some classes are very similar. But we do not really need that much: we only need to tell whether a function plausibly belongs to a given class, up to this much error. We really only want to automate and make precise the judgement that we already routinely make with the naked eye. And we already put benchmarks in a test suite, so it is a logical next step to make them actually detect failures.
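A minimal sketch of that judgement: fit each candidate class t ≈ c·f(n) to the tabulated points by least squares and pick the best fit. The candidate list and the residual metric here are ad hoc choices, not an existing library:

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Guess the complexity class of tabulated (size, time) points by
-- fitting t ~ c * f(n) for each candidate f and comparing residuals.
guessClass :: [(Double, Double)] -> String
guessClass pts = fst (minimumBy (comparing snd) (map score candidates))
  where
    candidates =
      [ ("O(n)",       \n -> n)
      , ("O(n log n)", \n -> n * log n)
      , ("O(n^2)",     \n -> n * n)
      ]
    score (name, f) =
      let c   = sum [t * f n | (n, t) <- pts] / sum [f n * f n | (n, _) <- pts]
          res = sum [(t - c * f n) ^ (2 :: Int) | (n, t) <- pts]
              / sum [t * t | (_, t) <- pts]
      in (name, res)
```

This is crude, but so is the naked-eye judgement it is meant to replace.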
I understand that this would be experimental and anything can go wrong. And we do not have any doctors of statistics on the payroll. So maybe you are right and we should not even try.
It does not take a PhD in statistics, but it probably involves opinionated design choices and extra dependencies. `tasty-bench` is a good citizen of the `tasty` ecosystem, so one can potentially write another `TestReporter` which performs this kind of analysis. Additional metadata can be passed via a dummy pattern in `after`. I'm interested to see how this can play out, because I have had similar desires in the past.
Actually, even a standalone package that can guess the asymptotics of a function would be nice to have.
I made some progress with `bcompare` atop `tasty` facilities for test dependencies. Documentation is still absent, but here is an example of usage: https://github.com/Bodigrim/mod/commit/cb4b737e7d76993bfa3e55264e165c29bd1199e6. It generates a report comparing benchmark results with each other:
```
Sum
  Data.Mod:            OK (1.46s)
    483 ms ±  41 ms
  Data.Mod.Word:       OK (2.18s)
    145 ms ± 8.2 ms, 0.30x
  finite-field:        OK (12.76s)
    4.26 s ± 133 ms, 8.81x
  finite-typelits:     OK (13.34s)
    4.46 s ± 254 ms, 9.25x
  modular-arithmetic:  OK (8.63s)
    2.87 s ±  91 ms, 5.95x
  modular:             OK (45.02s)
    6.53 s ± 543 ms, 13.52x
```
With `bcompare` the original example can look like the one below, making it easier to guesstimate algorithmic complexity:
```
All
  folds
    foldl'
      1:     OK (3.86s)
        14 ns ± 434 ps, 0.00x
      2:     OK (4.87s)
        18 ns ± 500 ps, 0.00x
      4:     OK (14.16s)
        26 ns ± 586 ps, 0.01x
      8:     OK (6.25s)
        46 ns ± 1.5 ns, 0.01x
      16:    OK (11.60s)
        86 ns ± 1.8 ns, 0.02x
      32:    OK (11.08s)
        165 ns ± 4.8 ns, 0.04x
      64:    OK (4.95s)
        292 ns ± 8.7 ns, 0.07x
      128:   OK (9.77s)
        582 ns ± 12 ns, 0.14x
      256:   OK (9.34s)
        1.1 μs ± 34 ns, 0.26x
      512:   OK (9.43s)
        2.2 μs ± 70 ns, 0.53x
      1024:  OK (4.45s)
        4.2 μs ± 132 ns
      2048:  OK (19.41s)
        8.9 μs ± 313 ns, 2.11x
      4096:  OK (4.64s)
        18 μs ± 654 ns, 4.15x
      8192:  OK (9.36s)
        35 μs ± 677 ns, 8.38x
      16384: OK (4.68s)
        71 μs ± 2.3 μs, 16.90x
      32768: OK (9.45s)
        144 μs ± 3.1 μs, 34.11x
      65536: OK (37.45s)
        283 μs ± 3.5 μs, 67.20x
```
So, it is comparing everything to the value at 1024? Nice!
But there are drawbacks: for one, I have to write `awk` occasionally, and this is, like, anti-Haskell. So, it is sort of great, but also sort of not.
`tasty` uses `awk` expressions for references between tests. The output above was generated from

```haskell
bgroup "foldl'" $
  map
    (\s ->
      (if S.length s == 1024
        then id
        else bcompare "$NF == \"1024\" && $(NF-1) == \"foldl'\"") $
        bench (show $ S.length s) $
          nf (S.foldl' (\acc x -> acc + fromIntegral x) (0 :: Int)) s)
    foldInputs
```
This is arguably messy, because we stretch `tasty` to its limits. It would be nice to provide more ergonomic combinators.
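For example, a thin wrapper could generate the `awk` pattern from plain group and benchmark names; `bcompareVs` is a hypothetical combinator sketched here, not part of `tasty-bench`:

```haskell
import Test.Tasty.Bench (Benchmark, bcompare)

-- Hypothetical helper: point at the baseline by group and benchmark name
-- instead of spelling out the awk pattern by hand. 'show' supplies the
-- escaped quotes that the pattern syntax expects.
bcompareVs :: String -> String -> Benchmark -> Benchmark
bcompareVs group name =
  bcompare ("$NF == " ++ show name ++ " && $(NF-1) == " ++ show group)
```

With it, the pattern in the snippet above shrinks to `bcompareVs "foldl'" "1024"`.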
I came here with the same feature request :-)
I think what's missing in `tasty-bench` is a way to time something without wrapping it in a `TestTree` / `Benchmark`.
Something like `measure :: Benchmarkable -> IO Timings`, where `Timings` would contain the summary statistics (mean, standard deviation) as well as the raw measurements that were used to come up with these statistics.
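For concreteness, a rough shape of that interface could be the following; `Timings`, its fields, and `measure` are names invented in this comment, not an existing `tasty-bench` API:

```haskell
import Test.Tasty.Bench (Benchmarkable)

-- Hypothetical result type for the proposal above: summary statistics
-- plus the raw batch measurements they were computed from.
data Timings = Timings
  { timingsMean   :: Double   -- mean time per iteration, in seconds
  , timingsStdDev :: Double   -- standard deviation, in seconds
  , timingsRaw    :: [Double] -- raw measurements behind the statistics
  }

-- Stub only: a real implementation would reuse the internal timing loop.
measure :: Benchmarkable -> IO Timings
measure _ = undefined
```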
Then downstream packages could use this to do some more intelligent analyses. (I'd have a go at one.)
@kindaro: see here for the explanation of why `awk` is used for patterns in `tasty`.
@UnkindPartition what if I expose something like this? https://github.com/Bodigrim/tasty-bench/blob/1aac590eae5ed6259eed44a7403403a3b07d26be/Test/Tasty/Bench.hs#L788-L790 Would it be enough?
It would also be nice to have a way to estimate a "good enough" number of runs, as described in the "Statistical model" section of the README. Otherwise that logic would have to be duplicated between packages.
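For reference, the stopping rule described there can be restated roughly as follows; `converge` and `runBatch` are hypothetical names, and the spread estimate is deliberately simpler than whatever `tasty-bench` does internally:

```haskell
-- Keep doubling the batch size until the estimated relative standard
-- deviation of the per-iteration time drops below the target.
-- 'runBatch n' stands for timing n iterations and returning seconds.
converge :: Double -> (Int -> IO Double) -> IO (Double, Double)
converge targetRelStDev runBatch = do
  t1 <- runBatch 1
  go 1 t1
  where
    go n tn = do
      t2n <- runBatch (2 * n)
      let mean  = t2n / fromIntegral (2 * n)
          -- if timings were perfectly linear, t2n would equal 2 * tn;
          -- the deviation from that serves as a crude spread estimate
          sigma = abs (t2n - 2 * tn) / fromIntegral (2 * n)
      if sigma < targetRelStDev * mean
        then pure (mean, sigma)
        else go (2 * n) t2n
```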
@UnkindPartition how about this? https://github.com/Bodigrim/tasty-bench/blob/ee6b2d6eb2686f4020ebc06b8ce43fd15dc50d67/Test/Tasty/Bench.hs#L847-L850
This looks good. It would be good to document what the timeout does, exactly. (Would it use the max number of iterations that fit into the timeout even if `RelStDev` is not satisfied? Would it throw an exception?)
`Timeout` takes a soft priority over `RelStDev`. No exceptions are thrown. I've elaborated a bit more in the haddocks: https://github.com/Bodigrim/tasty-bench/blob/06fc7c1e2706942c4581e13d28d7ac9c901a7b11/Test/Tasty/Bench.hs#L881-L889
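For illustration, both options can be set per benchmark with stock `tasty` combinators (the numbers are arbitrary):

```haskell
import Test.Tasty (localOption, mkTimeout)
import Test.Tasty.Bench

-- The benchmark stops once the timeout is about to expire, even if the
-- requested relative standard deviation has not been reached yet.
example :: Benchmark
example =
  localOption (mkTimeout 1000000) $ -- at most ~1 second (microseconds)
    localOption (RelStDev 0.02) $   -- aim for 2% relative stdev
      bench "sum" (nf sum [1 .. 1000 :: Int])
```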
Does this get to be wielded automatically when a benchmark runs in the presence of saved results of a previous run, so that the results of a benchmark can be committed to version control, thereby at once turning it into a true test suite? Also, can we have a default name for benchmark files, so that plain `cabal test` works? Or is this a fantasy yet?
@kindaro, I am not sure what exactly you refer to by "this". Since `measureCpuTime` is now publicly exposed, one can write a `tasty` provider to detect and check asymptotic complexity. I think such a provider is unlikely to be merged into `tasty-bench` itself, but the beauty of `tasty` is that you can freely combine providers and ingredients at your own discretion.
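For instance, assuming the `measureCpuTime :: Timeout -> RelStDev -> Benchmarkable -> IO Double` signature from the lines linked above, the core of such a provider could be sketched as follows (`growthRatios` is a made-up name):

```haskell
import Test.Tasty (mkTimeout)
import Test.Tasty.Bench (Benchmarkable, RelStDev (..), measureCpuTime)

-- Time the same Benchmarkable at each input size and return ratios of
-- consecutive timings: for doubling sizes, ratios near 2 suggest O(n),
-- near 4 suggest O(n^2), slightly above 2 suggest O(n log n).
growthRatios :: (Int -> Benchmarkable) -> [Int] -> IO [Double]
growthRatios mkBench sizes = do
  times <- mapM (measureCpuTime (mkTimeout 10000000) (RelStDev 0.05) . mkBench) sizes
  pure (zipWith (/) (tail times) times)
```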
You can specify a default name for the CSV report like this:
```haskell
import Data.Maybe
import Test.Tasty.Bench
import Test.Tasty.Ingredients
import Test.Tasty.Options
import Test.Tasty.Runners

main = do
  -- 'benches' below is your top-level benchmark tree
  opts <- parseOptions benchIngredients benches
  let opts' = changeOption (Just . fromMaybe (CsvPath "foo.csv")) opts
  case tryIngredients benchIngredients opts' benches of
    Nothing -> print "no ingredient"
    Just mb -> do
      b <- mb
      print $ if b then "Success" else "Failure"
```
For example, consider the output above: the same function is measured with exponentially increasing input, and from the numbers it looks as though its time complexity is linear. Can we make a magical automated check out of these numbers?
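A toy version of such a check, assuming timings taken at doubling input sizes; the name and the tolerance are made up for illustration:

```haskell
-- For a linear-time function the per-doubling exponent logBase 2 (t2/t1)
-- should stay near 1 (near 2 for quadratic, and so on).
isPlausiblyLinear :: Double -> [Double] -> Bool
isPlausiblyLinear tol times =
  all (\e -> abs (e - 1) <= tol)
      (zipWith (\t1 t2 -> logBase 2 (t2 / t1)) times (tail times))
```

On the measurements from 1024 upwards in the table above (4.2 μs, 8.9 μs, ..., 283 μs) the exponents stay within roughly 0.1 of 1, so a tolerance of 0.1 passes; the smallest inputs would fail it, because constant overhead dominates there.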