I think something's wrong with your benchmark itself: the first iteration takes ~2 seconds, while each subsequent one is < 1 ms.
Using tasty-bench-0.3, this is just a single iteration:
$ cabal run --constraint='tasty-bench >= 0.3' Benchmark.System.Process -- --stdev Infinity
All
processChunks tr: OK (1.88s)
1.872 s
Next, tasty-bench measures 1 iteration (t1 ≈ 2 s) and 2 iterations (t2 < 0.001 s) and returns (t1 + 2 * t2) / 5. The stdev is really huge here (16499266 * 1000 / 2 / 377 * 100 ≈ 2e9 percent), but since we were targeting 1e12, no more iterations are evaluated:
$ cabal run --constraint='tasty-bench >= 0.3' Benchmark.System.Process -- --stdev 1e12
All
processChunks tr: OK (1.90s)
377 ms ± 16499266.09 s
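To make the arithmetic concrete, here is a minimal Haskell sketch (not tasty-bench's internal code, just the estimate described above) with the numbers from this run plugged in:

-- Minimal sketch of the estimate described above: given one measurement of
-- 1 iteration (t1) and one of 2 iterations (t2), the fit reduces to
-- (t1 + 2 * t2) / 5 per iteration.
predictMean :: Double -> Double -> Double
predictMean t1 t2 = (t1 + 2 * t2) / 5

main :: IO ()
main = do
  let t1 = 1.872   -- seconds; first iteration, dominated by one-time work
      t2 = 0.001   -- seconds; two iterations of the cheap subsequent runs
  print (predictMean t1 t2)   -- ~0.375 s, roughly the 377 ms reported above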
To conform to a smaller stdev target, we need to measure 2 and 4 iterations as well. Both of those are in the millisecond range, so the result is in the millisecond range too:
$ cabal run --constraint='tasty-bench >= 0.3' Benchmark.System.Process -- --stdev 1e6
All
processChunks tr: OK (2.11s)
452 μs ± 852 μs
Results from gauge are similar because, AFAIR, it skips the very first iteration and measures only the subsequent ones.
I know, it's probably because the file handle is not being rewound.
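If that is the case, a minimal sketch of the fix, assuming the benchmark body reads from a pre-opened Handle (the file name and function here are hypothetical), would be to rewind before doing the work:

import System.IO

-- Hypothetical illustration: without the hSeek, the second call finds the
-- handle already at EOF and returns almost immediately, which is exactly
-- the "first iteration slow, rest instant" pattern above.
countLines :: Handle -> IO Int
countLines h = do
  hSeek h AbsoluteSeek 0      -- rewind before each run
  let go n = do
        eof <- hIsEOF h
        if eof then pure n else hGetLine h >> go (n + 1)
  go (0 :: Int)

main :: IO ()
main = withFile "input.txt" ReadMode $ \h -> do   -- "input.txt" is a placeholder
  countLines h >>= print
  countLines h >>= print    -- still sees the whole file thanks to the rewind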
Is it possible to generate a warning in such cases? The error margin reported was not bad (312 μs ± 765 μs above, and much smaller in many other runs), so it did not raise any alarms. Also, it would be great to have a debug or verbose option that reports more details to help track down such issues.
Not sure how to trigger a warning here: the very first iteration is expected to take longer because of forcing thunks. A debug mode could indeed be helpful; I need to think about how best to wire it in.
Maybe we could always report the difference between the first and second iterations to give an idea of the cost of one-time evaluations. If it seems too high to the user, based on their knowledge of the benchmarked code, they can decide to investigate.
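As a rough illustration of that suggestion (not an actual tasty-bench feature; the threshold here is arbitrary), such a check could look like:

-- Hypothetical heuristic: warn when the first iteration is much more
-- expensive than subsequent ones, which usually points at one-time
-- evaluation or state that is not reset between iterations.
warnOneTimeCost :: Double -> Double -> IO ()
warnOneTimeCost t1 t2
  | t1 > 10 * t2 =
      putStrLn $ "Warning: first iteration took " ++ show t1 ++ " s, later ones ~"
              ++ show t2 ++ " s; one-time work may dominate this benchmark."
  | otherwise = pure ()

main :: IO ()
main = warnOneTimeCost 1.872 0.0005   -- numbers from the run above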
I've added a package flag to emit all ongoing results to stderr.
I am using tasty-bench to benchmark a piece of code that forks a process. I am not sure how that interacts with the benchmarking process, but I get figures that are wildly off. See below for the figures reported by "tasty-bench", "time", and "+RTS -s":
There is some more code outside the benchmarks that contributes to the "+RTS -s" and "time" figures, but even after accounting for that, the figures are wildly off.
The code to reproduce this is available here: https://github.com/composewell/streamly-process/tree/tasty-bench-issue .
Even gauge has the same issue. Maybe this is a fundamental problem with benchmarking this type of code, or am I doing something wrong in the measurements or the code?
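For context, a stripped-down benchmark of the same shape (forking an external process on every iteration) might look like the following; this is a simplified stand-in for the linked reproducer, not the actual streamly-process code:

import System.Process (readProcess)
import Test.Tasty.Bench (bench, defaultMain, nfIO)

-- Simplified stand-in: each iteration forks a "tr" process and forces its
-- output, so process startup cost is part of every measurement.
main :: IO ()
main = defaultMain
  [ bench "processChunks tr" $ nfIO $
      readProcess "tr" ["[a-z]", "[A-Z]"] "hello world"
  ]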