elm-explorations / benchmark

Combining tests and benchmarks #10

folkertdev opened this issue 6 years ago

folkertdev commented 6 years ago

When incrementally optimizing and benchmarking a function, the benchmark and the regression test often use very similar code.

hashTriangleTest =
    let
        toList ( a, b, c ) =
            [ a, b, c ]
    in
        Test.fuzz fuzzTriangle "hash triangle behaves as before" <|
            \triangle ->
                AdjacencyList.hashTriangle triangle
                    |> toList
                    |> Expect.equal (AdjacencyList.hashTriangleOld triangle)

hashTriangleBenchmark =
    describe "SweepHull"
        [ -- nest as many descriptions as you like
          Benchmark.compare "sharesEdge shared"
            "old"
            (\_ -> AdjacencyList.hashTriangleOld triangle)
            "new"
            (\_ -> AdjacencyList.hashTriangle triangle)
        ]

Besides the fact that fuzzing can find performance bottlenecks that would otherwise be missed (see also #3), testing the benchmark (or benchmarking the test) gives you tests and benchmarks for free. It has happened to me that I made a mistake in the benchmark code and performance looked far better than it actually was.
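
To make that concrete, here is a minimal sketch of sharing one old-vs-new pair between the fuzz test and the benchmark, using only the existing Test and Benchmark APIs (imports, `fuzzTriangle`, and an `exampleTriangle` value from the surrounding project are assumed):

-- one shared definition of "old" and "new", so the test and the benchmark
-- are guaranteed to check and measure the same code
oldHash triangle =
    AdjacencyList.hashTriangleOld triangle

newHash triangle =
    AdjacencyList.hashTriangle triangle
        |> (\( a, b, c ) -> [ a, b, c ])

-- fuzz test: the new implementation must agree with the old one
sharedTest =
    Test.fuzz fuzzTriangle "hashTriangle behaves as before" <|
        \triangle ->
            newHash triangle
                |> Expect.equal (oldHash triangle)

-- benchmark: compare the exact same two functions on a fixed input
sharedBenchmark =
    Benchmark.compare "hashTriangle"
        "old"
        (\_ -> oldHash exampleTriangle)
        "new"
        (\_ -> newHash exampleTriangle)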

For completeness, here is my message from Slack:

I've been incrementally optimizing an algorithm (for Delaunay triangulation) using elm-benchmark and have some feedback.

My typical workflow for improving a function looks something like this:

  • write an improved (hopefully faster) version of the same function
  • write a test (often a fuzz test) to check that the new implementation is equivalent
  • write a benchmark to check that the new implementation is actually faster
  • remove the old code
  • remove the equivalency test (or maybe put the old version in the tests)
  • remove the benchmark

Most of these steps are tedious and repetitive. Editor integration or code generation could help here, but another improvement would be to combine the benchmark and the test. Additionally, for more complex functions the performance can vary a lot between easy and difficult inputs, so benchmarking on a diverse set of inputs gives more representative results. Is this something you've thought about?
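
Right now the closest approximation seems to be hand-picking a few representative inputs and comparing on each of them; a rough sketch (easyTriangle and degenerateTriangle are hypothetical placeholder values from my project, not part of the library):

-- approximate "diverse inputs" by benchmarking a handful of hand-picked cases
diverseInputBenchmarks =
    describe "hashTriangle on diverse inputs"
        [ Benchmark.compare "easy input"
            "old"
            (\_ -> AdjacencyList.hashTriangleOld easyTriangle)
            "new"
            (\_ -> AdjacencyList.hashTriangle easyTriangle)
        , Benchmark.compare "degenerate input"
            "old"
            (\_ -> AdjacencyList.hashTriangleOld degenerateTriangle)
            "new"
            (\_ -> AdjacencyList.hashTriangle degenerateTriangle)
        ]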

Some other thoughts:

  • are the primitives available to write non-micro benchmarks?
  • a UI that can start/cancel a particular benchmark (I'm not a big fan of everything running on page load)

BrianHicks commented 6 years ago

Thanks for opening this! I'd love to find an API where this would be possible. Maybe zero in on the worst-case or best-case performance somehow?