elm-explorations / benchmark

Combining tests and benchmarks #10

folkertdev opened this issue 6 years ago

folkertdev commented 6 years ago

When incrementally optimizing and benchmarking a function, the benchmark and the regression test often use very similar code.

hashTriangleTest =
    let
        toList ( a, b, c ) =
            [ a, b, c ]
    in
        Test.fuzz fuzzTriangle "hash triangle behaves as before" <|
            \triangle ->
                AdjacencyList.hashTriangle triangle
                    |> toList
                    |> Expect.equal (AdjacencyList.hashTriangleOld triangle)

hashTriangleBenchmark =
    describe "SweepHull"
        [ -- nest as many descriptions as you like
          Benchmark.compare "sharesEdge shared"
            "old"
            (\_ -> AdjacencyList.hashTriangleOld triangle)
            "new"
            (\_ -> AdjacencyList.hashTriangle triangle)
        ]

Besides the fact that fuzzing can find performance bottlenecks that would otherwise be missed (see also #3), testing the benchmark (or benchmarking the test) gives you tests and benchmarks for free. It has happened to me that I made a mistake in the benchmark code and performance looked far better than it actually was.
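
To make that concrete, here is a minimal sketch of sharing one old-vs-new pair between the fuzz test and the benchmark, using only the existing Test and Benchmark APIs (imports, `fuzzTriangle`, and an `exampleTriangle` value from the surrounding project are assumed):

-- one shared definition of "old" and "new", so the test and the benchmark
-- are guaranteed to check and measure the same code
oldHash triangle =
    AdjacencyList.hashTriangleOld triangle

newHash triangle =
    AdjacencyList.hashTriangle triangle
        |> (\( a, b, c ) -> [ a, b, c ])

-- fuzz test: the new implementation must agree with the old one
sharedTest =
    Test.fuzz fuzzTriangle "hashTriangle behaves as before" <|
        \triangle ->
            newHash triangle
                |> Expect.equal (oldHash triangle)

-- benchmark: compare the exact same two functions on a fixed input
sharedBenchmark =
    Benchmark.compare "hashTriangle"
        "old"
        (\_ -> oldHash exampleTriangle)
        "new"
        (\_ -> newHash exampleTriangle)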

For completeness, here is my message from Slack:

I've been incrementally optimizing an algorithm (for Delaunay triangulation) using elm-benchmark and have some feedback.

My typical workflow for improving a function looks something like this:

  • write an improved (hopefully faster) version of the same function
  • write a test (often a fuzz test) to check that the new implementation is equivalent
  • write a benchmark to check that the new implementation is actually faster
  • remove the old code
  • remove the equivalency test (or maybe put the old version in the tests)
  • remove the benchmark

Most of these steps are tedious and repetitive. Editor integration or code generation could help here, but another improvement would be to combine the benchmark and the test. Additionally, for more complex functions the performance can vary a lot between easy and difficult inputs, so benchmarking on a diverse set of inputs gives more representative results. Is this something you've thought about?
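
Right now the closest approximation seems to be hand-picking a few representative inputs and comparing on each of them; a rough sketch (easyTriangle and degenerateTriangle are hypothetical placeholder values from my project, not part of the library):

-- approximate "diverse inputs" by benchmarking a handful of hand-picked cases
diverseInputBenchmarks =
    describe "hashTriangle on diverse inputs"
        [ Benchmark.compare "easy input"
            "old"
            (\_ -> AdjacencyList.hashTriangleOld easyTriangle)
            "new"
            (\_ -> AdjacencyList.hashTriangle easyTriangle)
        , Benchmark.compare "degenerate input"
            "old"
            (\_ -> AdjacencyList.hashTriangleOld degenerateTriangle)
            "new"
            (\_ -> AdjacencyList.hashTriangle degenerateTriangle)
        ]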

Some other thoughts:

  • are the primitives available to write non-micro benchmarks?
  • a UI that can start/cancel a particular benchmark (I'm not a big fan of everything running on page load)

BrianHicks commented 6 years ago

Thanks for opening this! I'd love to find an API where this would be possible. Maybe zero in on the worst-case or best-case performance somehow?