Bench modules (ExUnit-style benchmarks definitions) (DSL)

I am thinking about writing support for ExUnit-style benchmarks (if there is no work toward such a thing yet). I have something like this in mind:

defmodule SampleBench do
  use Benchee.Unit

  bench do
    input "Small (1 Thousand)", do: Enum.to_list(1..1_000)
    input "Middle (100 Thousand)", do: Enum.to_list(1..100_000)

    setup_all input do
      {Enum.shuffle(input), fn i -> [i, i + 1] end}
    end

    func "flat_map", {input, func} do
      Enum.flat_map(input, func)
    end

    func "map.flatten", {input, func} do
      input |> Enum.map(func) |> List.flatten
    end
  end
end

Any suggestions or other propositions for the syntax?

:wave:

Hi there and thanks for checking in :) Sometimes people ask about this, so it's also somewhere on my "fun" agenda for after 1.0. Hence I've given this some thought in the past :)

I'd keep the naming closer to what the options in benchee are named, for familiarity's sake :) What is called func there we usually call a job. I'm not sure what setup_all maps to in benchee: I'd expect it to be executed once before all of them, but I guess it maps to what we call before_scenario.
It would be interesting to see how to deal with all the other config options (time, warmup, pre_check etc.). Ideally the mechanism would be fairly generic, so that the DSL doesn't need to be adapted every time "benchee core" adds an option.
Specifying formatters might also be very interesting; keep in mind that we're changing that API soon (bringing arguments closer to the formatters themselves).
How to incorporate hooks that are specific to just one job might also be interesting, or could be left out.
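For reference, here is a rough sketch of how the proposed module could map onto the existing API, assuming func corresponds to a job and setup_all to before_scenario (whose return value becomes the input handed to each job). The module name SampleBenchPlain and its two functions are made up for illustration.

```elixir
defmodule SampleBenchPlain do
  # Hypothetical module: it only builds the arguments for Benchee.run/2.

  # Each job receives whatever before_scenario returned for the current input.
  def jobs do
    %{
      "flat_map" => fn {list, func} -> Enum.flat_map(list, func) end,
      "map.flatten" => fn {list, func} -> list |> Enum.map(func) |> List.flatten() end
    }
  end

  def options do
    [
      inputs: %{
        "Small (1 Thousand)" => Enum.to_list(1..1_000),
        "Middle (100 Thousand)" => Enum.to_list(1..100_000)
      },
      # Assumed equivalent of setup_all: runs once per scenario, before measuring.
      before_scenario: fn input -> {Enum.shuffle(input), fn i -> [i, i + 1] end} end
    ]
  end
end

# With benchee available as a dependency:
# Benchee.run(SampleBenchPlain.jobs(), SampleBenchPlain.options())
```

Keeping jobs and options as plain data means a DSL would only have to produce these two values.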

Otherwise looks good to me on a first view :)

I added DSL to the title, as I've always thought about this as Benchee.DSL. Of course, as you're the one doing it, it's your choice; that's just my input :grin:

On another note - I think it's interesting how you add the anonymous function in the setup_all - it avoids duplication but at first confused the hell out of me :smile: (mainly because I usually put it as a variable at the top and then reuse it in the job definitions)

Hope I was helpful :wave:

@hauleth What about the DSL do you prefer to the current API? Is there some functionality that you think the DSL could offer that the current API does not? Or is there something confusing about the current API that is made clearer with this DSL?

I'm totally open to the idea, but I would love some further input as to what users would gain by having a second API available to them. It would likely be a significant maintenance burden, so I think we should go down this road with care.

@devonestes that's a good question! Imo lots of people just prefer the DSL style because it looks nicer and looks more like ExUnit as far as I can see :)

Regarding maintenance, one way or another I'd do it as a separate project - with no API completeness guarantees. However, if the "general" options (time, warmup etc.) are implemented in a generic way it could cover quite a lot. Even if there's just a fallback like options or configuration that takes a keyword list or a map. Only if we introduce new per-job functionality would it really need adjustment... or so optimistic me hopes.
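A minimal sketch of that kind of generic fallback, assuming a hypothetical BenchOptions helper: options the DSL knows about and an untouched pass-through keyword list are simply merged, so a new core option would not need DSL changes. The default values here are illustrative only.

```elixir
defmodule BenchOptions do
  # Hypothetical helper; the defaults are placeholders for this example.
  @defaults [time: 5, warmup: 2]

  # `dsl_opts` come from DSL keywords, `passthrough` is forwarded as-is.
  # Nothing is validated here, so unknown keys reach the benchmark runner unchanged.
  def build(dsl_opts, passthrough \\ []) do
    @defaults
    |> Keyword.merge(dsl_opts)
    |> Keyword.merge(passthrough)
  end
end

opts = BenchOptions.build([time: 10], memory_time: 2)
# With benchee available: Benchee.run(jobs, opts)
```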

I can see the beauty of a DSL. But I like our way of working with data structures - I have one benchmark where I programmatically build the map. Quite fun.
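As an illustration of building the jobs map programmatically (not the actual benchmark mentioned above; the sizes and job names are invented), a comprehension is enough:

```elixir
sizes = [10, 100, 1_000]

# One job per size; each closure captures its own pre-built list.
jobs =
  for size <- sizes, into: %{} do
    list = Enum.to_list(1..size)
    {"sum of #{size} elements", fn -> Enum.sum(list) end}
  end

# With benchee available: Benchee.run(jobs)
```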

@devonestes I was thinking about making it a separate library though (named Benchamin); however, I wanted to discuss the DSL and all its pros and cons here, as IMHO this is the best place for that.

Why a DSL? As @PragTob said, it looks more like ExUnit, which makes it more familiar to developers and allows it to be used in a similar way to ExUnit. So we could have a directory containing performance tests and run them in CI for the sake of comparing them against future changes. Check out Rails PerfTest for what I have in mind.

@PragTob this could be written using my proposed syntax via metaprogramming. However, for simple benches like these it would be infeasible to use the DSL. The DSL is meant for running benches for a whole application, saving the results, and tracking performance changes over time. So the use case is a little bit different.

@hauleth I don't understand why one needs a DSL to do this kind of benchmark comparison. It does the same - it's just a different way to define them. Or am I missing something?

If you're looking for a project to do what you described look no further than elixir-bench which has the same purpose (aka run benchmarks on a CI like infrastructure, save results, compare over time). It still needs some work but we're currently working on it in GSoC (cc: @tallysmartins). Currently it uses the JSON formatter to put the JSON in a specified directory but we'll probably write a custom formatter for it.

And of course it can be written using meta programming, that's just more complicated and not as easy as building up a Map :)

Don't get me wrong, I'm not against a DSL, I just want to understand it better. I can totally see how the biggest benefit might just be "It looks more like ExUnit so it's not a thing I run once but all the time and I maintain them more diligently".

Hello,

I have picked up the idea for a benchee_dsl after reading the discussion in #312 and afterward the discussion here. And yes, the benefit of just writing a benchmark in another form isn't so big. But I have taken it as a little exercise in writing a DSL and some macros. I have made it as simple as possible. The DSL just collects all the data that is needed to call Benchee.run in the end. As long as the next version of Benchee is backward compatible, no change in BencheeDsl is needed. If you have time, you can have a look at it: https://github.com/hrzndhrn/benchee_dsl Some more examples are at test/fixtures and in the benchee_dsl branch in my TimeZoneInfo project here.
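To make the "collect the data, then call Benchee.run" idea concrete, here is a minimal macro sketch. It is not taken from benchee_dsl; MiniBenchDsl, job/2 and jobs/0 are made-up names, and it only handles job names given as literal strings.

```elixir
defmodule MiniBenchDsl do
  # Hypothetical module, not the benchee_dsl implementation.
  defmacro __using__(_opts) do
    quote do
      import MiniBenchDsl, only: [job: 2]
      Module.register_attribute(__MODULE__, :bench_jobs, accumulate: true)
      @before_compile MiniBenchDsl
    end
  end

  # Defines a zero-arity function for the body and remembers it under `name`.
  # `name` must be a literal string so the function name can be built at compile time.
  defmacro job(name, do: body) when is_binary(name) do
    fun = String.to_atom("__job_" <> name)

    quote do
      @bench_jobs {unquote(name), unquote(fun)}
      def unquote(fun)(), do: unquote(body)
    end
  end

  # After all jobs are collected, generate jobs/0 returning a name => function map.
  defmacro __before_compile__(env) do
    collected = Module.get_attribute(env.module, :bench_jobs)

    quote do
      def jobs do
        Map.new(unquote(collected), fn {name, fun} ->
          {name, Function.capture(__MODULE__, fun, 0)}
        end)
      end
    end
  end
end

defmodule MiniSampleBench do
  use MiniBenchDsl

  job "flat_map" do
    Enum.flat_map(1..100, fn i -> [i, i + 1] end)
  end

  job "map.flatten" do
    1..100 |> Enum.map(fn i -> [i, i + 1] end) |> List.flatten()
  end
end

# With benchee available: Benchee.run(MiniSampleBench.jobs())
```

Because the macros only accumulate data, new Benchee options would not require changes to them as long as Benchee.run keeps accepting a jobs map.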

If anyone's curious, I used this for my random-benchmarker project:

Add this to mix.exs:

  defp aliases do
    [
      bench: &bench/1
    ]
  end

  defp bench(arglist) do
    Mix.Tasks.Run.run(["bench/bench.exs" | arglist])
  end

Then in bench/bench.exs:

System.argv()
|> Enum.each(fn bench ->
  try do
    Code.require_file("bench/#{bench}_bench.exs")
    mod = String.to_atom("Elixir.#{Macro.camelize(bench)}Bench")

    if :erlang.function_exported(mod, :module_info, 0) do
      if(:erlang.function_exported(mod, :classifiers, 0), do: mod.classifiers(), else: [nil])
      |> case do
        nil -> [nil]
        [] -> [nil]
        clas when is_list(clas) -> clas
      end
      |> Enum.each(fn cla ->
        if cla do
          title = "Benchmarking Classifier: #{cla}"
          sep = String.duplicate("=", String.length(title))
          IO.puts("\n#{title}\n#{sep}\n")
        end

        setup = if(:erlang.function_exported(mod, :setup, 1), do: mod.setup(cla), else: nil)

        m =
          cond do
            :erlang.function_exported(mod, :time, 2) -> mod.time(cla, setup)
            :erlang.function_exported(mod, :time, 1) -> mod.time(cla)
            true -> 5
          end

        inputs =
          cond do
            :erlang.function_exported(mod, :inputs, 2) -> mod.inputs(cla, setup)
            :erlang.function_exported(mod, :inputs, 1) -> mod.inputs(cla)
            true -> nil
          end

        actions =
          cond do
            :erlang.function_exported(mod, :actions, 2) -> mod.actions(cla, setup)
            true -> mod.actions(cla)
          end

        Benchee.run(
          actions,
          [
            time: m,
            warmup: m,
            memory_time: m,
            print: %{fast_warning: false}
          ] ++ if(inputs, do: [inputs: inputs], else: [])
        )

        if(:erlang.function_exported(mod, :teardown, 2), do: mod.teardown(cla, setup))
      end)
    end
  rescue
    r -> IO.puts("Unknown exception: #{Exception.format(:error, r, __STACKTRACE__)}")
  catch
    # Two-argument catch form: matches throws and exits by kind, not only thrown 2-tuples
    kind, reason when kind in [:throw, :exit] ->
      IO.puts("Unknown error: #{Exception.format(kind, reason, __STACKTRACE__)}")
  end
end)

And this means that you can run benchmarks by creating a file named bench/<benchmark_name>_bench.exs. Inside that file is just a simple module definition of the same name camelized. So for example if you had a benchmark named bench/struct_record_bench.exs then you could run mix bench struct_record (you can specify multiple benchmarks to run right after each other as well).
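The naming convention can be spelled out as a small helper. This is only a sketch: BenchNaming is a made-up module, and it uses Module.concat/1 where the script above uses String.to_atom/1 with an "Elixir." prefix, which yields the same module atom.

```elixir
defmodule BenchNaming do
  # Hypothetical helper mirroring the file/module convention of the runner script.

  # "struct_record" -> "bench/struct_record_bench.exs"
  def path_for(name), do: "bench/#{name}_bench.exs"

  # "struct_record" -> StructRecordBench
  def module_for(name), do: Module.concat([Macro.camelize(name) <> "Bench"])
end

IO.inspect(BenchNaming.path_for("struct_record"))
IO.inspect(BenchNaming.module_for("struct_record"))
```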

Inside the benchmark file, you'd have something like the following. This is my bench/struct_record_bench.exs file as an actual example:

defmodule AStruct1 do
  defstruct [a: 1]
  def news1(), do: %__MODULE__{}
end

defmodule AStruct9 do
  defstruct [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9]
  def news9(), do: %__MODULE__{}
end

defmodule ARecords do
  import Record
  defrecord :aRecord1, [a: 1]
  defrecord :aRecord9, [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9]
  def newr1(), do: aRecord1()
  def newr9(), do: aRecord9()
end

defmodule BRecord do
  defmacro __using__(fields) do
    # Add more helpers and flesh out functions and checks if this should ever be actually 'used'
    mappings = fields|>Enum.map(&elem(&1, 0))|>Enum.with_index(1)
    ast_new = [quote do def new() do {__MODULE__, unquote_splicing(Enum.map(fields, &elem(&1, 1)))} end end]
    ast_fields = [quote do def fields() do unquote(fields) end end]
    ast_field = Enum.map(mappings, fn {k, i} ->
      quote do defmacro field(unquote(k)) do unquote(i) end end
    end) ++ [quote do defmacro field(k) do quote do unquote(__MODULE__).field_idx(unquote(k)) end end end]
    ast_field_idx = Enum.map(mappings, fn {k, i} -> quote do def field_idx(unquote(k)), do: unquote(i) end end)
    ast_get = [quote do defmacro get(r, k) do quote do elem(unquote(r), unquote(__MODULE__).field(unquote(k))) end end end]
    ast_put = [quote do defmacro put(r, k, v) do quote do put_elem(unquote(r), unquote(__MODULE__).field(unquote(k)), unquote(v)) end end end]
    {:__block__, [], ast_new++ast_fields++ast_field++ast_field_idx++ast_get++ast_put}
  end

  defmacro get(r, k) do
    quote do
      r = unquote(r)
      elem(r, elem(r, 0).field_idx(unquote(k)))
    end
  end

  defmacro put(r, k, v) do
    quote do
      r = unquote(r)
      put_elem(r, elem(r, 0).field_idx(unquote(k)), unquote(v))
    end
  end
end

defmodule ARecord1 do
  use BRecord, [a: 1]
end

defmodule ARecord9 do
  use BRecord, [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9]
end

defmodule StructRecordBench do
  import AStruct1
  import AStruct9
  import ARecords
  require BRecord
  require ARecord1
  require ARecord9

  def classifiers(), do: [:get, :put]

  def time(_), do: 2

  def inputs(_) do
    nil
  end

  def actions(:get) do
    %{
      "Struct1" => fn -> news1().a end,
      "Struct9-first" => fn -> news9().a end,
      "Struct9-last" => fn -> news9().i end,
      "Record1-stock" => fn -> aRecord1(newr1(), :a) end,
      "Record1-remote" => fn -> ARecord1.new() |> BRecord.get(:a) end,
      "Record1-direct" => fn -> ARecord1.new() |> ARecord1.get(:a) end,
      "Record9-first-stock" => fn -> aRecord9(newr9(), :a) end,
      "Record9-first-remote" => fn -> ARecord9.new() |> BRecord.get(:a) end,
      "Record9-first-direct" => fn -> ARecord9.new() |> ARecord9.get(:a) end,
      "Record9-last-stock" => fn -> aRecord9(newr9(), :i) end,
      "Record9-last-remote" => fn -> ARecord9.new() |> BRecord.get(:i) end,
      "Record9-last-direct" => fn -> ARecord9.new() |> ARecord9.get(:i) end,
    }
  end

  def actions(:put) do
    %{
      "Struct1" => fn -> %{news1() | a: 42} end,
      "Struct1-opt" => fn -> %AStruct1{news1() | a: 42} end,
      "Struct9-first" => fn -> %{news9() | a: 42} end,
      "Struct9-first-opt" => fn -> %AStruct9{news9() | a: 42} end,
      "Struct9-last" => fn -> %{news9() | i: 42} end,
      "Struct9-last-opt" => fn -> %AStruct9{news9() | i: 42} end,
      "Record1-stock" => fn -> aRecord1(newr1(), a: 42) end,
      "Record1-remote" => fn -> ARecord1.new() |> BRecord.put(:a, 42) end,
      "Record1-direct" => fn -> ARecord1.new() |> ARecord1.put(:a, 42) end,
      "Record9-first-stock" => fn -> aRecord9(newr9(), a: 42) end,
      "Record9-first-remote" => fn -> ARecord9.new() |> BRecord.put(:a, 42) end,
      "Record9-first-direct" => fn -> ARecord9.new() |> ARecord9.put(:a, 42) end,
      "Record9-last-stock" => fn -> aRecord9(newr9(), i: 42) end,
      "Record9-last-remote" => fn -> ARecord9.new() |> BRecord.put(:i, 42) end,
      "Record9-last-direct" => fn -> ARecord9.new() |> ARecord9.put(:i, 42) end,
    }
  end
end

The defmodule StructRecordBench do is the only module that matters in this; the rest is just the stuff I test. This is a fairly simple example that doesn't use a lot of the extra features, but it shows a good basic example. In short, the benchmark module can have these callbacks on it, required or optional as defined:

classifiers/0: Optional, defaults to [nil]. Must be a list of anything that can be to_string/1'd (I should probably relax that, but haven't had a need yet...). Benchee is run once per classifier, so these are distinct runs, and the classifier is passed as the first argument to the remaining callbacks.
setup/1: Optional, defaults to nil. This is a setup function to do one-time setup for all benchmark runs, things like making a database connection or whatever.
time/1,2: Optional, defaults to 5 seconds. This is just the Benchee running time for each benchmark. Sometimes I lengthen it for some, sometimes shorten it, whichever feels better and doesn't seem to affect the results. If it takes one argument it's the classifier, if two arguments it's the classifier then setup result. I should probably make a 0-arity for all these too but eh...
inputs/1,2: Optional, defaults to nil. This is just passed to Benchee as normal. If it takes one argument it's the classifier, if two arguments it's the classifier then setup result. I should probably make a 0-arity for all these too but eh...
actions/1,2: Required. This is the actions structure passed straight to Benchee as well. If it takes one argument it's the classifier, if two arguments it's the classifier then setup result. I should probably make a 0-arity for all these too but eh...
teardown/2: Optional. This is passed the classifier and setup result to perform whatever teardown is needed.
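The callback list above could also be written down as a behaviour. This is only a sketch: the BenchModule name and the typespecs are my assumptions, and since actions may be defined with either arity, both arities are marked optional here even though one of them is required in practice.

```elixir
defmodule BenchModule do
  # Hypothetical behaviour describing the callbacks the runner script looks for.
  @type classifier :: term()
  @type setup_result :: term()

  @callback classifiers() :: [classifier()] | nil
  @callback setup(classifier()) :: setup_result()
  @callback time(classifier()) :: number()
  @callback time(classifier(), setup_result()) :: number()
  @callback inputs(classifier()) :: map() | nil
  @callback inputs(classifier(), setup_result()) :: map() | nil
  @callback actions(classifier()) :: map()
  @callback actions(classifier(), setup_result()) :: map()
  @callback teardown(classifier(), setup_result()) :: term()

  @optional_callbacks classifiers: 0, setup: 1, time: 1, time: 2,
                      inputs: 1, inputs: 2, actions: 1, actions: 2, teardown: 2
end

defmodule TinyBench do
  @behaviour BenchModule

  # Smallest possible benchmark module: only actions/1 is defined.
  @impl true
  def actions(_classifier), do: %{"noop" => fn -> :ok end}
end
```

The runner would still need its function_exported checks, since a behaviour only documents the contract and lets the compiler check @impl annotations.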

This isn't really necessarily shorter than just using Benchee straight, but it feels nice to use, no DSEL is needed as it's just simple callbacks, and it grew very organically to what it is now. It could obviously use more features, but what it has now is what I've needed to date.

If curious, the result of running the above struct_record benchmark for me right now is:

Benchmarking Classifier: get
============================

Operating System: Linux"
CPU Information: AMD Phenom(tm) II X6 1090T Processor
Number of Available Cores: 6
Available memory: 15.67 GB
Elixir 1.10.0
Erlang 22.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 1.20 min

Benchmarking Record1-direct...
Benchmarking Record1-remote...
Benchmarking Record1-stock...
Benchmarking Record9-first-direct...
Benchmarking Record9-first-remote...
Benchmarking Record9-first-stock...
Benchmarking Record9-last-direct...
Benchmarking Record9-last-remote...
Benchmarking Record9-last-stock...
Benchmarking Struct1...
Benchmarking Struct9-first...
Benchmarking Struct9-last...

Name                           ips        average  deviation         median         99th %
Record9-first-stock        27.01 M      0.0370 μs   ±817.08%      0.0300 μs      0.0800 μs
Record9-last-direct        25.71 M      0.0389 μs     ±5.88%      0.0380 μs      0.0460 μs
Record9-last-stock         25.66 M      0.0390 μs     ±5.44%      0.0380 μs      0.0460 μs
Record1-stock              25.14 M      0.0398 μs     ±9.64%      0.0380 μs      0.0510 μs
Record9-first-direct       24.74 M      0.0404 μs     ±8.54%      0.0380 μs      0.0500 μs
Record1-direct             24.70 M      0.0405 μs     ±9.65%      0.0380 μs      0.0570 μs
Struct9-last               21.83 M      0.0458 μs   ±582.69%      0.0400 μs      0.0900 μs
Struct9-first              21.81 M      0.0459 μs     ±5.12%      0.0450 μs      0.0540 μs
Struct1                    21.71 M      0.0461 μs     ±7.85%      0.0450 μs      0.0560 μs
Record1-remote             12.38 M      0.0808 μs     ±6.85%      0.0770 μs      0.0970 μs
Record9-last-remote        12.02 M      0.0832 μs     ±6.97%      0.0800 μs       0.103 μs
Record9-first-remote       11.88 M      0.0842 μs    ±31.25%      0.0800 μs       0.120 μs

Comparison: 
Record9-first-stock        27.01 M
Record9-last-direct        25.71 M - 1.05x slower
Record9-last-stock         25.66 M - 1.05x slower
Record1-stock              25.14 M - 1.07x slower
Record9-first-direct       24.74 M - 1.09x slower
Record1-direct             24.70 M - 1.09x slower
Struct9-last               21.83 M - 1.24x slower
Struct9-first              21.81 M - 1.24x slower
Struct1                    21.71 M - 1.24x slower
Record1-remote             12.38 M - 2.18x slower
Record9-last-remote        12.02 M - 2.25x slower
Record9-first-remote       11.88 M - 2.27x slower

Memory usage statistics:

Name                    Memory usage
Record9-first-stock             24 B
Record9-last-direct             24 B - 1.00x memory usage
Record9-last-stock              24 B - 1.00x memory usage
Record1-stock                   24 B - 1.00x memory usage
Record9-first-direct            24 B - 1.00x memory usage
Record1-direct                  24 B - 1.00x memory usage
Struct9-last                    24 B - 1.00x memory usage
Struct9-first                   24 B - 1.00x memory usage
Struct1                         24 B - 1.00x memory usage
Record1-remote                  24 B - 1.00x memory usage
Record9-last-remote             24 B - 1.00x memory usage
Record9-first-remote            24 B - 1.00x memory usage

**All measurements for memory usage were the same**

Benchmarking Classifier: put
============================

Operating System: Linux"
CPU Information: AMD Phenom(tm) II X6 1090T Processor
Number of Available Cores: 6
Available memory: 15.67 GB
Elixir 1.10.0
Erlang 22.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 1.50 min

Benchmarking Record1-direct...
Benchmarking Record1-remote...
Benchmarking Record1-stock...
Benchmarking Record9-first-direct...
Benchmarking Record9-first-remote...
Benchmarking Record9-first-stock...
Benchmarking Record9-last-direct...
Benchmarking Record9-last-remote...
Benchmarking Record9-last-stock...
Benchmarking Struct1...
Benchmarking Struct1-opt...
Benchmarking Struct9-first...
Benchmarking Struct9-first-opt...
Benchmarking Struct9-last...
Benchmarking Struct9-last-opt...

Name                           ips        average  deviation         median         99th %
Record1-stock              17.77 M      0.0563 μs   ±625.17%      0.0500 μs       0.110 μs
Record1-direct             17.58 M      0.0569 μs   ±576.91%      0.0500 μs       0.120 μs
Struct1                    16.96 M      0.0590 μs   ±642.00%      0.0500 μs       0.130 μs
Record9-first-direct       16.63 M      0.0601 μs   ±538.28%      0.0500 μs       0.120 μs
Record9-last-direct        16.08 M      0.0622 μs   ±574.39%      0.0600 μs       0.120 μs
Record9-last-stock         15.84 M      0.0631 μs   ±445.86%      0.0500 μs       0.130 μs
Record9-first-stock        15.19 M      0.0658 μs   ±210.58%      0.0600 μs       0.130 μs
Struct1-opt                14.45 M      0.0692 μs   ±293.24%      0.0600 μs       0.150 μs
Struct9-first              13.78 M      0.0726 μs   ±367.36%      0.0700 μs       0.146 μs
Struct9-first-opt          12.19 M      0.0821 μs   ±300.87%      0.0700 μs       0.160 μs
Struct9-last-opt           11.66 M      0.0858 μs   ±178.26%      0.0800 μs       0.150 μs
Struct9-last               10.17 M      0.0983 μs   ±484.90%      0.0900 μs       0.190 μs
Record1-remote             10.04 M      0.0996 μs   ±310.60%      0.0900 μs       0.150 μs
Record9-first-remote        9.07 M       0.110 μs   ±210.27%       0.100 μs       0.190 μs
Record9-last-remote         8.74 M       0.114 μs   ±176.43%       0.110 μs        0.27 μs

Comparison: 
Record1-stock              17.77 M
Record1-direct             17.58 M - 1.01x slower
Struct1                    16.96 M - 1.05x slower
Record9-first-direct       16.63 M - 1.07x slower
Record9-last-direct        16.08 M - 1.10x slower
Record9-last-stock         15.84 M - 1.12x slower
Record9-first-stock        15.19 M - 1.17x slower
Struct1-opt                14.45 M - 1.23x slower
Struct9-first              13.78 M - 1.29x slower
Struct9-first-opt          12.19 M - 1.46x slower
Struct9-last-opt           11.66 M - 1.52x slower
Struct9-last               10.17 M - 1.75x slower
Record1-remote             10.04 M - 1.77x slower
Record9-first-remote        9.07 M - 1.96x slower
Record9-last-remote         8.74 M - 2.03x slower

Memory usage statistics:

Name                    Memory usage
Record1-stock                   48 B
Record1-direct                  48 B - 1.00x memory usage
Struct1                         64 B - 1.33x memory usage
Record9-first-direct           112 B - 2.33x memory usage
Record9-last-direct            112 B - 2.33x memory usage
Record9-last-stock             112 B - 2.33x memory usage
Record9-first-stock            112 B - 2.33x memory usage
Struct1-opt                     64 B - 1.33x memory usage
Struct9-first                  128 B - 2.67x memory usage
Struct9-first-opt              128 B - 2.67x memory usage
Struct9-last-opt               128 B - 2.67x memory usage
Struct9-last                   128 B - 2.67x memory usage
Record1-remote                  48 B - 1.00x memory usage
Record9-first-remote           112 B - 2.33x memory usage
Record9-last-remote            112 B - 2.33x memory usage

**All measurements for memory usage were the same**

Yes I'm using a slightly older Benchee, this is a very old benchmark project, I need to update, lol.

@NickNeck sorry, busy times :(

Cool stuff and thank you! Looks pretty neat. I'm happy to give it a shout out in the README so people know it's there. If you want to we could also move it to the benchee org (and give you membership obviously) :) It's been often requested and I'd be happy to co-maintain it and give it something official. If you wanna keep it within your org I also completely understand :)

Cheers, thanks and have bunnies! Tobi


@PragTob it would be great to move benchee_dsl to benchee.org and get your help maintaining it. Especially on the issue with not consolidated protocols.

Best regards, Marcus

bencheeorg / benchee

Bench modules (ExUnit-style benchmarks definitions) (DSL) #236