Bench modules (ExUnit-style benchmarks definitions) (DSL) #236

Open hauleth opened 6 years ago

hauleth commented 6 years ago

I am thinking about writing support for ExUnit-style benchmarks (if there is no work toward such a thing yet), and I have something like this in mind:

defmodule SampleBench do
  use Benchee.Unit

  bench do
    input "Small (1 Thousand)", do: Enum.to_list(1..1_000)
    input "Middle (100 Thousand)", do: Enum.to_list(1..100_000)

    setup_all input do
      {Enum.shuffle(input), fn i -> [i, i + 1] end}
    end

    func "flat_map", {input, func} do
      Enum.flat_map(input, func)
    end

    func "map.flatten", {input, func} do
      input |> Enum.map(func) |> List.flatten()
    end
  end
end

Any suggestions or other propositions for the syntax?

PragTob commented 6 years ago


Hi there and thanks for checking in :) Sometimes people ask about this, so it's also somewhere on my "fun" agenda for after 1.0. Hence I've given this some thought in the past :)

Otherwise looks good to me on a first view :)

I added DSL to the title, as I have always thought about this as Benchee.DSL. Of course, since you are the one doing it, it's your choice; this is just my input :grin:

On another note - I think it's interesting how you add the anonymous function in the setup_all - it avoids duplication but at first confused the hell out of me :smile: (mainly because I usually put it as a variable at the top and then reuse it in the job definitions)

Hope I was helpful :wave:

devonestes commented 6 years ago

@hauleth What about the DSL do you prefer to the current API? Is there some functionality that you think the DSL could offer that the current API does not? Or is there something confusing about the current API that is made clearer with this DSL?

I'm totally open to the idea, but I would love some further input as to what users would gain by having a second API available to them. It would likely be a significant maintenance burden, so I think we should go down this road with care.

PragTob commented 6 years ago

@devonestes that's a good question! Imo lots of people just prefer the DSL style because it looks nicer and looks more like ExUnit as far as I can see :)

Regarding maintenance: one way or another, I'd do it as a separate project, with no API completeness guarantees. However, if the "general" options (time, warmup etc.) are implemented in a generic way, it could cover quite a lot, even if there's only a fallback like options or configuration that just takes a keyword list or a map. Only if we introduce new per-job functionality would it really need adjustment... or so optimistic me hopes.

I can see the beauty of a DSL. But I also like our way of working with data structures - I have one benchmark where I programmatically build the map. Quite fun.
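As a rough illustration of what building the job map programmatically can look like, here is a minimal sketch. The chunk sizes and job names are made up for this example; the only assumption about Benchee itself is that `Benchee.run/2` takes a map of name => function as its first argument and, when inputs are configured, calls each function with the input.

```elixir
# Hypothetical example: generate one job per chunk size instead of writing each by hand.
chunk_sizes = [10, 100, 1_000]

jobs =
  Map.new(chunk_sizes, fn size ->
    {"chunk_every(#{size})", fn list -> Enum.chunk_every(list, size) end}
  end)

# With Benchee available, the map can be handed to the existing API, e.g.:
#   Benchee.run(jobs, inputs: %{"10k" => Enum.to_list(1..10_000)})

# Without Benchee, every job is still an ordinary function that can be called directly.
chunks = jobs["chunk_every(100)"].(Enum.to_list(1..250))
IO.inspect(Enum.map(chunks, &length/1))
# prints [100, 100, 50]
```

Because the jobs are plain data, they can be filtered, merged, or generated from configuration before the benchmark runs, which is harder to do with a macro-based definition.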

hauleth commented 6 years ago

@devonestes I was thinking about making it a separate library (named Benchamin); however, I wanted to discuss the DSL and all its pros and cons here, as this is IMHO the best place for that.

Why a DSL? As @PragTob said, it looks more like ExUnit, so it is more familiar to developers and can be used in a similar way to ExUnit. We could have a directory containing performance tests and run them in CI in order to compare the results against future changes. Check out Rails PerfTest for what I have in mind.

@PragTob this could be written using my proposed syntax via metaprogramming. However, for simple benches like these it would be impractical to use a DSL. The DSL is meant for running benches for a whole application, saving the results, and tracking performance changes over time. So the use case is a little bit different.

PragTob commented 6 years ago

@hauleth I don't understand why one needs a DSL to do this kind of benchmark comparison. It does the same - it's just a different way to define them. Or am I missing something?

If you're looking for a project to do what you described look no further than elixir-bench which has the same purpose (aka run benchmarks on a CI like infrastructure, save results, compare over time). It still needs some work but we're currently working on it in GSoC (cc: @tallysmartins). Currently it uses the JSON formatter to put the JSON in a specified directory but we'll probably write a custom formatter for it.

And of course it can be written using meta programming, that's just more complicated and not as easy as building up a Map :)

Don't get me wrong, I'm not against a DSL, I just want to understand it better. I can totally see how the biggest benefit might just be "It looks more like ExUnit so it's not a thing I run once but all the time and I maintain them more diligently".

NickNeck commented 4 years ago


I picked up the idea for a benchee_dsl after reading the discussion in #312 and then the discussion here. And yes, the benefit of just writing a benchmark in another form isn't that big. But I took it as a little exercise in writing a DSL and some macros. I have made it as simple as possible. The DSL collects just all the data that is needed to call in the end. As long as the next version of Benchee is backward compatible, no change in BencheeDsl is needed. If you have time, you can have a look at it: Some more examples are at test/fixtures and in the benchee_dsl branch in my TimeZoneInfo project here.
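To make the "collect the data, then hand it over" idea concrete, here is a minimal sketch of a collecting DSL. It is not how benchee_dsl is implemented; `MiniBenchDsl`, the `job` macro, and `jobs/0` are hypothetical names invented for this illustration. The macro stores each job body in an accumulating module attribute, and a `__before_compile__` hook turns the collected bodies into a map of name => zero-arity function.

```elixir
defmodule MiniBenchDsl do
  # Hypothetical module for illustration only; not part of Benchee or benchee_dsl.
  defmacro __using__(_opts) do
    quote do
      import MiniBenchDsl, only: [job: 2]
      Module.register_attribute(__MODULE__, :jobs, accumulate: true)
      @before_compile MiniBenchDsl
    end
  end

  # `job "name" do ... end` only records the name and the quoted body.
  defmacro job(name, do: body) do
    quote do
      @jobs {unquote(name), unquote(Macro.escape(body))}
    end
  end

  # After all jobs are collected, define jobs/0 returning %{name => fn -> body end}.
  defmacro __before_compile__(env) do
    pairs =
      for {name, body} <- Module.get_attribute(env.module, :jobs) do
        {name, quote(do: fn -> unquote(body) end)}
      end

    quote do
      def jobs, do: Map.new(unquote(pairs))
    end
  end
end

defmodule ListBench do
  use MiniBenchDsl

  job "flat_map" do
    Enum.flat_map([1, 2], fn i -> [i, i + 1] end)
  end

  job "map.flatten" do
    [1, 2] |> Enum.map(fn i -> [i, i + 1] end) |> List.flatten()
  end
end

IO.inspect(Map.keys(ListBench.jobs()))
```

The result of `ListBench.jobs()` is an ordinary map, so a runner would only need to pass it on (for example as the first argument of `Benchee.run/2`), which is why such a layer does not have to change as long as that entry point stays backward compatible.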

OvermindDL1 commented 4 years ago

If anyone's curious, I used this for my random-benchmarker project:

Add this to mix.exs:

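  defp aliases do
    [bench: &bench/1]
  end

  defp bench(arglist) do
    # Assumed call (garbled in the source): run the script via the built-in `run` task.
    Mix.Tasks.Run.run(["bench/bench.exs" | arglist])
  end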

Then in bench/bench.exs:

# Assumed: the benchmark names come from the command line (the start of this pipeline is garbled in the source).
System.argv()
|> Enum.each(fn bench ->
  try do
    # Assumed: load bench/<name>_bench.exs so that the module looked up below exists.
    Code.require_file("bench/#{bench}_bench.exs")
    mod = String.to_atom("Elixir.#{Macro.camelize(bench)}Bench")

    if :erlang.function_exported(mod, :module_info, 0) do
      if(:erlang.function_exported(mod, :classifiers, 0), do: mod.classifiers(), else: [nil])
      |> case do
        nil -> [nil]
        [] -> [nil]
        clas when is_list(clas) -> clas
      end
      |> Enum.each(fn cla ->
        if cla do
          title = "Benchmarking Classifier: #{cla}"
          sep = String.duplicate("=", String.length(title))
          # Assumed: the title and separator are printed; the exact format is lost in the source.
          IO.puts("\n#{title}\n#{sep}\n")
        end

        setup = if(:erlang.function_exported(mod, :setup, 1), do: mod.setup(cla), else: nil)

        # Seconds used for time, warmup and memory_time alike.
        m =
          cond do
            :erlang.function_exported(mod, :time, 2) -> mod.time(cla, setup)
            :erlang.function_exported(mod, :time, 1) -> mod.time(cla)
            true -> 5
          end

        inputs =
          cond do
            :erlang.function_exported(mod, :inputs, 2) -> mod.inputs(cla, setup)
            :erlang.function_exported(mod, :inputs, 1) -> mod.inputs(cla)
            true -> nil
          end

        actions =
          cond do
            :erlang.function_exported(mod, :actions, 2) -> mod.actions(cla, setup)
            true -> mod.actions(cla)
          end

        Benchee.run(
          actions,
          [
            time: m,
            warmup: m,
            memory_time: m,
            print: %{fast_warning: false}
          ] ++ if(inputs, do: [inputs: inputs], else: [])
        )

        if(:erlang.function_exported(mod, :teardown, 2), do: mod.teardown(cla, setup))
      end)
    end
  rescue
    r -> IO.puts("Unknown exception: #{Exception.format(:error, r, __STACKTRACE__)}")
  catch
    {type, reason} when type in [:throw, :exit] -> IO.puts("Unknown error: #{Exception.format(type, reason, __STACKTRACE__)}")
    e -> IO.puts("Unknown error: #{Exception.format(:throw, e, __STACKTRACE__)}")
  end
end)

And this means that you can run benchmarks by creating a file named bench/<benchmark_name>_bench.exs. Inside that file is just a simple module definition of the same name camelized. So for example if you had a benchmark named bench/struct_record_bench.exs then you could run mix bench struct_record (you can specify multiple benchmarks to run right after each other as well).
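The file-name-to-module convention described above can be checked in isolation. This small sketch only uses `Macro.camelize/1` and `String.to_atom/1`, the same calls the runner script uses; the benchmark name is the one from the example.

```elixir
bench = "struct_record"

# "struct_record" is camelized to "StructRecord", then "Bench" is appended.
mod = String.to_atom("Elixir.#{Macro.camelize(bench)}Bench")

IO.inspect(mod)
# prints StructRecordBench
```

Elixir module names are atoms prefixed with `Elixir.`, which is why the resulting atom is the same value as the alias `StructRecordBench`.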

Inside the benchmark file, you'd have something like this; here is my bench/struct_record_bench.exs file as an actual example:

defmodule AStruct1 do
  defstruct [a: 1]
  def news1(), do: %__MODULE__{}
end

defmodule AStruct9 do
  defstruct [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9]
  def news9(), do: %__MODULE__{}
end

defmodule ARecords do
  import Record
  defrecord :aRecord1, [a: 1]
  defrecord :aRecord9, [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9]
  def newr1(), do: aRecord1()
  def newr9(), do: aRecord9()
end

defmodule BRecord do
  defmacro __using__(fields) do
    # Add more helpers and flesh out functions and checks if this should ever be actually 'used'
    # The Enum.map calls below are garbled in the source; their arguments are inferred from context.
    mappings = fields |> Enum.map(&elem(&1, 0)) |> Enum.with_index(1)
    ast_new = [quote do def new() do {__MODULE__, unquote_splicing(Enum.map(fields, &elem(&1, 1)))} end end]
    ast_fields = [quote do def fields() do unquote(fields) end end]
    ast_field = Enum.map(mappings, fn {k, i} ->
      quote do defmacro field(unquote(k)) do unquote(i) end end
    end) ++ [quote do defmacro field(k) do quote do unquote(__MODULE__).field_idx(unquote(k)) end end end]
    ast_field_idx = Enum.map(mappings, fn {k, i} -> quote do def field_idx(unquote(k)), do: unquote(i) end end)
    ast_get = [quote do defmacro get(r, k) do quote do elem(unquote(r), unquote(__MODULE__).field(unquote(k))) end end end]
    ast_put = [quote do defmacro put(r, k, v) do quote do put_elem(unquote(r), unquote(__MODULE__).field(unquote(k)), unquote(v)) end end end]
    {:__block__, [], ast_new++ast_fields++ast_field++ast_field_idx++ast_get++ast_put}
  end

  defmacro get(r, k) do
    quote do
      r = unquote(r)
      elem(r, elem(r, 0).field_idx(unquote(k)))
    end
  end

  defmacro put(r, k, v) do
    quote do
      r = unquote(r)
      put_elem(r, elem(r, 0).field_idx(unquote(k)), unquote(v))
    end
  end
end

defmodule ARecord1 do
  use BRecord, [a: 1]
end

defmodule ARecord9 do
  use BRecord, [a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9]
end

defmodule StructRecordBench do
  import AStruct1
  import AStruct9
  import ARecords
  require BRecord
  require ARecord1
  require ARecord9

  def classifiers(), do: [:get, :put]

  def time(_), do: 2

  def inputs(_) do
    # Assumed body (lost in the source): nil means no inputs are passed to Benchee.
    nil
  end

  # In the "-remote" and "-direct" jobs the record constructor is garbled in the source;
  # ARecord1.new() / ARecord9.new() are inferred from the new/0 that BRecord generates.
  def actions(:get) do
    %{
      "Struct1" => fn -> news1().a end,
      "Struct9-first" => fn -> news9().a end,
      "Struct9-last" => fn -> news9().i end,
      "Record1-stock" => fn -> aRecord1(newr1(), :a) end,
      "Record1-remote" => fn -> ARecord1.new() |> BRecord.get(:a) end,
      "Record1-direct" => fn -> ARecord1.new() |> ARecord1.get(:a) end,
      "Record9-first-stock" => fn -> aRecord9(newr9(), :a) end,
      "Record9-first-remote" => fn -> ARecord9.new() |> BRecord.get(:a) end,
      "Record9-first-direct" => fn -> ARecord9.new() |> ARecord9.get(:a) end,
      "Record9-last-stock" => fn -> aRecord9(newr9(), :i) end,
      "Record9-last-remote" => fn -> ARecord9.new() |> BRecord.get(:i) end,
      "Record9-last-direct" => fn -> ARecord9.new() |> ARecord9.get(:i) end
    }
  end

  def actions(:put) do
    %{
      "Struct1" => fn -> %{news1() | a: 42} end,
      "Struct1-opt" => fn -> %AStruct1{news1() | a: 42} end,
      "Struct9-first" => fn -> %{news9() | a: 42} end,
      "Struct9-first-opt" => fn -> %AStruct9{news9() | a: 42} end,
      "Struct9-last" => fn -> %{news9() | i: 42} end,
      "Struct9-last-opt" => fn -> %AStruct9{news9() | i: 42} end,
      "Record1-stock" => fn -> aRecord1(newr1(), a: 42) end,
      "Record1-remote" => fn -> ARecord1.new() |> BRecord.put(:a, 42) end,
      "Record1-direct" => fn -> ARecord1.new() |> ARecord1.put(:a, 42) end,
      "Record9-first-stock" => fn -> aRecord9(newr9(), a: 42) end,
      "Record9-first-remote" => fn -> ARecord9.new() |> BRecord.put(:a, 42) end,
      "Record9-first-direct" => fn -> ARecord9.new() |> ARecord9.put(:a, 42) end,
      "Record9-last-stock" => fn -> aRecord9(newr9(), i: 42) end,
      "Record9-last-remote" => fn -> ARecord9.new() |> BRecord.put(:i, 42) end,
      "Record9-last-direct" => fn -> ARecord9.new() |> ARecord9.put(:i, 42) end
    }
  end
end

The defmodule StructRecordBench do is the only module that matters in this; the rest is just stuff I test. This is a fairly simple example that doesn't use a lot of the extra features, but it shows a good basic example. In short, the benchmark module can have these callbacks on it, required or optional as defined:
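Going only by the `:erlang.function_exported/3` checks in the runner script above, the callbacks it looks for can be summarized in a minimal module. `ExampleBench` and its jobs are hypothetical and exist only to show the shapes; which callbacks are optional is inferred from the script's fallbacks, not from separate documentation.

```elixir
defmodule ExampleBench do
  # Optional: classifiers/0. Each classifier gets its own benchmark run.
  def classifiers, do: [:small, :large]

  # Optional: setup/1. Its result is passed to the arity-2 callbacks.
  def setup(:small), do: Enum.to_list(1..10)
  def setup(:large), do: Enum.to_list(1..10_000)

  # Optional: time/1 or time/2. Seconds for time, warmup and memory_time (the script falls back to 5).
  def time(_classifier), do: 1

  # Optional: inputs/1 or inputs/2 (omitted here, so no inputs are passed).

  # Required: actions/1 or actions/2, returning a map of job name => function.
  def actions(_classifier, list) do
    %{
      "Enum.sum" => fn -> Enum.sum(list) end,
      "Enum.reduce" => fn -> Enum.reduce(list, 0, &+/2) end
    }
  end

  # Optional: teardown/2, called with the classifier and the setup result.
  def teardown(_classifier, _list), do: :ok
end

list = ExampleBench.setup(:small)
IO.inspect(ExampleBench.actions(:small, list)["Enum.sum"].())
# prints 55
```

Since these are ordinary functions, a benchmark module can be exercised from IEx or a test without going through the runner at all.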

This isn't necessarily shorter than just using Benchee directly, but it feels nice to use, and no DSEL is needed as it's just simple callbacks. It grew very organically into what it is now; it could obviously use more features, but what it has now is what I've needed to date.

OvermindDL1 commented 4 years ago

If curious, the result of running the above struct_record benchmark for me right now is:

Benchmarking Classifier: get

Operating System: Linux"
CPU Information: AMD Phenom(tm) II X6 1090T Processor
Number of Available Cores: 6
Available memory: 15.67 GB
Elixir 1.10.0
Erlang 22.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 1.20 min

Benchmarking Record1-direct...
Benchmarking Record1-remote...
Benchmarking Record1-stock...
Benchmarking Record9-first-direct...
Benchmarking Record9-first-remote...
Benchmarking Record9-first-stock...
Benchmarking Record9-last-direct...
Benchmarking Record9-last-remote...
Benchmarking Record9-last-stock...
Benchmarking Struct1...
Benchmarking Struct9-first...
Benchmarking Struct9-last...

Name                           ips        average  deviation         median         99th %
Record9-first-stock        27.01 M      0.0370 μs   ±817.08%      0.0300 μs      0.0800 μs
Record9-last-direct        25.71 M      0.0389 μs     ±5.88%      0.0380 μs      0.0460 μs
Record9-last-stock         25.66 M      0.0390 μs     ±5.44%      0.0380 μs      0.0460 μs
Record1-stock              25.14 M      0.0398 μs     ±9.64%      0.0380 μs      0.0510 μs
Record9-first-direct       24.74 M      0.0404 μs     ±8.54%      0.0380 μs      0.0500 μs
Record1-direct             24.70 M      0.0405 μs     ±9.65%      0.0380 μs      0.0570 μs
Struct9-last               21.83 M      0.0458 μs   ±582.69%      0.0400 μs      0.0900 μs
Struct9-first              21.81 M      0.0459 μs     ±5.12%      0.0450 μs      0.0540 μs
Struct1                    21.71 M      0.0461 μs     ±7.85%      0.0450 μs      0.0560 μs
Record1-remote             12.38 M      0.0808 μs     ±6.85%      0.0770 μs      0.0970 μs
Record9-last-remote        12.02 M      0.0832 μs     ±6.97%      0.0800 μs       0.103 μs
Record9-first-remote       11.88 M      0.0842 μs    ±31.25%      0.0800 μs       0.120 μs

Record9-first-stock        27.01 M
Record9-last-direct        25.71 M - 1.05x slower
Record9-last-stock         25.66 M - 1.05x slower
Record1-stock              25.14 M - 1.07x slower
Record9-first-direct       24.74 M - 1.09x slower
Record1-direct             24.70 M - 1.09x slower
Struct9-last               21.83 M - 1.24x slower
Struct9-first              21.81 M - 1.24x slower
Struct1                    21.71 M - 1.24x slower
Record1-remote             12.38 M - 2.18x slower
Record9-last-remote        12.02 M - 2.25x slower
Record9-first-remote       11.88 M - 2.27x slower

Memory usage statistics:

Name                    Memory usage
Record9-first-stock             24 B
Record9-last-direct             24 B - 1.00x memory usage
Record9-last-stock              24 B - 1.00x memory usage
Record1-stock                   24 B - 1.00x memory usage
Record9-first-direct            24 B - 1.00x memory usage
Record1-direct                  24 B - 1.00x memory usage
Struct9-last                    24 B - 1.00x memory usage
Struct9-first                   24 B - 1.00x memory usage
Struct1                         24 B - 1.00x memory usage
Record1-remote                  24 B - 1.00x memory usage
Record9-last-remote             24 B - 1.00x memory usage
Record9-first-remote            24 B - 1.00x memory usage

**All measurements for memory usage were the same**

Benchmarking Classifier: put

Operating System: Linux"
CPU Information: AMD Phenom(tm) II X6 1090T Processor
Number of Available Cores: 6
Available memory: 15.67 GB
Elixir 1.10.0
Erlang 22.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 1.50 min

Benchmarking Record1-direct...
Benchmarking Record1-remote...
Benchmarking Record1-stock...
Benchmarking Record9-first-direct...
Benchmarking Record9-first-remote...
Benchmarking Record9-first-stock...
Benchmarking Record9-last-direct...
Benchmarking Record9-last-remote...
Benchmarking Record9-last-stock...
Benchmarking Struct1...
Benchmarking Struct1-opt...
Benchmarking Struct9-first...
Benchmarking Struct9-first-opt...
Benchmarking Struct9-last...
Benchmarking Struct9-last-opt...

Name                           ips        average  deviation         median         99th %
Record1-stock              17.77 M      0.0563 μs   ±625.17%      0.0500 μs       0.110 μs
Record1-direct             17.58 M      0.0569 μs   ±576.91%      0.0500 μs       0.120 μs
Struct1                    16.96 M      0.0590 μs   ±642.00%      0.0500 μs       0.130 μs
Record9-first-direct       16.63 M      0.0601 μs   ±538.28%      0.0500 μs       0.120 μs
Record9-last-direct        16.08 M      0.0622 μs   ±574.39%      0.0600 μs       0.120 μs
Record9-last-stock         15.84 M      0.0631 μs   ±445.86%      0.0500 μs       0.130 μs
Record9-first-stock        15.19 M      0.0658 μs   ±210.58%      0.0600 μs       0.130 μs
Struct1-opt                14.45 M      0.0692 μs   ±293.24%      0.0600 μs       0.150 μs
Struct9-first              13.78 M      0.0726 μs   ±367.36%      0.0700 μs       0.146 μs
Struct9-first-opt          12.19 M      0.0821 μs   ±300.87%      0.0700 μs       0.160 μs
Struct9-last-opt           11.66 M      0.0858 μs   ±178.26%      0.0800 μs       0.150 μs
Struct9-last               10.17 M      0.0983 μs   ±484.90%      0.0900 μs       0.190 μs
Record1-remote             10.04 M      0.0996 μs   ±310.60%      0.0900 μs       0.150 μs
Record9-first-remote        9.07 M       0.110 μs   ±210.27%       0.100 μs       0.190 μs
Record9-last-remote         8.74 M       0.114 μs   ±176.43%       0.110 μs        0.27 μs

Record1-stock              17.77 M
Record1-direct             17.58 M - 1.01x slower
Struct1                    16.96 M - 1.05x slower
Record9-first-direct       16.63 M - 1.07x slower
Record9-last-direct        16.08 M - 1.10x slower
Record9-last-stock         15.84 M - 1.12x slower
Record9-first-stock        15.19 M - 1.17x slower
Struct1-opt                14.45 M - 1.23x slower
Struct9-first              13.78 M - 1.29x slower
Struct9-first-opt          12.19 M - 1.46x slower
Struct9-last-opt           11.66 M - 1.52x slower
Struct9-last               10.17 M - 1.75x slower
Record1-remote             10.04 M - 1.77x slower
Record9-first-remote        9.07 M - 1.96x slower
Record9-last-remote         8.74 M - 2.03x slower

Memory usage statistics:

Name                    Memory usage
Record1-stock                   48 B
Record1-direct                  48 B - 1.00x memory usage
Struct1                         64 B - 1.33x memory usage
Record9-first-direct           112 B - 2.33x memory usage
Record9-last-direct            112 B - 2.33x memory usage
Record9-last-stock             112 B - 2.33x memory usage
Record9-first-stock            112 B - 2.33x memory usage
Struct1-opt                     64 B - 1.33x memory usage
Struct9-first                  128 B - 2.67x memory usage
Struct9-first-opt              128 B - 2.67x memory usage
Struct9-last-opt               128 B - 2.67x memory usage
Struct9-last                   128 B - 2.67x memory usage
Record1-remote                  48 B - 1.00x memory usage
Record9-first-remote           112 B - 2.33x memory usage
Record9-last-remote            112 B - 2.33x memory usage

**All measurements for memory usage were the same**

Yes I'm using a slightly older Benchee, this is a very old benchmark project, I need to update, lol.

PragTob commented 4 years ago

@NickNeck sorry, busy times :(

Cool stuff and thank you! Looks pretty neat. I'm happy to give it a shout out in the README so people know it's there. If you want to we could also move it to the benchee org (and give you membership obviously) :) It's been often requested and I'd be happy to co-maintain it and give it something official. If you wanna keep it within your org I also completely understand :)

Cheers, thanks and have bunnies! Tobi

NickNeck commented 4 years ago

@PragTob it would be great to move benchee_dsl to the benchee org and get your help maintaining it, especially on the issue with protocols that are not consolidated.

Best regards, Marcus